# 7 Functional programming

## 7.1 Anonymous functions

**Q**: Given a function name, like `"mean"`, `match.fun()` lets you find a function. Given a function, can you find its name? Why doesn’t that make sense in R?

**A**: If you know `body()`, `formals()` and `environment()`, it can be possible to find the function. However, this won’t work for primitive functions, since they return `NULL` for all three properties. Anonymous functions won’t be found either, because they are not bound to a name. Conversely, several names in an environment may be bound to one (or more) functions with the same `body()`, `formals()` and `environment()`, which means that the solution wouldn’t be unique. More generally: in R a (function) name has an object, but an object (e.g. a function) doesn’t have a name (just a binding sometimes).

**Q**: Use `lapply()` and an anonymous function to find the coefficient of variation (the standard deviation divided by the mean) for all columns in the `mtcars` dataset.

**A**: `lapply(mtcars, function(x) sd(x) / mean(x))`.

**Q**: Use `integrate()` and an anonymous function to find the area under the curve for the following functions. Use Wolfram Alpha to check your answers.

- `y = x ^ 2 - x`, x in [0, 10]
- `y = sin(x) + cos(x)`, x in [-\(\pi\), \(\pi\)]
- `y = exp(x) / x`, x in [10, 20]

**A**:

```r
integrate(function(x) x^2 - x, 0, 10)
integrate(function(x) sin(x) + cos(x), -pi, pi)
integrate(function(x) exp(x) / x, 10, 20)
```
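As an alternative to checking with Wolfram Alpha, the first two integrals can be compared against their closed-form values. This is a minimal sketch; the names `res1`, `res2`, `exact1` and `exact2` are introduced here for illustration only, and the third integral is left out because `exp(x) / x` has no elementary antiderivative.

```r
# integrate() returns a list; the numeric estimate is stored in $value
res1 <- integrate(function(x) x^2 - x, 0, 10)
res2 <- integrate(function(x) sin(x) + cos(x), -pi, pi)

# Closed forms (worked out by hand, not taken from the original text):
# antiderivative of x^2 - x is x^3 / 3 - x^2 / 2
exact1 <- 10^3 / 3 - 10^2 / 2
# sin is odd and cos integrates to sin(pi) - sin(-pi) = 0
exact2 <- 0

stopifnot(abs(res1$value - exact1) < 1e-8)
stopifnot(abs(res2$value - exact2) < 1e-8)
```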

**Q**: A good rule of thumb is that an anonymous function should fit on one line and shouldn’t need to use `{}`. Review your code. Where could you have used an anonymous function instead of a named function? Where should you have used a named function instead of an anonymous function?

**A**:

## 7.2 Closures

**Q**: Why are functions created by other functions called closures?

**A**: As stated in the book: because they enclose the environment of the parent function and can access all its variables.
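A small sketch can make the "enclosing" visible. The factory `power()` and the names `square` and `cube` are illustrative choices, not part of the exercise; the point is only that each returned function carries its own copy of the parent’s variables.

```r
# A function factory: the returned function refers to `exponent`,
# which lives in the execution environment of power()
power <- function(exponent) {
  function(x) x ^ exponent
}

square <- power(2)
cube <- power(3)

# Each closure keeps its own enclosing environment
stopifnot(environment(square)$exponent == 2)
stopifnot(environment(cube)$exponent == 3)
stopifnot(!identical(environment(square), environment(cube)))

# ...and uses the enclosed value when called
stopifnot(square(4) == 16, cube(2) == 8)
```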

**Q**: What does the following statistical function do? What would be a better name for it? (The existing name is a bit of a hint.)

```r
bc <- function(lambda) {
  if (lambda == 0) {
    function(x) log(x)
  } else {
    function(x) (x ^ lambda - 1) / lambda
  }
}
```

**A**: It is the logarithm when `lambda` equals zero, and `(x ^ lambda - 1) / lambda` otherwise. A better name might be `box_cox_transformation` (one-parameter version); you can read about it [here](https://en.wikipedia.org/wiki/Power_transform).

**Q**: What does `approxfun()` do? What does it return?

**A**: `approxfun()` takes a set of two-dimensional data points plus some extra specifications as arguments and returns a piecewise linear or piecewise constant interpolation function (defined on the range of the given x-values, by default).

**Q**: What does `ecdf()` do? What does it return?

**A**: “ecdf” stands for empirical cumulative distribution function. For a numeric vector, `ecdf()` returns the corresponding empirical distribution function (of class “ecdf”, which inherits from class “stepfun”). Its behaviour can be described in two steps. In the first part of its body, the `(x, y)` pairs for the nodes of the step function are calculated. In the second part, these pairs are passed to `approxfun()`.

**Q**: Create a function that creates functions that compute the ith central moment of a numeric vector. You can test it by running the following code:

```r
m1 <- moment(1)
m2 <- moment(2)

x <- runif(100)
stopifnot(all.equal(m1(x), 0))
stopifnot(all.equal(m2(x), var(x) * 99 / 100))
```

**A**: The following uses the discrete formulation of the central moment:

```r
moment <- function(i) {
  function(x) sum((x - mean(x)) ^ i) / length(x)
}
```
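To see where the factor `99 / 100` in the test comes from, the check can be repeated on a small fixed vector, with the factor written as `(n - 1) / n`. This is a sketch under the assumption that `moment()` is defined as above; the vector `x_small` is made up for illustration.

```r
moment <- function(i) {
  function(x) sum((x - mean(x)) ^ i) / length(x)
}

m1 <- moment(1)
m2 <- moment(2)

# Hypothetical example data: mean 4, deviations -3, -2, -1, 0, 6
x_small <- c(1, 2, 3, 4, 10)
n <- length(x_small)

# The first central moment is always zero
stopifnot(isTRUE(all.equal(m1(x_small), 0)))

# var() divides by n - 1, the second central moment divides by n
stopifnot(isTRUE(all.equal(m2(x_small), var(x_small) * (n - 1) / n)))
stopifnot(isTRUE(all.equal(m2(x_small), 10)))
```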

**Q**: Create a function `pick()` that takes an index, `i`, as an argument and returns a function with an argument `x` that subsets `x` with `i`.

```r
lapply(mtcars, pick(5))
# should do the same as this
lapply(mtcars, function(x) x[[5]])
```

**A**:

```r
pick <- function(i) {
  function(x) x[[i]]
}

stopifnot(identical(
  lapply(mtcars, pick(5)),
  lapply(mtcars, function(x) x[[5]])
))
```
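Because `[[` also accepts names, the same factory works with a character index. The sketch below adds `force(i)`, which is not part of the solution above; it is a defensive assumption so that the index is fixed when the closure is created. The names `pick_mpg` and `first_three` are illustrative.

```r
pick <- function(i) {
  force(i)  # evaluate i now, so the closure does not depend on later changes
  function(x) x[[i]]
}

# Character index: pick a column of a data frame by name
pick_mpg <- pick("mpg")
stopifnot(identical(pick_mpg(mtcars), mtcars$mpg))

# A list of pickers, one per position, built with lapply()
first_three <- lapply(1:3, pick)
stopifnot(identical(first_three[[2]](letters), "b"))
```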

## 7.3 Lists of functions

**Q**: Implement a summary function that works like `base::summary()`, but uses a list of functions. Modify the function so it returns a closure, making it possible to use it as a function factory.

**A**: We have two possibilities: we can imitate `base::summary()` completely, or create a new summary based on our preferences. Neither is easy, since both involve a lot of design decisions. We choose the second option, since we just want to create a first draft to apply what we have learned and get a feeling for the challenges that might appear.

Our new summary function, `summary2`, should have sensible default actions for specific data types, and these defaults should of course be changeable, as this is also part of the exercise. To limit our efforts, we focus on summaries for data frames. Everything else is explained via comments in the code:

```r
# The arguments of our function factory are the lists of functions that are
# applied to data frame columns, depending on their type.
# We focus on the most important types, so they can be set for character,
# integer, double, logical, factor and date columns. By default they are set
# to NULL, but if you supply a list of functions, this will override the real
# default for the specific type, which is set inside the function factory.
summary2 <- function(character_functions = NULL,
                     integer_functions = NULL,
                     double_functions = NULL,
                     logical_functions = NULL,
                     factor_functions = NULL,
                     date_functions = NULL) {

  # The following functional is later applied six times to the data frame,
  # once for every column type in the scope of our function
  apply_typefunction <- function(df, pred, functions) {
    lapply(df[vapply(df, pred, logical(1))],
           function(x) unlist(lapply(functions, function(y) y(x))))
  }

  # The following lists of functions are "somehow" similar to those that are
  # used by base::summary, so we define them once...
  default_1 <- list(Table = table)
  default_2 <- list(Min = min,
                    `1st Qu.` = function(x) quantile(x)[[2]],
                    Median = median,
                    Mean = mean,
                    `3rd Qu.` = function(x) quantile(x)[[4]],
                    Max = max)

  # All function lists that are not specified when calling the
  # function factory are now set to their default values
  if (is.null(character_functions)) character_functions <- default_1
  if (is.null(integer_functions))   integer_functions   <- default_2
  if (is.null(double_functions))    double_functions    <- default_2
  if (is.null(logical_functions))   logical_functions   <- default_1
  if (is.null(factor_functions))    factor_functions    <- default_1
  if (is.null(date_functions))      date_functions      <- default_2

  # Finally the returned function is created
  function(df) {
    # For every column type, the specific functions will be applied to the
    # appropriate columns.
    characters <- apply_typefunction(df, is.character, character_functions)
    integers   <- apply_typefunction(df, is.integer,   integer_functions)
    doubles    <- apply_typefunction(df, is.double,    double_functions)
    logicals   <- apply_typefunction(df, is.logical,   logical_functions)
    factors    <- apply_typefunction(df, is.factor,    factor_functions)
    dates      <- apply_typefunction(df, function(x) inherits(x, 'Date'),
                                     date_functions)

    # The results are collected in a list. Empty lists, which appear for
    # column types that do not occur, are removed from the output.
    # There are a lot of formatting steps, like ordering, naming and converting
    # output, that we could do, but we think that the idea is more important
    # for now
    out <- list(characters, integers, doubles, logicals, factors, dates)
    out[lengths(out) != 0]
  }
}

# Now we can apply the function factory
summary2_default <- summary2()

# And the resulting function
summary2_default(df = iris)
#> [[1]]
#> [[1]]$Sepal.Length
#>      Min  1st Qu.   Median     Mean  3rd Qu.      Max
#> 4.300000 5.100000 5.800000 5.843333 6.400000 7.900000
#>
#> [[1]]$Sepal.Width
#>      Min  1st Qu.   Median     Mean  3rd Qu.      Max
#> 2.000000 2.800000 3.000000 3.057333 3.300000 4.400000
#>
#> [[1]]$Petal.Length
#>     Min 1st Qu.  Median    Mean 3rd Qu.     Max
#>   1.000   1.600   4.350   3.758   5.100   6.900
#>
#> [[1]]$Petal.Width
#>      Min  1st Qu.   Median     Mean  3rd Qu.      Max
#> 0.100000 0.300000 1.300000 1.199333 1.800000 2.500000
#>
#>
#> [[2]]
#> [[2]]$Species
#>     Table.setosa Table.versicolor  Table.virginica
#>               50               50               50

# Unfortunately, we will fail if there are any NAs in integer columns
df_nas <- data.frame(integers_na = c(NA, 2:19))
summary2_default(df_nas)
#> Error in quantile.default(x): missing values and NaN's not allowed if 'na.rm' is FALSE

# But since we can define new functions for integer columns, we can solve this
summary2_naversion <- summary2(integer_functions = list(
  Mean_na   = function(x) mean(x, na.rm = TRUE),
  Median_na = function(x) median(x, na.rm = TRUE),
  NAs       = function(x) sum(is.na(x))
))
summary2_naversion(df_nas)
#> [[1]]
#> [[1]]$integers_na
#>   Mean_na Median_na       NAs
#>      10.5      10.5       1.0
```
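One caveat of this draft, added here as an observation rather than as part of the solution: a `Date` column is stored as a double, so it satisfies both `is.double()` and the `inherits(x, 'Date')` predicate and would be summarised twice. The sketch below shows the overlap and one possible stricter predicate; `df_dates` and `is_plain_double` are hypothetical names.

```r
# Hypothetical data frame with a single Date column
df_dates <- data.frame(
  day = as.Date(c("2020-01-01", "2020-01-15", "2020-02-01"))
)

# A Date column matches both predicates used by summary2()
stopifnot(is.double(df_dates$day), inherits(df_dates$day, "Date"))

# Possible stricter predicate: doubles that are not dates
is_plain_double <- function(x) is.double(x) && !inherits(x, "Date")

stopifnot(!is_plain_double(df_dates$day))
stopifnot(is_plain_double(c(1.5, 2.5)))
```

Passing a predicate like this in place of `is.double` would keep dates out of the doubles group.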

**Q**: Which of the following commands is equivalent to `with(x, f(z))`?

- (a) `x$f(x$z)`.
- (b) `f(x$z)`.
- (c) `x$f(z)`.
- (d) `f(z)`.
- (e) It depends.

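**A**: (b) is equivalent in the usual case, where `x` is a list or data frame that contains `z` and `f` is defined outside of `x`. If `x` is the current environment, (d) would also work.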
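The dependence on where `f` and `z` live can be checked directly. In this sketch, `f`, `x` and `x2` are made-up objects: in the first case only `z` is found in `x`, in the second case `x2` also contains its own `f`, which `with()` finds first.

```r
# f is defined outside of x; x only contains z
f <- function(v) v * 2
x <- list(z = 1:3)
stopifnot(identical(with(x, f(z)), f(x$z)))

# If the list also contains a function called f, with() uses that one,
# so the call now behaves like x2$f(x2$z) instead
x2 <- list(z = 1:3, f = function(v) v + 100)
stopifnot(identical(with(x2, f(z)), x2$f(x2$z)))
stopifnot(!identical(with(x2, f(z)), f(x2$z)))
```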

## 7.4 Case study: numerical integration

**Q**: Instead of creating individual functions (e.g., `midpoint()`, `trapezoid()`, `simpson()`, etc.), we could store them in a list. If we did that, how would that change the code? Can you create the list of functions from a list of coefficients for the Newton-Cotes formulae?

**A**:

**Q**: The trade-off between integration rules is that more complex rules are slower to compute, but need fewer pieces. For `sin()` in the range [0, \(\pi\)], determine the number of pieces needed so that each rule will be equally accurate. Illustrate your results with a graph. How do they change for different functions? `sin(1 / x^2)` is particularly challenging.

**A**:
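One possible starting point, offered as a sketch rather than a worked answer: store the rules in a list and search for the smallest number of pieces at which each rule reaches a given tolerance. The helper `composite()` is modelled on the chapter’s case study; `rules`, `pieces_needed`, the tolerance `1e-4` and the cap `max_n` are assumptions made for illustration. The exact value of the integral of `sin()` over [0, \(\pi\)] is 2.

```r
# Composite integration: split [a, b] into n pieces and apply `rule` to each
composite <- function(f, a, b, n = 10, rule) {
  points <- seq(a, b, length.out = n + 1)
  area <- 0
  for (i in seq_len(n)) {
    area <- area + rule(f, points[i], points[i + 1])
  }
  area
}

# The integration rules stored in a list instead of as individual functions
rules <- list(
  midpoint  = function(f, a, b) (b - a) * f((a + b) / 2),
  trapezoid = function(f, a, b) (b - a) / 2 * (f(a) + f(b)),
  simpson   = function(f, a, b) (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))
)

# Smallest n (linear search) whose absolute error is below tol;
# returns NA if the tolerance is not reached within max_n pieces
pieces_needed <- function(rule, f, a, b, exact, tol = 1e-4, max_n = 1000) {
  for (n in seq_len(max_n)) {
    if (abs(composite(f, a, b, n, rule) - exact) < tol) return(n)
  }
  NA_integer_
}

needed <- vapply(rules, pieces_needed, integer(1),
                 f = sin, a = 0, b = pi, exact = 2)
```

The named vector `needed` can then be plotted, for example with `barplot(needed)`. For a function without a known closed-form integral, `exact` would have to be replaced by a high-accuracy numerical reference.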