7 Functional programming
7.1 Anonymous functions
Q: Given a function, like "mean", match.fun() lets you find a function. Given a function, can you find its name? Why doesn’t that make sense in R?
A: If you know body(), formals() and environment(), it may be possible to find the function by comparing these with the functions bound in an environment. However, this won’t work for primitive functions, since they return NULL for all three properties. Anonymous functions won’t be found either, because they are not bound to a name. Conversely, several names in an environment may be bound to the same function (or to functions with the same body(), formals() and environment()), so the result wouldn’t be unique. More generally: in R a name has an object, but an object (e.g. a function) doesn’t have a name, only a binding at times.
Q: Use lapply() and an anonymous function to find the coefficient of variation (the standard deviation divided by the mean) for all columns in the mtcars dataset.
A:
lapply(mtcars, function(x) sd(x)/mean(x))
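Since every column of mtcars is numeric, the same anonymous function could also be passed to vapply(), which returns a named numeric vector instead of a list. A minimal sketch (the name cv is only illustrative and not part of the exercise):

```r
# Coefficient of variation per column, collected in a named numeric vector.
cv <- vapply(mtcars, function(x) sd(x) / mean(x), numeric(1))

# The values agree with the list returned by lapply().
stopifnot(isTRUE(all.equal(
  cv,
  unlist(lapply(mtcars, function(x) sd(x) / mean(x)))
)))

head(cv, 3)
```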
Q: Use integrate() and an anonymous function to find the area under the curve for the following functions. Use Wolfram Alpha to check your answers.
- y = x ^ 2 - x, x in [0, 10]
- y = sin(x) + cos(x), x in [-\(\pi\), \(\pi\)]
- y = exp(x) / x, x in [10, 20]
A:
integrate(function(x) x^2 - x, 0, 10)
integrate(function(x) sin(x) + cos(x), -pi, pi)
integrate(function(x) exp(x) / x, 10, 20)
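The first two integrals have elementary antiderivatives, so they can also be checked directly in R. The sketch below compares the numerical estimates with the closed-form values; the variable names are illustrative only. The third integrand, exp(x) / x, has no elementary antiderivative, so it is left out here.

```r
# Numerical estimates; $value extracts the estimate from the integrate() result.
area_poly <- integrate(function(x) x^2 - x, 0, 10)$value
area_trig <- integrate(function(x) sin(x) + cos(x), -pi, pi)$value

# Closed forms: x^3 / 3 - x^2 / 2 evaluated on [0, 10],
# and -cos(x) + sin(x) evaluated on [-pi, pi].
exact_poly <- 10^3 / 3 - 10^2 / 2
exact_trig <- 0

stopifnot(isTRUE(all.equal(area_poly, exact_poly)))
stopifnot(abs(area_trig - exact_trig) < 1e-8)
```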
Q: A good rule of thumb is that an anonymous function should fit on one line and shouldn’t need to use {}. Review your code. Where could you have used an anonymous function instead of a named function? Where should you have used a named function instead of an anonymous function?
A:
7.2 Closures
Q: Why are functions created by other functions called closures?
A: As stated in the book: because they enclose the environment of the parent function and can access all its variables.
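This can be made visible with a small function factory. The sketch below uses a hypothetical power() factory (not part of the exercise) and inspects the enclosing environment of the functions it returns:

```r
# A function factory: each call creates a new environment holding `exponent`.
power <- function(exponent) {
  force(exponent)  # evaluate the argument now instead of lazily
  function(x) x^exponent
}

square <- power(2)
cube <- power(3)

square(4)  # 16
cube(4)    # 64

# The closures carry the environment of the call that created them.
ls(environment(square))
environment(square)$exponent
environment(cube)$exponent
```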
Q: What does the following statistical function do? What would be a better name for it? (The existing name is a bit of a hint.)
bc <- function(lambda) {
  if (lambda == 0) {
    function(x) log(x)
  } else {
    function(x) (x ^ lambda - 1) / lambda
  }
}
A: bc() returns a function that computes the logarithm when lambda equals zero and (x ^ lambda - 1) / lambda otherwise. A better name might be box_cox_transformation (the one-parameter version); see https://en.wikipedia.org/wiki/Power_transform for details.
Q: What does approxfun() do? What does it return?
A: approxfun() takes a set of two-dimensional data points plus some extra specifications as arguments and returns a function that interpolates them, either piecewise linearly or piecewise constantly (by default defined only on the range of the given x-values).
Q: What does ecdf() do? What does it return?
A: “ecdf” stands for empirical cumulative distribution function. For a numeric vector, ecdf() returns the corresponding distribution function (of class “ecdf”, which inherits from class “stepfun”). Its behaviour can be described in two steps. In the first part of its body, the (x, y) pairs for the nodes of the distribution function are calculated. In the second part, these pairs are passed to approxfun().
Q: Create a function that creates functions that compute the ith central moment of a numeric vector. You can test it by running the following code:
m1 <- moment(1)
m2 <- moment(2)
x <- runif(100)
stopifnot(all.equal(m1(x), 0))
stopifnot(all.equal(m2(x), var(x) * 99 / 100))
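A: Using the discrete formulation, the ith central moment of a numeric vector is the mean of the ith powers of its deviations from the mean: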
moment <- function(i) {
  function(x) sum((x - mean(x)) ^ i) / length(x)
}
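As a quick sanity check, the factory can be applied to a small vector whose moments are easy to compute by hand. The sketch below repeats the definition so that it runs on its own; the vector x_small is just an illustrative choice. Its mean is 2.5, so the deviations are -1.5, -0.5, 0.5 and 1.5.

```r
moment <- function(i) {
  function(x) sum((x - mean(x)) ^ i) / length(x)
}

x_small <- c(1, 2, 3, 4)

# Second central moment: (2.25 + 0.25 + 0.25 + 2.25) / 4 = 1.25
stopifnot(isTRUE(all.equal(moment(2)(x_small), 1.25)))

# First and third central moments vanish for this symmetric vector.
stopifnot(isTRUE(all.equal(moment(1)(x_small), 0)))
stopifnot(isTRUE(all.equal(moment(3)(x_small), 0)))
```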
Q: Create a function pick() that takes an index, i, as an argument and returns a function with an argument x that subsets x with i.
lapply(mtcars, pick(5))
# should do the same as this
lapply(mtcars, function(x) x[[5]])
A:
pick <- function(i) {
  function(x) x[[i]]
}
stopifnot(identical(
  lapply(mtcars, pick(5)),
  lapply(mtcars, function(x) x[[5]])
))
7.3 Lists of functions
Q: Implement a summary function that works like base::summary(), but uses a list of functions. Modify the function so it returns a closure, making it possible to use it as a function factory.
A: We have two possibilities: we can imitate base::summary() completely, or create a new summary based on our preferences. Neither is easy, since both involve a lot of design decisions. We choose the second option, since we just want to create a first draft to apply what we have learned and get a feeling for the challenges that might appear.
Our new summary function, summary2, should have sensible default actions for specific data types, and these defaults should of course be changeable, as this is also part of the exercise. To limit our efforts, we focus on summaries for data frames. Everything else is explained via comments in the code:
# The arguments of our function factory are the lists of functions that are
# applied to data frame columns, depending on their type.
# We focus on the most important types, so they can be set for character,
# integer, double, logical, factor and date columns. By default they are set
# to NULL, but if you supply a list of functions, it overrides the real
# default for that type, which is set inside the function factory.
summary2 <- function(character_functions = NULL,
                     integer_functions = NULL,
                     double_functions = NULL,
                     logical_functions = NULL,
                     factor_functions = NULL,
                     date_functions = NULL) {

  # The following functional is later applied six times to the data frame,
  # once for every column type in the scope of our function.
  apply_typefunction <- function(df, pred, functions) {
    lapply(df[vapply(df, pred, logical(1))],
           function(x) unlist(lapply(functions, function(y) y(x))))
  }

  # The following lists of functions are "somehow" similar to those used
  # by base::summary, so we define them once...
  default_1 <- list(Table = table)
  default_2 <- list(Min = min,
                    `1st Qu.` = function(x) quantile(x)[[2]],
                    Median = median,
                    Mean = mean,
                    `3rd Qu.` = function(x) quantile(x)[[4]],
                    Max = max)

  # All function lists that are not specified when calling the
  # function factory are now set to their default values.
  if (is.null(character_functions)) character_functions <- default_1
  if (is.null(integer_functions)) integer_functions <- default_2
  if (is.null(double_functions)) double_functions <- default_2
  if (is.null(logical_functions)) logical_functions <- default_1
  if (is.null(factor_functions)) factor_functions <- default_1
  if (is.null(date_functions)) date_functions <- default_2

  # Finally the returned function is created.
  function(df) {
    # For every column type, the specific functions are applied to the
    # appropriate columns.
    characters <- apply_typefunction(df, is.character, character_functions)
    integers <- apply_typefunction(df, is.integer, integer_functions)
    doubles <- apply_typefunction(df, is.double, double_functions)
    logicals <- apply_typefunction(df, is.logical, logical_functions)
    factors <- apply_typefunction(df, is.factor, factor_functions)
    dates <- apply_typefunction(df, function(x) inherits(x, 'Date'),
                                date_functions)

    # The results are collected in a list. Empty lists, which appear when a
    # column type does not occur, are removed from the output.
    # There are a lot of formatting steps, like ordering, naming and
    # converting output, that we could do, but we think that the idea is
    # more important for now.
    out <- list(characters, integers, doubles, logicals, factors, dates)
    out[lengths(out) != 0]
  }
}

# Now we can apply the function factory
summary2_default <- summary2()
# And the resulting function
summary2_default(df = iris)
#> [[1]]
#> [[1]]$Sepal.Length
#>      Min  1st Qu.   Median     Mean  3rd Qu.      Max
#> 4.300000 5.100000 5.800000 5.843333 6.400000 7.900000
#>
#> [[1]]$Sepal.Width
#>      Min  1st Qu.   Median     Mean  3rd Qu.      Max
#> 2.000000 2.800000 3.000000 3.057333 3.300000 4.400000
#>
#> [[1]]$Petal.Length
#>     Min 1st Qu.  Median    Mean 3rd Qu.     Max
#>   1.000   1.600   4.350   3.758   5.100   6.900
#>
#> [[1]]$Petal.Width
#>      Min  1st Qu.   Median     Mean  3rd Qu.      Max
#> 0.100000 0.300000 1.300000 1.199333 1.800000 2.500000
#>
#>
#> [[2]]
#> [[2]]$Species
#>     Table.setosa Table.versicolor  Table.virginica
#>               50               50               50

# Unfortunately, we will fail if there are any NAs in integer columns
df_nas <- data.frame(integers_na = c(NA, 2:19))
summary2_default(df_nas)
#> Error in quantile.default(x): missing values and NaN's not allowed if 'na.rm' is FALSE

# But since we can define new functions for integer columns, we can solve this
summary2_naversion <- summary2(integer_functions = list(
  Mean_na = function(x) mean(x, na.rm = TRUE),
  Median_na = function(x) median(x, na.rm = TRUE),
  NAs = function(x) sum(is.na(x))
))
summary2_naversion(df_nas)
#> [[1]]
#> [[1]]$integers_na
#>   Mean_na Median_na       NAs
#>      10.5      10.5       1.0
Q: Which of the following commands is equivalent to with(x, f(z))?
(a) x$f(x$z).
(b) f(x$z).
(c) x$f(z).
(d) f(z).
(e) It depends.
A: b is equivalent, provided that x does not itself contain a binding named f. If x is the current environment, d would also work.
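The lookup behaviour behind this answer can be tried out directly. The sketch below uses hypothetical objects x, x2 and f (none of them come from the exercise): with() evaluates its expression with x as the first place to look, so z is always taken from x, while f is taken from x only if x contains a function of that name.

```r
f <- function(v) sum(v)

# x contains only z, so f is found in the calling environment:
# with(x, f(z)) behaves like f(x$z).
x <- list(z = 2:4)
stopifnot(identical(with(x, f(z)), f(x$z)))

# x2 also contains its own f, which with() finds first:
# now with(x2, f(z)) behaves like x2$f(x2$z) instead.
x2 <- list(z = 2:4, f = function(v) max(v))
stopifnot(identical(with(x2, f(z)), x2$f(x2$z)))
stopifnot(!identical(with(x2, f(z)), f(x2$z)))
```

The second case is why "It depends" can also be defended as an answer.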
7.4 Case study: numerical integration
Q: Instead of creating individual functions (e.g., midpoint(), trapezoid(), simpson(), etc.), we could store them in a list. If we did that, how would that change the code? Can you create the list of functions from a list of coefficients for the Newton-Cotes formulae?
A:
Q: The trade-off between integration rules is that more complex rules are slower to compute, but need fewer pieces. For sin() in the range [0, \(\pi\)], determine the number of pieces needed so that each rule will be equally accurate. Illustrate your results with a graph. How do they change for different functions? sin(1 / x^2) is particularly challenging.
A: