9 Function operators
9.1 Behavioural FOs
Q: Write a FO that logs a time stamp and message to a file every time a function is run.
A: Note that the example will create a file file in your current working directory:
logger <- function(f, filename){ force(f) filename_tmp <- paste(filename, basename(tempfile()), sep = "_") write(paste("created at:", Sys.time()), filename_tmp, append = TRUE) function(..., message = "you can add a message at each call") { write(paste0("used at: ", Sys.time(), ", ", message), filename_tmp, append = TRUE) f(...) } } # the following line creates a file, which name starts with "mean_log_" mean2 <- logger(mean, "mean_log") mean2(1:4, message = "first time") mean2(1:4, message = "second_time")
Q: What does the following function do? What would be a good name for it?
f <- function(g) { force(g) result <- NULL function(...) { if (is.null(result)) { result <<- g(...) } result } } runif2 <- f(runif) runif2(5) #> [1] 0.1648604 0.7923487 0.0473985 0.5719584 0.5383457 runif2(10) #> [1] 0.1648604 0.7923487 0.0473985 0.5719584 0.5383457
A: It returns a new version of the inputfunction. That version will always return the result of it’s first run (in case this not
NULL
), no matter how the input changes. Good names could befirst_run()
orinitial_return()
.Q: Modify
delay_by()
so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you calledg <- delay_by(1, f); g(); Sys.sleep(2); g()
there shouldn’t be an extra delay.A: We can do this with three little tricks (and the help of 42):
delay_by_v2 <- function(delay, f) { force(f) # we initialise the timestamp for the last run. We set a specific default value, # to ensure that the first run of the returned function will never be delayed last_runtime <- Sys.time() - (delay + 42) function(...) { # we continually check if enough time passed with an (empty) while statement. while (Sys.time() < last_runtime + delay) {} # we override the start for the next waiting interval. # Note that this is done on exit (after the function is evaluated) on.exit(last_runtime <<- Sys.time()) return(f(...)) } }
Alternatively to the empty while statement we could have used
Sys.sleep()
. I would not recommend this solution, since?Sys.sleep
indicates thatSys.sleep()
might have some overhead and seems not to be as exact as we need.Q: Write
wait_until()
which delays execution until a specific time.A:
wait_until <- function(time, f) { force(f) function(...) { while (Sys.time() < time) {} return(f(...)) } } # a little test ptm <- proc.time() m <- wait_until(Sys.time() + 10, mean) m(1:3) proc.time() - ptm
Q: There are three places we could have added a memoise call: why did we choose the one we did?
download <- memoise(dot_every(10, delay_by(1, download_file))) download <- dot_every(10, memoise(delay_by(1, download_file))) download <- dot_every(10, delay_by(1, memoise(download_file)))
A: The second was chosen. It’s easy to see why, if we eliminate the other two options:
The first version only prints a dot at every tenth
download()
call with a new input. This is becausedot_every()
is inside ofmemoise()
and the counter created bydot_every()
is not “activated” if the input is known.The third version takes one second for every call. Even if we already know the result and don’t download anything again.
Q: Why is the
remember()
function inefficient? How could you implement it in more efficient way?Q: Why does the following code, from stackoverflow, not do what you expect?
# return a linear function with slope a and intercept b. f <- function(a, b) function(x) a * x + b # create a list of functions with different parameters. fs <- Map(f, a = c(0, 1), b = c(0, 1)) fs[[1]](3) #> [1] 0 # should return 0 * 3 + 0 = 0
How can you modify
f
so that it works correctly?A: You can read in the stackoverflow link that the question arose, because the original return of
fs[[1]](3)
was4
, which is due to lazy evaluation and could be solved by two users viaforce()
:f <- function(a, b) {force(a); force(b); function(x) a * x + b}
However you can see in the result within the question that R’s behaviour was changed in this case and as Jan Kislinger points out on twitter:
The real question should be: “How did they modify #rstats so that it works correctly?” otherwise it’s a tricky question :D
Note that the same issue appears in the textbook:
In the following example, we take a list of functions and delay each one. But when we try to evaluate the mean, we get the sum instead.
funs <- list(mean = mean, sum = sum) funs_m <- lapply(funs, delay_by, delay = 0.1) funs_m$mean(1:10) #> [1] 5.5
Which (as one can see) is not true anymore…actually it changed in R version 3.2:
Higher order functions such as the apply functions and Reduce() now force arguments to the functions they apply in order to eliminate undesirable interactions between lazy evaluation and variable capture in closures. This resolves PR#16093.
For further interested: PR#16093 will lead you to the subject “iterated lapply” within the R-devel Archives. Note that the behaviour in for loops is still as “the old
lapply()
” behaviour.
9.2 Output FOs
Q: Create a
negative()
FO that flips the sign of the output of the function to which it is applied.A:
negative <- function(f){ force(f) function(...){ -f(...) } }
Q: The
evaluate
package makes it easy to capture all the outputs (results, text, messages, warnings, errors, and plots) from an expression. Create a function likecapture_it()
that also captures the warnings and errors generated by a function.A: One way is just to capture the output of
tryCatch()
with identity handlers for errors and warnings:capture_trials <- function(f){ force(f) function(...){ capture.output(tryCatch(f(...), error = function(e) e, warning = function(w) w) ) } } # we test the behaviour log_t <- capture_trials(log) elements <- list(1:10, c(-1, 10), c(TRUE, FALSE), letters) results <- lapply(elements, function(x) log_t(x)) results #> [[1]] #> [1] " [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101" #> [2] " [8] 2.0794415 2.1972246 2.3025851" #> #> [[2]] #> [1] "<simpleWarning in f(...): NaNs produced>" #> #> [[3]] #> [1] "[1] 0 -Inf" #> #> [[4]] #> [1] "<simpleError in f(...): non-numeric argument to mathematical function>" # further # results_detailed <- lapply(elements, function(x) lapply(x, function(y))log2(x)) # results_detailed
Q: Create a FO that tracks files created or deleted in the working directory (Hint: use
dir()
andsetdiff()
.) What other global effects of functions might you want to track?A: We start with a short version to show the idea:
track_dir <- function(f){ force(f) function(...){ dir_old <- dir() on.exit(if(!setequal(dir(), dir_old)){ message("files in your working directory were deleted or added by this function")}) f(...) } } # the following test will create the file "delete_me" in your working directory td <- track_dir(dir.create) td("delete_me")
Of course we can provide more information on the type of changes:
track_dir <- function(f){ force(f) function(...){ dir_old <- dir() on.exit(if(!setequal(dir(), dir_old)){ message("Files in your working directory were deleted or added by this function.")}, add = TRUE) on.exit(if(length(setdiff(dir_old, dir()) != 0)){ message(paste0("The following files were deleted: ", paste(setdiff(dir_old, dir()), collapse = ", ") ))}, add = TRUE) on.exit(if(length(setdiff(dir(), dir_old) != 0)){ message(paste0("The following files were added: ", paste(setdiff(dir(), dir_old), collapse = ", ") ))}, add = TRUE) f(...) } } # the following test will again create two files in your working directory td <- track_dir(sapply) td(c("delete_me", "me_too"), dir.create)
Other global effects that might be worth tracking include changes regarding:
- the search path and/or introduced
conflicts()
options()
andpar()
which modify global settings- the path of the working directory
- environment variables
- the locale.
- the search path and/or introduced
9.3 Input FOs
Q: Our previous
download()
function only downloads a single file. How can you usepartial()
andlapply()
to create a function that downloads multiple files at once? What are the pros and cons of usingpartial()
vs. writing a function by hand?Q: Read the source code for
plyr::colwise()
. How does the code work? What arecolwise()
’s three main tasks? How could you makecolwise()
simpler by implementing each task as a function operator? (Hint: think aboutpartial()
.)A: We describe how it works by commenting the source code:
function (.fun, .cols = true, ...) { # We check if .cols is not a function, since it is possible to supply a # predicate function. # if so, the .cols arguments will be "quoted", and filter() will # be a function that checks and evaluates these .cols within its other argument if (!is.function(.cols)) { .cols <- as.quoted(.cols) filter <- function(df) eval.quoted(.cols, df) } # otherwise, filter will be be Filter(), which applies the function # in .cols to every element of its other argument else { filter <- function(df) Filter(.cols, df) } # the ... arguments are caught in the list dots dots <- list(...) # a function is created, which will also be the return value. # it checks if its input is a data frame function(df, ...) { stopifnot(is.data.frame(df)) # if df is split (in "plyr" speaking), this will be taken into account... df <- strip_splits(df) # now the columns of the data frame are chosen, depending on the input of .cols # this can chosen directly, via a predicate function, or all columns (default) filtered <- filter(df) # if this means, that no columns are selected, an empty data frame will be returned if (length(filtered) == 0) return(data.frame()) # otherwise lapply will be called on all filtered columns, with # the .fun argument, which has to be provided by the user, and some other # arguments provided by the user, when calling the function (...) and # when defining the function (dots) out <- do.call("lapply", c(list(filtered, .fun, ...), dots)) # the output will be named and converted from list into a data frame again names(out) <- names(filtered) quickdf(out) } } <environment: namespace:plyr>
Q: Write FOs that convert a function to return a matrix instead of a data frame, or a data frame instead of a matrix. If you understand S3, call them
as.data.frame.function()
andas.matrix.function()
.A:
as.matrix.function <- function(f){ force(f) function(...){ as.matrix(f(...)) } } as.data.frame.function <- function(f){ force(f) function(...){ as.data.frame(f(...)) } }
Q: You’ve seen five functions that modify a function to change its output from one form to another. What are they? Draw a table of the various combinations of types of outputs: what should go in the rows and what should go in the columns? What function operators might you want to write to fill in the missing cells? Come up with example use cases.
Q: Look at all the examples of using an anonymous function to partially apply a function in this and the previous chapter. Replace the anonymous function with
partial()
. What do you think of the result? Is it easier or harder to read?A: The results are easy to read. Especially the
Map()
examples profit in readability:library(pryr) #> #> Attaching package: 'pryr' #> The following object is masked _by_ '.GlobalEnv': #> #> f ## From Functionals # 1 trims <- c(0, 0.1, 0.2, 0.5) x <- rcauchy(1000) unlist(lapply(trims, function(trim) mean(x, trim = trim))) #> [1] 0.193095665 0.005830929 0.028174694 -0.016268807 unlist(lapply(trims, partial(mean, x))) #> [1] 0.193095665 0.005830929 0.028174694 -0.016268807 # 2 xs <- replicate(5, runif(10), simplify = FALSE) ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE) unlist(Map(function(x, w) weighted.mean(x, w, na.rm = TRUE), xs, ws)) #> [1] 0.6666132 0.6010613 0.5124325 0.3757010 0.5497957 unlist(Map(partial(weighted.mean, na.rm = TRUE), xs, ws)) #> [1] 0.6666132 0.6010613 0.5124325 0.3757010 0.5497957 # 3 add <- function(x, y, na.rm = FALSE) { if (na.rm && (is.na(x) || is.na(y))) rm_na(x, y, 0) else x + y } r_add <- function(xs, na.rm = TRUE) { Reduce(function(x, y) add(x, y, na.rm = na.rm), xs) } r_add_compact <- function(xs, na.rm = TRUE) { Reduce(partial(add, na.rm = na.rm), xs) } r_add(1:4) #> [1] 10 r_add_compact(1:4) #> [1] 10 # 4 v_add1 <- function(x, y, na.rm = FALSE) { stopifnot(length(x) == length(y), is.numeric(x), is.numeric(y)) if (length(x) == 0) return(numeric()) simplify2array( Map(function(x, y) add(x, y, na.rm = na.rm), x, y) ) } v_add1_compact <- function(x, y, na.rm = FALSE) { stopifnot(length(x) == length(y), is.numeric(x), is.numeric(y)) if (length(x) == 0) return(numeric()) simplify2array( Map(partial(add, na.rm = na.rm), x, y) ) } v_add1(1:3, 2:4) #> [1] 3 5 7 v_add1_compact(1:3, 2:4) #> [1] 3 5 7 # 5 c_add <- function(xs, na.rm = FALSE) { Reduce(function(x, y) add(x, y, na.rm = na.rm), xs, accumulate = TRUE) } c_add_compact <- function(xs, na.rm = FALSE) { Reduce(partial(add, na.rm = na.rm), xs, accumulate = TRUE) } c_add(1:3) #> [1] 1 3 6 c_add_compact(1:3) #> [1] 1 3 6 ## From Function operators # 6 f <- function(x) x ^ 2 partial(f) #> function (...) #> f(...) # 7 # Map(function(x, y) f(x, y, zs), xs, ys) # Map(partial(f, zs = zs), xs, yz) # 8 # f <- function(a) g(a, b = 1) # f <- partial(g, b = 1) # 9 compact <- function(x) Filter(Negate(is.null), x) compact <- partial(Filter, Negate(is.null)) # 10 # Map(function(x, y) f(x, y, zs), xs, ys) # Map(partial(f, zs = zs), xs, ys) # 11 funs2 <- list( sum = function(...) sum(..., na.rm = TRUE), mean = function(...) mean(..., na.rm = TRUE), median = function(...) median(..., na.rm = TRUE) ) funs2 <- list( sum = partial(sum, na.rm = TRUE), mean = partial(mean, na.rm = TRUE), median = partial(median, na.rm = TRUE) )
9.4 Combining FOs
Q: Implement your own version of
compose()
usingReduce
and%o%
. For bonus points, do it without callingfunction
.A: We use the definition from the textbook:
compose <- function(f, g) { function(...) f(g(...)) } "%o%" <- compose
And then we build two versions. One via an anonymous function and one via
partial()
:compose_red <- function(fs) { Reduce(function(f, g) function(...) f(g(...)), fs) } compose_red(c(mean, length, unique))(1:10) #> [1] 10 compose_red_bonus <- function(fs) { Reduce(partial(partial(`%o%`)), fs) } compose_red_bonus(c(mean, length, unique))(1:10) #> [1] 10
Q: Extend
and()
andor()
to deal with any number of input functions. Can you do it withReduce()
? Can you keep them lazy (e.g., forand()
, the function returns once it sees the firstFALSE
)?A: We use
and()
andor()
as defined in the textbook. They are lazy, since they are build up on&&
and||
. Also their reduced versions stay lazy, as we will show at the end of the codeand <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) && f2(...) } } and_red <- function(fs){ Reduce(function(f, g) and(f, g), fs) } or <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) || f2(...) } } or_red <- function(fs){ Reduce(function(f, g) or(f, g), fs) } # Errors before the first TRUE will be returned tryCatch( or_red(c(is.logical, is.logical, stop, is.character))("a"), error = function(e) e ) #> <simpleError in f1(...): a> # Errors after the first TRUE won't be returned or_red(c(is.logical, is.logical, is.character, stop))("a") #> [1] TRUE
Q: Implement the
xor()
binary operator. Implement it using the existingxor()
function. Implement it as a combination ofand()
andor()
. What are the advantages and disadvantages of each approach? Also think about what you’ll call the resulting function to avoid a clash with the existingxor()
function, and how you might change the names ofand()
,not()
, andor()
to keep them consistent.A: Both versions are implemented straight forward, as also the reduced versions. However, the parallel versions need a little bit more care:
xor_fb1 <- function(f1, f2){ force(f1); force(f2) function(...){ xor(f1(...), f2(...)) } } xor_fb2 <- function(f1, f2){ force(f1); force(f2) function(...){ or(f1, f2)(...) && !(and(f1, f2)(...)) } } # binary combination xor_fb1(is.logical, is.character)("a") #> [1] TRUE xor_fb2(is.logical, is.character)("a") #> [1] TRUE # parallel combination (results in an error) xor_fb1(c(is.logical, is.character), c(is.logical, is.character))("a") #> Error in f1(...): could not find function "f1" xor_fb2(c(is.logical, is.character), c(is.logical, is.character))("a") #> Error in f1(...): could not find function "f1" # reduced combination (results in an error) xor_fb1(c(is.logical, is.character, is.logical, is.character))("a") #> Error in force(f2): argument "f2" is missing, with no default xor_fb2(c(is.logical, is.character, is.logical, is.character))("a") #> Error in force(f2): argument "f2" is missing, with no default ### Reduced version xor_fb1_red <- function(fs){ Reduce(function(f, g) xor_fb1(f, g), fs) } xor_fb2_red <- function(fs){ Reduce(function(f, g) xor_fb2(f, g), fs) } # should return TRUE xor_fb1_red(c(is.logical, is.character, is.logical, is.character))("a") #> [1] FALSE xor_fb2_red(c(is.logical, is.character, is.logical, is.character))("a") #> [1] FALSE # should return FALSE xor_fb1_red(c(is.logical, is.logical, is.character, is.logical))("a") #> [1] TRUE xor_fb2_red(c(is.logical, is.logical, is.character, is.logical))("a") #> [1] TRUE # should return FALSE xor_fb1_red(c(is.logical, is.logical, is.character, is.character))("a") #> [1] FALSE xor_fb2_red(c(is.logical, is.logical, is.character, is.character))("a") #> [1] FALSE
Q: Above, we implemented boolean algebra for functions that return a logical function. Implement elementary algebra (
plus()
,minus()
,multiply()
,divide()
,exponentiate()
,log()
) for functions that return numeric vectors.A:
plus <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) + f2(...) } } minus <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) - f2(...) } } multiply <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) * f2(...) } } divide <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) / f2(...) } } exponentiate <- function(f1, f2) { force(f1); force(f2) function(...) { f1(...) ^ f2(...) } } # we rename log to log_ since log() already exists log_ <- function(f1, f2) { force(f1); force(f2) function(...) { log(f1(...), f2(...)) } } # Test mns <- minus(mean, function(x) x^2) mns(1:5)