9 Function operators

9.1 Behavioural FOs

Q: Write a FO that logs a time stamp and message to a file every time a function is run.

A: Note that the example will create a file file in your current working directory:

logger <- function(f, filename){
  force(f)
  filename_tmp <- paste(filename, basename(tempfile()), sep = "_")
  write(paste("created at:", Sys.time()), filename_tmp, append = TRUE)
  function(..., message = "you can add a message at each call") {
    write(paste0("used at: ", Sys.time(), ", ", message), filename_tmp, append = TRUE)
    f(...)
  }
}

# the following line creates a file, which name starts with "mean_log_"
mean2 <- logger(mean, "mean_log") 
mean2(1:4, message = "first time") 
mean2(1:4, message = "second_time")

Q: What does the following function do? What would be a good name for it?

f <- function(g) {
  force(g)
  result <- NULL
  function(...) {
    if (is.null(result)) {
      result <<- g(...)
    }
    result
  }
}
runif2 <- f(runif)
runif2(5)
#> [1] 0.1648604 0.7923487 0.0473985 0.5719584 0.5383457
runif2(10)
#> [1] 0.1648604 0.7923487 0.0473985 0.5719584 0.5383457

A: It returns a new version of the inputfunction. That version will always return the result of it’s first run (in case this not NULL), no matter how the input changes. Good names could be first_run() or initial_return().

Q: Modify delay_by() so that instead of delaying by a fixed amount of time, it ensures that a certain amount of time has elapsed since the function was last called. That is, if you called g <- delay_by(1, f); g(); Sys.sleep(2); g() there shouldn’t be an extra delay.

A: We can do this with three little tricks (and the help of 42):

delay_by_v2 <- function(delay, f) {
  force(f)
  # we initialise the timestamp for the last run. We set a specific default value,
  # to ensure that the first run of the returned function will never be delayed
  last_runtime <- Sys.time() - (delay + 42)
  function(...) {
    # we continually check if enough time passed with an (empty) while statement.
    while (Sys.time() < last_runtime + delay) {}
    # we override the start for the next waiting interval.
    # Note that this is done on exit (after the function is evaluated)
    on.exit(last_runtime <<- Sys.time()) 
    return(f(...))
  }
}

Alternatively to the empty while statement we could have used Sys.sleep(). I would not recommend this solution, since ?Sys.sleep indicates that Sys.sleep() might have some overhead and seems not to be as exact as we need.

Q: Write wait_until() which delays execution until a specific time.

wait_until <- function(time, f) {
  force(f)
  function(...) {
    while (Sys.time() < time) {}
    return(f(...))
  }
}

# a little test
ptm <- proc.time()
m <- wait_until(Sys.time() + 10, mean)
m(1:3)
proc.time() - ptm

Q: There are three places we could have added a memoise call: why did we choose the one we did?
```
download <- memoise(dot_every(10, delay_by(1, download_file)))
download <- dot_every(10, memoise(delay_by(1, download_file)))
download <- dot_every(10, delay_by(1, memoise(download_file)))
```
A: The second was chosen. It’s easy to see why, if we eliminate the other two options:
- The first version only prints a dot at every tenth download() call with a new input. This is because dot_every() is inside of memoise() and the counter created by dot_every() is not “activated” if the input is known.
- The third version takes one second for every call. Even if we already know the result and don’t download anything again.
Q: Why is the remember() function inefficient? How could you implement it in more efficient way?
Q: Why does the following code, from stackoverflow, not do what you expect?
```
# return a linear function with slope a and intercept b.
f <- function(a, b) function(x) a * x + b

# create a list of functions with different parameters.
fs <- Map(f, a = c(0, 1), b = c(0, 1))

fs[[1]](3)
#> [1] 0
# should return 0 * 3 + 0 = 0
```
How can you modify f so that it works correctly?

A: You can read in the stackoverflow link that the question arose, because the original return of fs[[1]](3) was 4, which is due to lazy evaluation and could be solved by two users via force():
```
f <- function(a, b) {force(a); force(b); function(x) a * x + b}
```
However you can see in the result within the question that R’s behaviour was changed in this case and as Jan Kislinger points out on twitter:

The real question should be: “How did they modify #rstats so that it works correctly?” otherwise it’s a tricky question :D

Note that the same issue appears in the textbook:

In the following example, we take a list of functions and delay each one. But when we try to evaluate the mean, we get the sum instead.
```
funs <- list(mean = mean, sum = sum)
funs_m <- lapply(funs, delay_by, delay = 0.1)

funs_m$mean(1:10)
#> [1] 5.5
```
Which (as one can see) is not true anymore…actually it changed in R version 3.2:

Higher order functions such as the apply functions and Reduce() now force arguments to the functions they apply in order to eliminate undesirable interactions between lazy evaluation and variable capture in closures. This resolves PR#16093.

For further interested: PR#16093 will lead you to the subject “iterated lapply” within the R-devel Archives. Note that the behaviour in for loops is still as “the old lapply()” behaviour.

9.2 Output FOs

Q: Create a negative() FO that flips the sign of the output of the function to which it is applied.

A:
```
negative <- function(f){
  force(f)
  function(...){
    -f(...)
  }
}
```

Q: The evaluate package makes it easy to capture all the outputs (results, text, messages, warnings, errors, and plots) from an expression. Create a function like capture_it() that also captures the warnings and errors generated by a function.

A: One way is just to capture the output of tryCatch() with identity handlers for errors and warnings:

capture_trials <- function(f){
  force(f)
  function(...){
    capture.output(tryCatch(f(...),
                            error = function(e) e,
                            warning = function(w) w)
    )
  }
}

# we test the behaviour
log_t <- capture_trials(log)
elements <- list(1:10, c(-1, 10), c(TRUE, FALSE), letters)
results <- lapply(elements, function(x) log_t(x))
results
#> [[1]]
#> [1] " [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101"
#> [2] " [8] 2.0794415 2.1972246 2.3025851"                                        
#> 
#> [[2]]
#> [1] "<simpleWarning in f(...): NaNs produced>"
#> 
#> [[3]]
#> [1] "[1]    0 -Inf"
#> 
#> [[4]]
#> [1] "<simpleError in f(...): non-numeric argument to mathematical function>"

# further
# results_detailed <- lapply(elements, function(x) lapply(x, function(y))log2(x))
# results_detailed

Q: Create a FO that tracks files created or deleted in the working directory (Hint: use dir() and setdiff().) What other global effects of functions might you want to track?

A: We start with a short version to show the idea:

track_dir <- function(f){
  force(f)
  function(...){
    dir_old <- dir()
    on.exit(if(!setequal(dir(), dir_old)){
      message("files in your working directory were deleted or added by this function")})
    f(...)
  }
}

# the following test will create the file "delete_me" in your working directory
td <- track_dir(dir.create)
td("delete_me")

Of course we can provide more information on the type of changes:

track_dir <- function(f){
  force(f)
  function(...){
    dir_old <- dir()

    on.exit(if(!setequal(dir(), dir_old)){
      message("Files in your working directory were deleted or added by this
              function.")}, add = TRUE)
    on.exit(if(length(setdiff(dir_old, dir()) != 0)){
      message(paste0("The following files were deleted: ",
                     paste(setdiff(dir_old, dir()), collapse = ", ")
                     ))}, add = TRUE)
    on.exit(if(length(setdiff(dir(), dir_old) != 0)){
      message(paste0("The following files were added: ", 
                     paste(setdiff(dir(), dir_old), collapse = ", ")
                     ))}, add = TRUE)

    f(...)
  }
}

# the following test will again create two files in your working directory
td <- track_dir(sapply)
td(c("delete_me", "me_too"), dir.create)

Other global effects that might be worth tracking include changes regarding:

the search path and/or introduced conflicts()
options() and par() which modify global settings
the path of the working directory
environment variables
the locale.

9.3 Input FOs

Q: Our previous download() function only downloads a single file. How can you use partial() and lapply() to create a function that downloads multiple files at once? What are the pros and cons of using partial() vs. writing a function by hand?

Q: Read the source code for plyr::colwise(). How does the code work? What are colwise()’s three main tasks? How could you make colwise() simpler by implementing each task as a function operator? (Hint: think about partial().)

A: We describe how it works by commenting the source code:

function (.fun, .cols = true, ...) 
  {
  # We check if .cols is not a function, since it is possible to supply a
  # predicate function.
  # if so, the .cols arguments will be "quoted", and filter() will 
  # be a function that checks and evaluates these .cols within its other argument
  if (!is.function(.cols)) {
    .cols <- as.quoted(.cols)
    filter <- function(df) eval.quoted(.cols, df)
  }
  # otherwise, filter will be be Filter(), which applies the function 
  # in .cols to every element of its other argument
  else {
    filter <- function(df) Filter(.cols, df)
  }
  # the ... arguments are caught in the list dots
  dots <- list(...)
  # a function is created, which will also be the return value.
  # it checks if its input is a data frame
  function(df, ...) {
    stopifnot(is.data.frame(df))
    # if df is split (in "plyr" speaking), this will be taken into account...
    df <- strip_splits(df)
    # now the columns of the data frame are chosen, depending on the input of .cols
    # this can chosen directly, via a predicate function, or all columns (default)
    filtered <- filter(df)
    # if this means, that no columns are selected, an empty data frame will be returned
    if (length(filtered) == 0) 
      return(data.frame())
    # otherwise lapply will be called on all filtered columns, with 
    # the .fun argument, which has to be provided by the user, and some other
    # arguments provided by the user, when calling the function (...) and
    # when defining the function (dots)
    out <- do.call("lapply", c(list(filtered, .fun, ...), 
                           dots))
    # the output will be named and converted from list into a data frame again
    names(out) <- names(filtered)
    quickdf(out)
  }
}

<environment: namespace:plyr>

Q: Write FOs that convert a function to return a matrix instead of a data frame, or a data frame instead of a matrix. If you understand S3, call them as.data.frame.function() and as.matrix.function().

as.matrix.function <- function(f){
  force(f)
  function(...){
    as.matrix(f(...))
  }
}

as.data.frame.function <- function(f){
  force(f)
  function(...){
    as.data.frame(f(...))
  }
}

Q: You’ve seen five functions that modify a function to change its output from one form to another. What are they? Draw a table of the various combinations of types of outputs: what should go in the rows and what should go in the columns? What function operators might you want to write to fill in the missing cells? Come up with example use cases.

Q: Look at all the examples of using an anonymous function to partially apply a function in this and the previous chapter. Replace the anonymous function with partial(). What do you think of the result? Is it easier or harder to read?

A: The results are easy to read. Especially the Map() examples profit in readability:

library(pryr)
#> 
#> Attaching package: 'pryr'
#> The following object is masked _by_ '.GlobalEnv':
#> 
#>     f
## From Functionals
# 1
trims <- c(0, 0.1, 0.2, 0.5)
x <- rcauchy(1000)
unlist(lapply(trims, function(trim) mean(x, trim = trim)))
#> [1]  0.193095665  0.005830929  0.028174694 -0.016268807
unlist(lapply(trims, partial(mean, x)))
#> [1]  0.193095665  0.005830929  0.028174694 -0.016268807

# 2
xs <- replicate(5, runif(10), simplify = FALSE)
ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)
unlist(Map(function(x, w) weighted.mean(x, w, na.rm = TRUE), xs, ws))
#> [1] 0.6666132 0.6010613 0.5124325 0.3757010 0.5497957
unlist(Map(partial(weighted.mean, na.rm = TRUE), xs, ws))
#> [1] 0.6666132 0.6010613 0.5124325 0.3757010 0.5497957

# 3
add <- function(x, y, na.rm = FALSE) {
  if (na.rm && (is.na(x) || is.na(y))) rm_na(x, y, 0) else x + y
}

r_add <- function(xs, na.rm = TRUE) {
  Reduce(function(x, y) add(x, y, na.rm = na.rm), xs)
}

r_add_compact <- function(xs, na.rm = TRUE) {
  Reduce(partial(add, na.rm = na.rm), xs)
}

r_add(1:4)
#> [1] 10
r_add_compact(1:4)
#> [1] 10

# 4
v_add1 <- function(x, y, na.rm = FALSE) {
  stopifnot(length(x) == length(y), is.numeric(x), is.numeric(y))
  if (length(x) == 0) return(numeric())
  simplify2array(
    Map(function(x, y) add(x, y, na.rm = na.rm), x, y)
  )
}

v_add1_compact <- function(x, y, na.rm = FALSE) {
  stopifnot(length(x) == length(y), is.numeric(x), is.numeric(y))
  if (length(x) == 0) return(numeric())
  simplify2array(
    Map(partial(add, na.rm = na.rm), x, y)
  )
}

v_add1(1:3, 2:4)
#> [1] 3 5 7
v_add1_compact(1:3, 2:4)
#> [1] 3 5 7

# 5
c_add <- function(xs, na.rm = FALSE) {
  Reduce(function(x, y) add(x, y, na.rm = na.rm), xs,
         accumulate = TRUE)
}

c_add_compact <- function(xs, na.rm = FALSE) {
  Reduce(partial(add, na.rm = na.rm), xs,
         accumulate = TRUE)
}

c_add(1:3)
#> [1] 1 3 6
c_add_compact(1:3)
#> [1] 1 3 6

## From Function operators
# 6
f <- function(x) x ^ 2
partial(f)
#> function (...) 
#> f(...)

# 7
# Map(function(x, y) f(x, y, zs), xs, ys)
# Map(partial(f, zs = zs), xs, yz)

# 8
# f <- function(a) g(a, b = 1)
# f <- partial(g, b = 1)

# 9
compact <- function(x) Filter(Negate(is.null), x)
compact <- partial(Filter, Negate(is.null))

# 10
# Map(function(x, y) f(x, y, zs), xs, ys)
# Map(partial(f, zs = zs), xs, ys)

# 11
funs2 <- list(
  sum = function(...) sum(..., na.rm = TRUE),
  mean = function(...) mean(..., na.rm = TRUE),
  median = function(...) median(..., na.rm = TRUE)
)

funs2 <- list(
  sum = partial(sum, na.rm = TRUE),
  mean = partial(mean, na.rm = TRUE),
  median = partial(median, na.rm = TRUE)
)

9.4 Combining FOs

Q: Implement your own version of compose() using Reduce and %o%. For bonus points, do it without calling function.

A: We use the definition from the textbook:

compose <- function(f, g) {
  function(...) f(g(...))
}

"%o%" <- compose

And then we build two versions. One via an anonymous function and one via partial():

compose_red <- function(fs) {
  Reduce(function(f, g) function(...) f(g(...)), fs)
}
compose_red(c(mean, length, unique))(1:10)
#> [1] 10

compose_red_bonus <- function(fs) {
  Reduce(partial(partial(`%o%`)), fs)
}
compose_red_bonus(c(mean, length, unique))(1:10)
#> [1] 10

Q: Extend and() and or() to deal with any number of input functions. Can you do it with Reduce()? Can you keep them lazy (e.g., for and(), the function returns once it sees the first FALSE)?

A: We use and() and or() as defined in the textbook. They are lazy, since they are build up on && and ||. Also their reduced versions stay lazy, as we will show at the end of the code

and <- function(f1, f2) {
  force(f1); force(f2)
  function(...) {
    f1(...) && f2(...)
  }
}

and_red <- function(fs){
  Reduce(function(f, g) and(f, g), fs)
}

or <- function(f1, f2) {
  force(f1); force(f2)
  function(...) {
    f1(...) || f2(...)
  }
}

or_red <- function(fs){
  Reduce(function(f, g) or(f, g), fs)
}

# Errors before the first TRUE will be returned
tryCatch(
  or_red(c(is.logical, is.logical, stop, is.character))("a"), 
  error = function(e) e
)
#> <simpleError in f1(...): a>

# Errors after the first TRUE won't be returned
or_red(c(is.logical, is.logical, is.character, stop))("a")
#> [1] TRUE

Q: Implement the xor() binary operator. Implement it using the existing xor() function. Implement it as a combination of and() and or(). What are the advantages and disadvantages of each approach? Also think about what you’ll call the resulting function to avoid a clash with the existing xor() function, and how you might change the names of and(), not(), and or() to keep them consistent.

A: Both versions are implemented straight forward, as also the reduced versions. However, the parallel versions need a little bit more care:

xor_fb1 <- function(f1, f2){
  force(f1); force(f2)
  function(...){
    xor(f1(...), f2(...)) 
  }
}

xor_fb2 <- function(f1, f2){
  force(f1); force(f2)
  function(...){
    or(f1, f2)(...) && !(and(f1, f2)(...))
  }
}

# binary combination
xor_fb1(is.logical, is.character)("a")
#> [1] TRUE
xor_fb2(is.logical, is.character)("a")
#> [1] TRUE

# parallel combination (results in an error)
xor_fb1(c(is.logical, is.character), c(is.logical, is.character))("a")
#> Error in f1(...): could not find function "f1"
xor_fb2(c(is.logical, is.character), c(is.logical, is.character))("a")
#> Error in f1(...): could not find function "f1"

# reduced combination (results in an error)
xor_fb1(c(is.logical, is.character, is.logical, is.character))("a")
#> Error in force(f2): argument "f2" is missing, with no default
xor_fb2(c(is.logical, is.character, is.logical, is.character))("a")
#> Error in force(f2): argument "f2" is missing, with no default

### Reduced version
xor_fb1_red <- function(fs){
  Reduce(function(f, g) xor_fb1(f, g), fs)
}

xor_fb2_red <- function(fs){
  Reduce(function(f, g) xor_fb2(f, g), fs)
}

# should return TRUE
xor_fb1_red(c(is.logical, is.character, is.logical, is.character))("a")
#> [1] FALSE
xor_fb2_red(c(is.logical, is.character, is.logical, is.character))("a")
#> [1] FALSE

# should return FALSE
xor_fb1_red(c(is.logical, is.logical, is.character, is.logical))("a")
#> [1] TRUE
xor_fb2_red(c(is.logical, is.logical, is.character, is.logical))("a")
#> [1] TRUE

# should return FALSE
xor_fb1_red(c(is.logical, is.logical, is.character, is.character))("a")
#> [1] FALSE
xor_fb2_red(c(is.logical, is.logical, is.character, is.character))("a")
#> [1] FALSE

Q: Above, we implemented boolean algebra for functions that return a logical function. Implement elementary algebra (plus(), minus(), multiply(), divide(), exponentiate(), log()) for functions that return numeric vectors.

plus <- function(f1, f2) {
  force(f1); force(f2)
  function(...) {
    f1(...) + f2(...)
  }
}

minus <- function(f1, f2) {
  force(f1); force(f2)
  function(...) {
    f1(...) - f2(...)
  }
}

multiply <- function(f1, f2) {
  force(f1); force(f2)
  function(...) {
    f1(...) * f2(...)
  }
}

divide <- function(f1, f2) {
  force(f1); force(f2)
  function(...) {
    f1(...) / f2(...)
  }
}

exponentiate <- function(f1, f2) {
  force(f1); force(f2)
  function(...) {
    f1(...) ^ f2(...)
  }
}

# we rename log to log_ since log() already exists
log_ <- function(f1, f2) {
  force(f1); force(f2)
  function(...) {
    log(f1(...), f2(...))
  }
}

# Test
mns <- minus(mean, function(x) x^2)
mns(1:5)