6 Exceptions and debugging

6.1 Condition handling

Q: Compare the following two implementations of message2error(). What is the main advantage of withCallingHandlers() in this scenario? (Hint: look carefully at the traceback.)

message2error <- function(code) {
  withCallingHandlers(code, message = function(e) stop(e))
}
message2error <- function(code) {
  tryCatch(code, message = function(e) stop(e))
}

6.2 Defensive programming

Q: The goal of the col_means() function defined below is to compute the means of all numeric columns in a data frame.

col_means <- function(df) {
  numeric <- sapply(df, is.numeric)
  numeric_cols <- df[, numeric]

  data.frame(lapply(numeric_cols, mean))
}

However, the function is not robust to unusual inputs. Look at the following results, decide which ones are incorrect, and modify col_means() to be more robust. (Hint: there are two function calls in col_means() that are particularly prone to problems.)

col_means(mtcars)
col_means(mtcars[, 0])
col_means(mtcars[0, ])
col_means(mtcars[, "mpg", drop = F])
col_means(1:10)
col_means(as.matrix(mtcars))
col_means(as.list(mtcars))

mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2)

A: We divide the tests according their input types and look at the results:

# data.frame input
col_means(mtcars) # correct (return: 1 row numeric data.frame)
col_means(mtcars[, 0]) # incorrect (return: error. An empty data.frame would be better)
col_means(mtcars[0, ])[[1]] # correct (return: 1 row numeric (NaN) data.frame, 
  # which becomes an atomic after subsetting
col_means(mtcars[, "mpg", drop = F]) # incorrect (returns complete the numeric column as a 
  # data.frame with new names)

# other input
col_means(1:10) # correct (specific error)
col_means(as.matrix(mtcars)) # correct (specific error)
col_means(as.list(mtcars)) # correct (specific error)

# data.fame (numeric + character columns)
mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2) # incorrect (returns the complete numeric column as a data.frame
  # with new names)

We can make the following changes:

col_means2 <- function(df) {
#  stopifnot(is.data.frame(df))
numeric <- vapply(df, is.numeric, logical(1)) # sapply() to vapply()
numeric_cols <- df[, numeric, drop = FALSE] # add drop = FALSE
data.frame(lapply(numeric_cols, mean))
}

And look at the tests again:

# data.frame input
col_means2(mtcars) # correct
col_means2(mtcars[, 0]) # correct
col_means2(mtcars[0, ])[[1]] # correct
col_means2(mtcars[, "mpg", drop = F]) #correct

# other input
col_means2(1:10) # correct (specific error)
col_means2(as.matrix(mtcars)) # correct (specific error)
col_means2(as.list(mtcars)) # correct (specific error)

# data.frame (numeric + character columns)
col_means2(mtcars2) # correct

For faster failing in the error cases (where the input is not a data.frame), we can add stopifnot(is.data.frame(df)) in the first line of col_means().

Q: The following function “lags” a vector, returning a version of x that is n values behind the original. Improve the function so that it (1) returns a useful error message if n is not a vector, and (2) has reasonable behaviour when n is 0 or longer than x.

lag <- function(x, n = 1L) {
  xlen <- length(x)
  c(rep(NA, n), x[seq_len(xlen - n)])
}

A: First we test lag()’s actual bahaviour:

v <- 1:3
lag(v, 1) # -> NA 1 2  ; correct
lag(v, 0) # -> 1 2 3   ; correct
lag(v, 3) # -> NA NA NA; correct
lag(v, 4) # -> Error in seq_len(xlen - n) : 
  # argument must be coercible to non-negative integer
  # in my opinion the result should be NA NA NA
lag(v, iris) # -> Error in lag(v, iris) : (list) object cannot be coerced to type 'integer'

We can adjust lag(), to get better bahaviour in the latter two error cases:

lag2 <- function(x, n = 1L) {
  # we make sure that a double or integer is supplied
  if (!is.numeric(n)) stop("n is not a numeric vector") 
  xlen <- length(x)
  # and we change n so that xlen - n will become 0 in case of n > length(x)
  # to get the desired behaviour for lag(v, 4)
  n <- min(xlen, n)
  c(rep(NA, n), x[seq_len(xlen - n)])
}

Now we look again at the tests:

lag2(v, 1) # -> correct
lag2(v, 0) # -> correct
lag2(v, 3) # -> correct
lag2(v, 4) # -> NA NA NA; correct
lag2(v, iris) # -> Error in lag2(v, iris) : n is not a numeric vector

Note that we didn’t test/specify lag()’s behaviour for negative, decimal or length(n) != 1 inputs of n.