6 Exceptions and debugging
6.1 Condition handling
Q: Compare the following two implementations of
message2error()
. What is the main advantage ofwithCallingHandlers()
in this scenario? (Hint: look carefully at the traceback.)message2error <- function(code) { withCallingHandlers(code, message = function(e) stop(e)) } message2error <- function(code) { tryCatch(code, message = function(e) stop(e)) }
A:
6.2 Defensive programming
Q: The goal of the
col_means()
function defined below is to compute the means of all numeric columns in a data frame.col_means <- function(df) { numeric <- sapply(df, is.numeric) numeric_cols <- df[, numeric] data.frame(lapply(numeric_cols, mean)) }
However, the function is not robust to unusual inputs. Look at the following results, decide which ones are incorrect, and modify
col_means()
to be more robust. (Hint: there are two function calls incol_means()
that are particularly prone to problems.)col_means(mtcars) col_means(mtcars[, 0]) col_means(mtcars[0, ]) col_means(mtcars[, "mpg", drop = F]) col_means(1:10) col_means(as.matrix(mtcars)) col_means(as.list(mtcars)) mtcars2 <- mtcars mtcars2[-1] <- lapply(mtcars2[-1], as.character) col_means(mtcars2)
A: We divide the tests according their input types and look at the results:
# data.frame input col_means(mtcars) # correct (return: 1 row numeric data.frame) col_means(mtcars[, 0]) # incorrect (return: error. An empty data.frame would be better) col_means(mtcars[0, ])[[1]] # correct (return: 1 row numeric (NaN) data.frame, # which becomes an atomic after subsetting col_means(mtcars[, "mpg", drop = F]) # incorrect (returns complete the numeric column as a # data.frame with new names) # other input col_means(1:10) # correct (specific error) col_means(as.matrix(mtcars)) # correct (specific error) col_means(as.list(mtcars)) # correct (specific error) # data.fame (numeric + character columns) mtcars2 <- mtcars mtcars2[-1] <- lapply(mtcars2[-1], as.character) col_means(mtcars2) # incorrect (returns the complete numeric column as a data.frame # with new names)
We can make the following changes:
col_means2 <- function(df) { # stopifnot(is.data.frame(df)) numeric <- vapply(df, is.numeric, logical(1)) # sapply() to vapply() numeric_cols <- df[, numeric, drop = FALSE] # add drop = FALSE data.frame(lapply(numeric_cols, mean)) }
And look at the tests again:
# data.frame input col_means2(mtcars) # correct col_means2(mtcars[, 0]) # correct col_means2(mtcars[0, ])[[1]] # correct col_means2(mtcars[, "mpg", drop = F]) #correct # other input col_means2(1:10) # correct (specific error) col_means2(as.matrix(mtcars)) # correct (specific error) col_means2(as.list(mtcars)) # correct (specific error) # data.frame (numeric + character columns) col_means2(mtcars2) # correct
For faster failing in the error cases (where the input is not a data.frame), we can add
stopifnot(is.data.frame(df))
in the first line ofcol_means()
.Q: The following function “lags” a vector, returning a version of
x
that isn
values behind the original. Improve the function so that it (1) returns a useful error message ifn
is not a vector, and (2) has reasonable behaviour whenn
is 0 or longer thanx
.lag <- function(x, n = 1L) { xlen <- length(x) c(rep(NA, n), x[seq_len(xlen - n)]) }
A: First we test
lag()
’s actual bahaviour:v <- 1:3 lag(v, 1) # -> NA 1 2 ; correct lag(v, 0) # -> 1 2 3 ; correct lag(v, 3) # -> NA NA NA; correct lag(v, 4) # -> Error in seq_len(xlen - n) : # argument must be coercible to non-negative integer # in my opinion the result should be NA NA NA lag(v, iris) # -> Error in lag(v, iris) : (list) object cannot be coerced to type 'integer'
We can adjust
lag()
, to get better bahaviour in the latter two error cases:lag2 <- function(x, n = 1L) { # we make sure that a double or integer is supplied if (!is.numeric(n)) stop("n is not a numeric vector") xlen <- length(x) # and we change n so that xlen - n will become 0 in case of n > length(x) # to get the desired behaviour for lag(v, 4) n <- min(xlen, n) c(rep(NA, n), x[seq_len(xlen - n)]) }
Now we look again at the tests:
lag2(v, 1) # -> correct lag2(v, 0) # -> correct lag2(v, 3) # -> correct lag2(v, 4) # -> NA NA NA; correct lag2(v, iris) # -> Error in lag2(v, iris) : n is not a numeric vector
Note that we didn’t test/specify
lag()
’s behaviour for negative, decimal orlength(n) != 1
inputs ofn
.