17 Deprecated
17.1 Conditions
Q: What does
options(error = recover)
do? Why might you use it?
A: With options(error = recover) set, utils::recover() is called (without arguments) whenever an error occurs. It prints the list of calls that preceded the error and lets the user pick one of the corresponding environments to enter with browser(), which makes for a practical interactive debugging mode.
Q: What does
options(error = quote(dump.frames(to.file = TRUE)))
do? Why might you use it?
A: This option writes a dump of the evaluation environments at the point of the error into a file ending in .rda (last.dump.rda by default). When this option is set, R continues to run after the first error. To stop R at the first error, use quote({dump.frames(to.file = TRUE); q()}) instead. These options are especially useful for debugging non-interactive R scripts afterwards (“post mortem debugging”).
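As a rough sketch of the post-mortem workflow, one might put the option at the top of a batch script and inspect the dump later. The script name analysis.R is hypothetical; the calls themselves follow the example in ?debugger.

```r
# --- at the top of a non-interactive script, e.g. a hypothetical analysis.R ---
# On error: save the call stack to last.dump.rda, then quit with a
# non-zero exit status so that the calling shell notices the failure.
options(error = quote({
  dump.frames(to.file = TRUE)
  q(status = 1)
}))

# --- later, in an interactive session started in the same working directory ---
# load("last.dump.rda")   # restores the object `last.dump`
# debugger(last.dump)     # browse the frames that were active at the error
```

The interactive part is commented out because it only makes sense once a dump file exists.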
17.2 Expressions (new)
Q:
base::alist()
is useful for creating pairlists to be used for function arguments:
foo <- function() {}
formals(foo) <- alist(x = , y = 1)
foo
#> function (x, y = 1)
#> {
#> }
What makes
alist()
special compared to list()?
A: From ?alist:
alist handles its arguments as if they described function arguments. So the values are not evaluated, and tagged arguments with no value are allowed whereas list simply ignores them. alist is most often used in conjunction with formals.
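The difference in evaluation can be checked directly. The following is a small sketch; the names lazy, eager and foo are only illustrative.

```r
# alist() captures its arguments unevaluated; list() evaluates them.
lazy  <- alist(x = , y = 1 + 2)   # `x` is a tagged argument without a value
eager <- list(y = 1 + 2)

names(lazy)       # both tags are kept, including the empty `x`
is.call(lazy$y)   # `y` is still the unevaluated call 1 + 2
eager$y           # here the sum has already been computed

# Typical use: building the formals of a function.
foo <- function() x + y
formals(foo) <- alist(x = , y = 1 + 2)
foo(10)           # `y` falls back to its default, evaluated at call time
```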
17.3 Functionals
17.3.1 My first functional: lapply()
Q: Why are the following two invocations of
lapply()
equivalent?
trims <- c(0, 0.1, 0.2, 0.5)
x <- rcauchy(100)
lapply(trims, function(trim) mean(x, trim = trim))
lapply(trims, mean, x = x)
A: In the first statement each element of trims is explicitly supplied to mean()’s second argument, trim. In the second statement the same happens via positional matching: x is supplied to mean() by name through lapply()’s third argument (...), so the element of trims, passed as the first unnamed argument, is matched to the next free parameter, trim.
Q: The function below scales a vector so it falls in the range [0, 1]. How would you apply it to every column of a data frame? How would you apply it to every numeric column in a data frame?
scale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
A: Since this function needs numeric input, one can check for it with an if clause. If one also wants to keep the non-numeric columns, these can be returned unchanged in the else branch of if():
data.frame(lapply(iris, function(x) if (is.numeric(x)) scale01(x) else x))
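An alternative that modifies the data frame in place, and also covers the "every column" part of the question, might look as follows. This is a sketch; cars01, iris01 and is_num are illustrative names.

```r
scale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Every column (only sensible when all columns are numeric, as in `cars`)
cars01 <- cars
cars01[] <- lapply(cars01, scale01)

# Only the numeric columns; other columns (here the factor Species) stay untouched
iris01 <- iris
is_num <- vapply(iris01, is.numeric, logical(1))
iris01[is_num] <- lapply(iris01[is_num], scale01)

vapply(iris01[is_num], range, numeric(2))  # each numeric column now spans [0, 1]
```

Assigning into cars01[] keeps the data frame structure, so no extra call to data.frame() is needed.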
Q: Use both for loops and lapply() to fit linear models to mtcars using the formulas stored in this list:
formulas <- list(
  mpg ~ disp,
  mpg ~ I(1 / disp),
  mpg ~ disp + wt,
  mpg ~ I(1 / disp) + wt
)
A: As in the first exercise, we can create two lapply() versions:
# lapply (2 versions)
la1 <- lapply(formulas, lm, data = mtcars)
la2 <- lapply(formulas, function(x) lm(formula = x, data = mtcars))

# for loop
lf1 <- vector("list", length(formulas))
for (i in seq_along(formulas)) {
  lf1[[i]] <- lm(formulas[[i]], data = mtcars)
}
Note that all versions fit the same models, but the results won’t be identical, since the “call” element differs between the versions.
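The remark about the “call” element can be verified with a short check. This is a sketch; same_coefs and same_calls are illustrative names.

```r
formulas <- list(
  mpg ~ disp,
  mpg ~ I(1 / disp),
  mpg ~ disp + wt,
  mpg ~ I(1 / disp) + wt
)
la1 <- lapply(formulas, lm, data = mtcars)
la2 <- lapply(formulas, function(x) lm(formula = x, data = mtcars))

# The fitted coefficients agree pairwise ...
same_coefs <- all(mapply(
  function(a, b) isTRUE(all.equal(coef(a), coef(b))),
  la1, la2
))

# ... but the recorded calls do not, which is why identical() fails on the models
same_calls <- identical(la1[[1]]$call, la2[[1]]$call)

la1[[1]]$call
la2[[1]]$call
```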
Q: Fit the model mpg ~ disp to each of the bootstrap replicates of mtcars in the list below by using a for loop and lapply(). Can you do it without an anonymous function?
bootstraps <- lapply(1:10, function(i) {
  rows <- sample(1:nrow(mtcars), rep = TRUE)
  mtcars[rows, ]
})
A:
# lapply without anonymous function
la <- lapply(bootstraps, lm, formula = mpg ~ disp)

# for loop
lf <- vector("list", length(bootstraps))
for (i in seq_along(bootstraps)) {
  lf[[i]] <- lm(mpg ~ disp, data = bootstraps[[i]])
}
Q: For each model in the previous two exercises, extract \(R^2\) using the function below.
rsq <- function(mod) summary(mod)$r.squared
A: For the models in exercise 3:
sapply(la1, rsq)
#> [1] 0.7183433 0.8596865 0.7809306 0.8838038
sapply(la2, rsq)
#> [1] 0.7183433 0.8596865 0.7809306 0.8838038
sapply(lf1, rsq)
#> [1] 0.7183433 0.8596865 0.7809306 0.8838038
And the models in exercise 4:
sapply(la, rsq)
#> [1] 0.7613622 0.7300040 0.7096029 0.7971209 0.7709383 0.6967571 0.8371663
#> [8] 0.7189694 0.7286141 0.6194394
sapply(lf, rsq)
#> [1] 0.7613622 0.7300040 0.7096029 0.7971209 0.7709383 0.6967571 0.8371663
#> [8] 0.7189694 0.7286141 0.6194394
17.3.2 For loop functionals: friends of lapply()
Q: Use vapply() to:
- Compute the standard deviation of every column in a numeric data frame.
- Compute the standard deviation of every numeric column in a mixed data frame. (Hint: you’ll need to use vapply() twice.)
A: As a numeric data.frame we choose cars:
vapply(cars, sd, numeric(1))
And as a mixed data.frame we choose iris:
vapply(iris[vapply(iris, is.numeric, logical(1))], sd, numeric(1))
Q: Why is using sapply() to get the class() of each element in a data frame dangerous?
A: Columns of a data frame might have more than one class, so the type of sapply()’s output may silently differ from case to case. If …
- all columns have one class: sapply() returns a character vector
- one column has more classes than the others: sapply() returns a list
- all columns have the same number of classes, which is more than one: sapply() returns a matrix
For example:
a <- letters[1:3]
class(a) <- c("class1", "class2")
df <- data.frame(a = character(3))
df$a <- a
df$b <- a
class(sapply(df, class))
#> [1] "matrix"
(Since R 4.0.0 the class of a matrix is reported as "matrix" "array".)
Note that this case often appears while working with the POSIXt types, POSIXct and POSIXlt.
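The POSIXct case can be reproduced with a two-column data frame. The sketch below also shows how vapply() turns the silent change of type into an error, and one possible type-stable workaround; all object names are illustrative.

```r
df <- data.frame(
  id   = 1:2,
  when = as.POSIXct(c("2020-01-01", "2020-01-02"), tz = "UTC")
)

# POSIXct columns carry two classes, so sapply() cannot simplify to a vector
classes_sapply <- sapply(df, class)
is.list(classes_sapply)

# vapply() refuses the unexpected shape instead of changing type silently
classes_vapply <- tryCatch(
  vapply(df, class, character(1)),
  error = function(e) "vapply() signalled an error"
)

# One type-stable option: collapse all classes of a column into one string
classes_stable <- vapply(
  df,
  function(col) paste(class(col), collapse = "/"),
  character(1)
)
classes_stable
```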
Q: The following code simulates the performance of a t-test for non-normal data. Use sapply() and an anonymous function to extract the p-value from every trial.
trials <- replicate(
  100,
  t.test(rpois(10, 10), rpois(7, 10)),
  simplify = FALSE
)
Extra challenge: get rid of the anonymous function by using [[ directly.
A:
# anonymous function:
sapply(trials, function(x) x[["p.value"]])
# without anonymous function:
sapply(trials, "[[", "p.value")
Q: What does replicate() do? What sort of for loop does it eliminate? Why do its arguments differ from lapply() and friends?
A: As stated in ?replicate:
replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).
We can see this clearly in the source code:
#> function (n, expr, simplify = "array")
#> sapply(integer(n), eval.parent(substitute(function(...) expr)),
#>     simplify = simplify)
#> <bytecode: 0x55fd08512b78>
#> <environment: namespace:base>
Like sapply(), replicate() eliminates a for loop, namely one that evaluates the same expression n times and collects the results. Its arguments differ from those of lapply() and friends because it takes an expression to re-evaluate rather than a function to apply to each element of an input. As explained for Map() in the textbook, every replicate() call could also have been written via lapply(). But using replicate() is more concise and more clearly indicates what you’re trying to do.
Q: Implement a version of
lapply()
that supplies FUN with both the name and the value of each component.
A:
lapply_nms <- function(X, FUN, ...) {
  Map(FUN, X, names(X), ...)
}
lapply_nms(iris, function(x, y) c(class(x), y))
#> $Sepal.Length
#> [1] "numeric"      "Sepal.Length"
#>
#> $Sepal.Width
#> [1] "numeric"     "Sepal.Width"
#>
#> $Petal.Length
#> [1] "numeric"      "Petal.Length"
#>
#> $Petal.Width
#> [1] "numeric"     "Petal.Width"
#>
#> $Species
#> [1] "factor"  "Species"
Q: Implement a combination of Map() and vapply() to create an lapply() variant that iterates in parallel over all of its inputs and stores its outputs in a vector (or a matrix). What arguments should the function take?
A: As we understand this exercise, it is about working with a list of lists, like in the following example:
testlist <- list(iris, mtcars, cars)
lapply(testlist, function(x) vapply(x, mean, numeric(1)))
#> Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
#> returning NA
#> [[1]]
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#>     5.843333     3.057333     3.758000     1.199333           NA
#>
#> [[2]]
#>        mpg        cyl       disp         hp       drat         wt
#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
#>       qsec         vs         am       gear       carb
#>  17.848750   0.437500   0.406250   3.687500   2.812500
#>
#> [[3]]
#> speed  dist
#> 15.40 42.98
So we can get the same result with a more specialized function:
lmapply <- function(X, FUN, FUN.VALUE, simplify = FALSE) {
  out <- Map(function(x) vapply(x, FUN, FUN.VALUE), X)
  if (simplify == TRUE) {return(simplify2array(out))}
  out
}
lmapply(testlist, mean, numeric(1))
#> Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
#> returning NA
#> [[1]]
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#>     5.843333     3.057333     3.758000     1.199333           NA
#>
#> [[2]]
#>        mpg        cyl       disp         hp       drat         wt
#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250
#>       qsec         vs         am       gear       carb
#>  17.848750   0.437500   0.406250   3.687500   2.812500
#>
#> [[3]]
#> speed  dist
#> 15.40 42.98
Q: Implement mcsapply(), a multi-core version of sapply(). Can you implement mcvapply(), a parallel version of vapply()? Why or why not?
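No answer is given for this exercise. One possible sketch, under the assumption that parallel::mclapply() does the actual work, is to simplify its result the way sapply() does. For mcvapply(), the type check of vapply() cannot run inside the workers in this approach, so the sketch only validates the collected results afterwards. Both function bodies are illustrative, and on Windows mclapply() only supports mc.cores = 1.

```r
library(parallel)

# Sketch: mclapply() computes the list, simplify2array() mimics sapply()
mcsapply <- function(X, FUN, ..., mc.cores = 1L,
                     simplify = TRUE, USE.NAMES = TRUE) {
  FUN <- match.fun(FUN)
  answer <- mclapply(X, FUN, ..., mc.cores = mc.cores)
  if (USE.NAMES && is.character(X) && is.null(names(answer))) {
    names(answer) <- X
  }
  if (!isFALSE(simplify) && length(answer) > 0) {
    simplify2array(answer, higher = (simplify == "array"))
  } else {
    answer
  }
}

# Sketch: compute in parallel, then let vapply() check every result
# against FUN.VALUE after all computations have finished
mcvapply <- function(X, FUN, FUN.VALUE, ..., mc.cores = 1L, USE.NAMES = TRUE) {
  FUN <- match.fun(FUN)
  answer <- mclapply(X, FUN, ..., mc.cores = mc.cores)
  vapply(answer, identity, FUN.VALUE, USE.NAMES = USE.NAMES)
}

mcsapply(1:5, function(i) i^2)
mcvapply(1:5, function(i) i^2, numeric(1))
```

Unlike vapply(), this mcvapply() does not stop at the first offending element; a mismatch is only reported once every worker is done.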
17.3.3 Manipulating matrices and data frames
Q: How does apply() arrange the output? Read the documentation and perform some experiments.
A: apply() arranges its output columns (or list elements) according to the order of the margin. The rows are ordered by the other dimensions, starting with the “last” dimension of the input object. What this means should become clear by looking at the three- and four-dimensional cases of the following example:
# for two dimensional cases everything is sorted by the other dimension
arr2 <- array(1:9, dim = c(3, 3),
              dimnames = list(paste0("row", 1:3),
                              paste0("col", 1:3)))
arr2
apply(arr2, 1, head, 1) # Margin is row
apply(arr2, 1, head, 9) # sorts by col
apply(arr2, 2, head, 1) # Margin is col
apply(arr2, 2, head, 9) # sorts by row

# 3 dimensional
arr3 <- array(1:27, dim = c(3,3,3),
              dimnames = list(paste0("row", 1:3),
                              paste0("col", 1:3),
                              paste0("time", 1:3)))
arr3
apply(arr3, 1, head, 1) # Margin is row
apply(arr3, 1, head, 27) # sorts by time and col
apply(arr3, 2, head, 1) # Margin is col
apply(arr3, 2, head, 27) # sorts by time and row
apply(arr3, 3, head, 1) # Margin is time
apply(arr3, 3, head, 27) # sorts by col and row

# 4 dimensional
arr4 <- array(1:81, dim = c(3,3,3,3),
              dimnames = list(paste0("row", 1:3),
                              paste0("col", 1:3),
                              paste0("time", 1:3),
                              paste0("var", 1:3)))
arr4
apply(arr4, 1, head, 1) # Margin is row
apply(arr4, 1, head, 81) # sorts by var, time, col
apply(arr4, 2, head, 1) # Margin is col
apply(arr4, 2, head, 81) # sorts by var, time, row
apply(arr4, 3, head, 1) # Margin is time
apply(arr4, 3, head, 81) # sorts by var, col, row
apply(arr4, 4, head, 1) # Margin is var
apply(arr4, 4, head, 81) # sorts by time, col, row
Q: There’s no equivalent to split() + vapply(). Should there be? When would it be useful? Implement one yourself.
A: We can modify the tapply2() approach from the book, where split() and sapply() were combined:
v_tapply <- function(x, group, f, FUN.VALUE, ..., USE.NAMES = TRUE) {
  pieces <- split(x, group)
  vapply(pieces, f, FUN.VALUE, ..., USE.NAMES = USE.NAMES)
}
tapply() has a simplify argument. When you set it to FALSE, tapply() will always return a list. It is easy to create cases where the length and the types/classes of the list elements vary depending on the input. The vapply() version is useful if you want an error whenever the output does not have the expected structure, or if you want type-stable output to build other functions on top of it.
Q: Implement a pure R version of
split(). (Hint: use unique() and subsetting.) Can you do it without a for loop?
A:
split2 <- function(x, f, drop = FALSE, ...) {
  # There are three relevant cases for f: f is a character, f is a factor and
  # all levels occur, f is a factor and some levels don't occur.
  # Note: the subsetting below assumes that x is two-dimensional
  # (a data frame or a matrix).

  # first we check if f is a factor
  fact <- is.factor(f)

  # if drop is set to TRUE, we drop the non-occurring levels.
  # (If f is a character, this has no effect.)
  if (drop) {f <- f[, drop = TRUE]}

  # now we want all unique elements/levels of f
  levs <- if (fact) {unique(levels(f))} else {as.character(unique(f))}

  # we use these levels to subset x and supply names for the resulting output.
  setNames(lapply(levs, function(lv) x[f == lv, , drop = FALSE]), levs)
}
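A quick comparison with base split() on a data frame could look like this. The block uses a condensed restatement of split2() in which droplevels() is swapped in for f[, drop = TRUE]; that substitution and the small test data frame are assumptions made for the sketch.

```r
# Condensed restatement of split2(); x is assumed to be a data frame or matrix
split2 <- function(x, f, drop = FALSE) {
  if (drop && is.factor(f)) f <- droplevels(f)
  levs <- if (is.factor(f)) levels(f) else as.character(unique(f))
  setNames(lapply(levs, function(lv) x[f == lv, , drop = FALSE]), levs)
}

# Same grouping as base split() for a data frame
by_base <- split(iris, iris$Species)
by_ours <- split2(iris, iris$Species)
names(by_ours)

# Unused factor levels give empty pieces unless drop = TRUE
df <- data.frame(value = 1:3)
g  <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
names(split2(df, g))
names(split2(df, g, drop = TRUE))
```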
Q: What other types of input and output are missing? Brainstorm before you look up some answers in the plyr paper.
A: From the suggested plyr paper, we can extract a lot of possible combinations and list them in a table. Sean C. Anderson has already done this, based on a presentation by Hadley Wickham, and provided the following result here.
object type          array      data frame   list       nothing
array                apply      .            .          .
data frame           .          aggregate    by         .
list                 sapply     .            lapply     .
n replicates         replicate  .            replicate  .
function arguments   mapply     .            mapply     .
Note the column nothing, which is specifically for use cases where side effects like plotting or writing data are intended.
17.3.4 Manipulating lists
Q: Why isn’t is.na() a predicate function? What base R function is closest to being a predicate version of is.na()?
A: Because a predicate function always returns a single TRUE or FALSE. is.na() is vectorised: it returns a logical vector of the same length as its input, and is.na(NULL) returns logical(0), which excludes it from being a predicate function. The closest function in base R that we are aware of is anyNA(), if one applies it elementwise.
Q: Use Filter() and vapply() to create a function that applies a summary statistic to every numeric column in a data frame.
A:
vapply_num <- function(X, FUN, FUN.VALUE) {
  vapply(Filter(is.numeric, X), FUN, FUN.VALUE)
}
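A short usage sketch on iris, where the factor column Species is filtered out before the summary is computed:

```r
vapply_num <- function(X, FUN, FUN.VALUE) {
  vapply(Filter(is.numeric, X), FUN, FUN.VALUE)
}

# One value per numeric column
col_means <- vapply_num(iris, mean, numeric(1))
col_means

# FUN.VALUE fixes the shape of each result, here two values per column
col_ranges <- vapply_num(iris, range, numeric(2))
col_ranges
```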
Q: What’s the relationship between which() and Position()? What’s the relationship between where() and Filter()?
A: which() returns all indices of true entries from a logical vector. Position() returns just the first (default) or the last integer index of all true entries that occur by applying a predicate function on a vector. So, as long as at least one element matches, the default relation is Position(f, x) <=> min(which(where(f, x))), with where() defined in the book as:
where <- function(f, x) {
  vapply(x, f, logical(1))
}
where() is useful to return a logical vector from a condition asked on elements of a list or a data frame. Filter(f, x) returns all elements of a list or a data frame for which the supplied predicate function returns TRUE. So the relation is Filter(f, x) <=> x[where(f, x)].
Q: Implement Any(), a function that takes a list and a predicate function, and returns TRUE if the predicate function returns TRUE for any of the inputs. Implement All() similarly.
A: Any():
Any <- function(l, pred) {
  stopifnot(is.list(l))
  for (i in seq_along(l)) {
    if (pred(l[[i]])) return(TRUE)
  }
  return(FALSE)
}
All():
All <- function(l, pred) {
  stopifnot(is.list(l))
  for (i in seq_along(l)) {
    if (!pred(l[[i]])) return(FALSE)
  }
  return(TRUE)
}
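For comparison, a loop-free variant can be sketched with vapply(). The names Any2() and All2() are hypothetical. In contrast to the loops above, these versions always evaluate the predicate on every element instead of stopping at the first decisive one.

```r
Any2 <- function(l, pred) {
  stopifnot(is.list(l))
  any(vapply(l, pred, logical(1)))
}

All2 <- function(l, pred) {
  stopifnot(is.list(l))
  all(vapply(l, pred, logical(1)))
}

mixed <- list(1, "a", TRUE)
Any2(mixed, is.character)
All2(mixed, is.character)

# Edge case: for an empty list any() is FALSE and all() is TRUE
Any2(list(), is.character)
All2(list(), is.character)
```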
Q: Implement the span() function from Haskell: given a list x and a predicate function f, span returns the location of the longest sequential run of elements where the predicate is true. (Hint: you might find rle() helpful.)
A: Our span_r() function returns the first index of the longest sequential run of elements where the predicate is true. If there is more than one longest run, more than one first index is returned.
span_r <- function(l, pred) {
  # We test if l is a list
  stopifnot(is.list(l))

  # we preallocate a logical vector and save the result
  # of the predicate function applied to each element of the list
  test <- vector("logical", length(l))
  for (i in seq_along(l)) {
    test[i] <- (pred(l[[i]]))
  }
  # we return NA, if the output of pred is always FALSE
  if (!any(test)) return(NA_integer_)

  # Otherwise we look at the length encoding of TRUE and FALSE values.
  rle_test <- rle(test)
  # Since it might happen, that more than one maximum series of TRUE's appears,
  # we have to implement some logic, which might be easier, if we save the rle
  # output in a data.frame
  rle_test <- data.frame(lengths = rle_test[["lengths"]],
                         values = rle_test[["values"]],
                         cumsum = cumsum(rle_test[["lengths"]]))
  rle_test[["first_index"]] <- rle_test[["cumsum"]] - rle_test[["lengths"]] + 1
  # In the last line we calculated the first index in the original list for every encoding
  # In the next line we calculate a column, which gives the maximum
  # encoding length among all encodings with the value TRUE
  rle_test[["max"]] <- max(rle_test[rle_test[, "values"] == TRUE, ][,"lengths"])
  # Now we just have to subset for maximum length among all TRUE values and return the
  # according "first index":
  rle_test[rle_test$lengths == rle_test$max & rle_test$values == TRUE, ]$first_index
}
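The same logic can be condensed by working on the rle() vectors directly instead of a data frame. The following sketch uses the hypothetical name span_compact() and is meant as an illustration of the run-length idea, not as a replacement for the solution above.

```r
span_compact <- function(l, pred) {
  stopifnot(is.list(l))
  test <- vapply(l, pred, logical(1))
  if (!any(test)) return(NA_integer_)

  runs   <- rle(test)
  # first index of every run in the original list
  starts <- cumsum(runs$lengths) - runs$lengths + 1L
  # length of the longest run of TRUE values
  longest <- max(runs$lengths[runs$values])
  starts[runs$values & runs$lengths == longest]
}

# Runs of numerics: positions 1-2 and 4-6, so the longest run starts at 4
span_compact(list(1, 2, "a", 3, 4, 5, "b"), is.numeric)

# Two runs of equal length: both start positions are returned
span_compact(list(1, "a", 2), is.numeric)
```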
17.3.5 List of functions
Q: Implement a summary function that works like
base::summary()
, but uses a list of functions. Modify the function so it returns a closure, making it possible to use it as a function factory.
Q: Which of the following commands is equivalent to with(x, f(z))?
- x$f(x$z).
- f(x$z).
- x$f(z).
- f(z).
- It depends.
17.3.6 Mathematical functionals
Q: Implement arg_max(). It should take a function and a vector of inputs, and return the elements of the input where the function returns the highest value. For example, arg_max(-10:5, function(x) x ^ 2) should return -10. arg_max(-5:5, function(x) x ^ 2) should return c(-5, 5). Also implement the matching arg_min() function.
A: arg_max():
arg_max <- function(x, f) {
  x[f(x) == max(f(x))]
}
arg_min():
arg_min <- function(x, f) {
  x[f(x) == min(f(x))]
}
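The examples from the question can be run against a slightly adjusted sketch that evaluates f only once per call. As in the solution above, f is assumed to be vectorised over x.

```r
arg_max <- function(x, f) {
  fx <- f(x)            # evaluate f once and reuse the result
  x[fx == max(fx)]
}

arg_min <- function(x, f) {
  fx <- f(x)
  x[fx == min(fx)]
}

square <- function(x) x ^ 2
arg_max(-10:5, square)   # a single maximiser
arg_max(-5:5, square)    # ties: both ends of the range
arg_min(-5:5, square)    # the minimum is at zero
```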
Q: Challenge: read about the fixed point algorithm. Complete the exercises using R.
17.3.7 A family of functions
Q: Implement smaller and larger functions that, given two inputs, return either the smaller or the larger value. Implement na.rm = TRUE: what should the identity be? (Hint: smaller(x, smaller(NA, NA, na.rm = TRUE), na.rm = TRUE) must be x, so smaller(NA, NA, na.rm = TRUE) must be bigger than any other value of x.) Use smaller and larger to implement equivalents of min(), max(), pmin(), pmax(), and new functions row_min() and row_max().
A: We can do almost everything as shown in the case study in the textbook. First we define the functions smaller_() and larger_(). We use the underscore suffix so that we can build the non-suffixed versions, which include the na.rm parameter, on top of them. In contrast to the add() example from the book, we change two things at this step. We don’t include error checking, since this is done later at the top level, and we return NA_integer_ if any of the arguments is NA (this is important if na.rm is set to FALSE; it wasn’t needed in the add() example, since + already returns NA in this case).
smaller_ <- function(x, y) {
  if (anyNA(c(x, y))) {return(NA_integer_)}
  out <- x
  if (y < x) {out <- y}
  out
}

larger_ <- function(x, y) {
  if (anyNA(c(x, y))) {return(NA_integer_)}
  out <- x
  if (y > x) {out <- y}
  out
}
We can take rm_na() from the book:
rm_na <- function(x, y, identity) {
  if (is.na(x) && is.na(y)) {
    identity
  } else if (is.na(x)) {
    y
  } else {
    x
  }
}
To find the identity value, we can apply the same argument as in the textbook: our functions are also associative, so the following equation should hold:
3 = smaller(smaller(3, NA), NA) = smaller(3, smaller(NA, NA)) = 3
So the identity has to be greater than 3. When we generalize from 3 to any real number, this means that the identity has to be greater than any number, which leads us to infinity. Hence the identity has to be Inf for smaller() (and -Inf for larger()), which we implement next:
smaller <- function(x, y, na.rm = FALSE) {
  stopifnot(length(x) == 1, length(y) == 1,
            is.numeric(x) | is.logical(x),
            is.numeric(y) | is.logical(y))
  if (na.rm && (is.na(x) || is.na(y))) rm_na(x, y, Inf) else smaller_(x, y)
}

larger <- function(x, y, na.rm = FALSE) {
  stopifnot(length(x) == 1, length(y) == 1,
            is.numeric(x) | is.logical(x),
            is.numeric(y) | is.logical(y))
  if (na.rm && (is.na(x) || is.na(y))) rm_na(x, y, -Inf) else larger_(x, y)
}
Just as min() and max() can act on vectors, we can easily implement this for our new functions. As shown in the book, we also have to set the init parameter to the identity value.
r_smaller <- function(xs, na.rm = TRUE) {
  Reduce(function(x, y) smaller(x, y, na.rm = na.rm), xs, init = Inf)
}
# some tests
r_smaller(c(1:3, 4:(-1)))
#> [1] -1
r_smaller(NA, na.rm = TRUE)
#> [1] Inf
r_smaller(numeric())
#> [1] Inf

r_larger <- function(xs, na.rm = TRUE) {
  Reduce(function(x, y) larger(x, y, na.rm = na.rm), xs, init = -Inf)
}
# some tests
r_larger(c(1:3, 4:1))
#> [1] 4
r_larger(NA, na.rm = TRUE)
#> [1] -Inf
r_larger(numeric())
#> [1] -Inf
We can also create vectorised versions as shown in the book. To keep this short, we only show the smaller() case.
v_smaller1 <- function(x, y, na.rm = FALSE) {
  stopifnot(length(x) == length(y),
            is.numeric(x) | is.logical(x),
            is.numeric(y) | is.logical(y))
  if (length(x) == 0) return(numeric())
  simplify2array(
    Map(function(x, y) smaller(x, y, na.rm = na.rm), x, y)
  )
}

v_smaller2 <- function(x, y, na.rm = FALSE) {
  stopifnot(length(x) == length(y),
            is.numeric(x) | is.logical(x),
            is.numeric(y) | is.logical(y))
  vapply(seq_along(x), function(i) smaller(x[i], y[i], na.rm = na.rm),
         numeric(1))
}

# Both versions give the same results
v_smaller1(1:10, c(2,1,4,3,6,5,8,7,10,9))
#> [1] 1 1 3 3 5 5 7 7 9 9
v_smaller2(1:10, c(2,1,4,3,6,5,8,7,10,9))
#> [1] 1 1 3 3 5 5 7 7 9 9
v_smaller1(numeric(), numeric())
#> numeric(0)
v_smaller2(numeric(), numeric())
#> numeric(0)
v_smaller1(c(1, NA), c(1, NA), na.rm = FALSE)
#> [1] 1 NA
v_smaller2(c(1, NA), c(1, NA), na.rm = FALSE)
#> [1] 1 NA
v_smaller1(NA,NA)
#> [1] NA
v_smaller2(NA,NA)
#> [1] NA
Of course, we can also copy and paste the rest from the textbook to solve the last part of the exercise:
row_min <- function(x, na.rm = FALSE) {
  apply(x, 1, r_smaller, na.rm = na.rm)
}
col_min <- function(x, na.rm = FALSE) {
  apply(x, 2, r_smaller, na.rm = na.rm)
}
arr_min <- function(x, dim, na.rm = FALSE) {
  apply(x, dim, r_smaller, na.rm = na.rm)
}
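To see the identity value at work, the pieces can be condensed into one self-contained sketch. The version of smaller() below drops the input checks and folds rm_na() into the function body; this is a simplification made for the example and it assumes numeric input.

```r
# Condensed smaller(): no input checks, NA handling inlined
smaller <- function(x, y, na.rm = FALSE) {
  if (is.na(x) || is.na(y)) {
    if (!na.rm) return(NA_real_)
    if (is.na(x) && is.na(y)) return(Inf)   # identity of smaller()
    return(if (is.na(x)) y else x)
  }
  if (y < x) y else x
}

r_smaller <- function(xs, na.rm = TRUE) {
  Reduce(function(x, y) smaller(x, y, na.rm = na.rm), xs, init = Inf)
}

row_min <- function(x, na.rm = FALSE) {
  apply(x, 1, r_smaller, na.rm = na.rm)
}

m <- matrix(c(3, NA, 5,
              2,  8, NA), nrow = 2, byrow = TRUE)

row_min(m, na.rm = TRUE)   # NAs are skipped
row_min(m)                 # NAs propagate
r_smaller(numeric())       # empty input returns the identity
```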
Q: Create a table that has and, or, add, multiply, smaller, and larger in the columns and binary operator, reducing variant, vectorised variant, and array variants in the rows.
- Fill in the cells with the names of base R functions that perform each of the roles.
- Compare the names and arguments of the existing R functions. How consistent are they? How could you improve them?
- Complete the matrix by implementing any missing functions.
A: In the following table we can see the requested base R functions that we are aware of:
            and   or    add   multiply   smaller   larger
binary      &&    ||
reducing    all   any   sum   prod       min       max
vectorised  &     |     +     *          pmin      pmax
array
Notice that we were relatively strict about the binary row. Since the vectorised and reducing versions are more general than the binary versions, we could have used them twice. However, this doesn’t seem to be the intention of this exercise.
The last part of this exercise can be solved by copy-pasting from the book and the last exercise for the binary row, and by creating combinations of apply() and the reducing versions for the array row. We think the array functions just need a dimension and an na.rm argument. We don’t know how we would name them, but something like sum_array(1, na.rm = TRUE)
could be ok.
The second part of the exercise is hard to solve completely. But in our opinion, there are two important parts. The behaviour for special inputs like NA, NaN, NULL and zero-length atomics should be consistent, and all versions should have an na.rm argument, for which the functions also behave consistently. In the following table, we return the output of `f`(x, 1), where f is the function in the first column and x is the special input in the header (the named functions also have an na.rm argument, which is FALSE by default). The order of the arguments is important because of lazy evaluation.
        NA     NaN    NULL         logical(0)   integer(0)
&&      NA     NA     error        NA           NA
all     NA     NA     TRUE         TRUE         TRUE
&       NA     NA     error        logical(0)   logical(0)
||      TRUE   TRUE   error        TRUE         TRUE
any     TRUE   TRUE   TRUE         TRUE         TRUE
|       TRUE   TRUE   error        logical(0)   logical(0)
sum     NA     NaN    1            1            1
+       NA     NaN    numeric(0)   numeric(0)   numeric(0)
prod    NA     NaN    1            1            1
*       NA     NaN    numeric(0)   numeric(0)   numeric(0)
min     NA     NaN    1            1            1
pmin    NA     NaN    numeric(0)   numeric(0)   numeric(0)
max     NA     NaN    1            1            1
pmax    NA     NaN    numeric(0)   numeric(0)   numeric(0)
We can see that the vectorised and reducing numerical functions are all consistent. However, it is not consistent that the first three logical functions return NA for NA and NaN, while the fourth to sixth functions all return TRUE. Returning FALSE for the first three would be more consistent, as would returning NA for all of them combined with an extra na.rm argument. It seems relatively hard to find an easy rule for all cases, and especially the different behaviour for NULL is relatively confusing. Another good way of sorting the functions would be to differentiate between “numerical” and “logical” operators first and then between binary, reducing and vectorised variants, like below (we intentionally left out the last column, which is redundant because of coercion):
`f(x,1)`   NA     NaN    NULL         logical(0)
&&         NA     NA     error        NA
||         TRUE   TRUE   error        TRUE
all        NA     NA     TRUE         TRUE
any        TRUE   TRUE   TRUE         TRUE
&          NA     NA     error        logical(0)
|          TRUE   TRUE   error        logical(0)
sum        NA     NaN    1            1
prod       NA     NaN    1            1
min        NA     NaN    1            1
max        NA     NaN    1            1
+          NA     NaN    numeric(0)   numeric(0)
*          NA     NaN    numeric(0)   numeric(0)
pmin       NA     NaN    numeric(0)   numeric(0)
pmax       NA     NaN    numeric(0)   numeric(0)
The other point is the naming convention. We think the names are clear, but it could be useful to provide the missing binary operators and name them, for example, ++, **, <>, >< to be consistent.
Q: How does
paste()
fit into this structure? What is the scalar binary function that underlies paste()? What are the sep and collapse arguments to paste() equivalent to? Are there any paste variants that don’t have existing R implementations?
A: paste() behaves like a mix. If you supply only length-one arguments, it behaves like a reducing function, e.g.:
paste("a", "b", sep = "")
#> [1] "ab"
paste("a", "b", "", sep = "")
#> [1] "ab"
If you supply at least one element with length greater than one, it behaves like a vectorised function, e.g.:
paste(1:3)
#> [1] "1" "2" "3"
paste(1:3, 1:2)
#> [1] "1 1" "2 2" "3 1"
paste(1:3, 1:2, 1)
#> [1] "1 1 1" "2 2 1" "3 1 1"
We think it should be possible to implement a new paste() starting from
p_binary <- function(x, y = "") {
  stopifnot(length(x) == 1, length(y) == 1)
  paste0(x, y)
}
The sep argument is equivalent to appending sep to every ... input supplied to paste() except the last, and then binding these results together. In relations:
paste(n1, n2, ..., nm, sep = sep) <=>
paste0(paste0(n1, sep), paste(n2, n3, ..., nm, sep = sep)) <=>
paste0(paste0(n1, sep), paste0(n2, sep), ..., paste0(n(m-1), sep), paste0(nm))
We can check this for scalar and non-scalar input:
# scalar:
paste("a", "b", "c", sep = "_")
#> [1] "a_b_c"
paste0(paste0("a", "_"), paste("b", "c", sep = "_"))
#> [1] "a_b_c"
paste0(paste0("a", "_"), paste0("b", "_"), paste0("c"))
#> [1] "a_b_c"

# non scalar
paste(1:2, "b", "c", sep = "_")
#> [1] "1_b_c" "2_b_c"
paste0(paste0(1:2, "_"), paste("b", "c", sep = "_"))
#> [1] "1_b_c" "2_b_c"
paste0(paste0(1:2, "_"), paste0("b", "_"), paste0("c"))
#> [1] "1_b_c" "2_b_c"
collapse just binds the outputs for non-scalar input together with the collapse input. In relations:
for input A1, ..., An, where Ai = a1i:ami,

paste(A1, A2, ..., An, collapse = collapse)
<=>
paste0(
  paste0(paste(  a11,   a12, ...,   a1n), collapse),
  paste0(paste(  a21,   a22, ...,   a2n), collapse),
  .................................................
  paste0(paste(am-11, am-12, ..., am-1n), collapse),
         paste(  am1,   am2, ...,   amn)
)
One can see this easily by intuition from examples:
paste(1:5, 1:5, 6, sep = "", collapse = "_x_")
#> [1] "116_x_226_x_336_x_446_x_556"
paste(1,2,3,4, collapse = "_x_")
#> [1] "1 2 3 4"
paste(1:2,1:2,2:3,3:4, collapse = "_x_")
#> [1] "1 1 2 3_x_2 2 3 4"
We think the only paste version that is not implemented in base R is an array version. At least we are not aware of something like row_paste or paste_apply.
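Starting from p_binary(), reducing and vectorised variants could be sketched as follows. The names p_reduce() and p_vec() are hypothetical, and the vectorised variant assumes inputs of equal length (no recycling).

```r
p_binary <- function(x, y = "") {
  stopifnot(length(x) == 1, length(y) == 1)
  paste0(x, y)
}

# Reducing variant: corresponds to paste(xs, collapse = sep)
p_reduce <- function(xs, sep = "") {
  Reduce(function(acc, nxt) p_binary(p_binary(acc, sep), nxt), xs)
}

# Vectorised variant: corresponds to paste0(x, y) for equal lengths
p_vec <- function(x, y) {
  stopifnot(length(x) == length(y))
  vapply(seq_along(x), function(i) p_binary(x[i], y[i]), character(1))
}

p_reduce(c("a", "b", "c"), sep = "_")
p_vec(c("a", "b"), c("1", "2"))
```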
17.4 Quasiquotation (new)
Q: Why does as.Date.default() use substitute() and deparse()? Why does pairwise.t.test() use them? Read the source code.
A: as.Date.default() uses them to convert unexpected input expressions (neither dates nor NAs) into a character string and return it within an error message.
pairwise.t.test() uses them to convert the names of its data inputs (response vector x and grouping factor g) into character strings, which are then formatted into a part of the desired output.
Q: pairwise.t.test() assumes that deparse() always returns a length one character vector. Can you construct an input that violates this expectation? What happens?
A: We can pass an expression to one of pairwise.t.test()’s data input arguments that exceeds the default cutoff width in deparse(). The expression will be split into a character vector of length greater than 1. The deparsed data inputs are directly pasted (read the source code!) with “and” as separator, and the result is only used for display in the output. Only the data.name output changes (it will include more than one “and”).
d=1
pairwise.t.test(2, d+d+d+d+d+d+d+d+d+d+d+d+d+d+d+d+d)
#>
#> Pairwise comparisons using t tests with pooled SD
#>
#> data: 2 and d + d + d + d + d + d + d + d + d + d + d + d + d + d + d + d + 2 and d
#>
#> <0 x 0 matrix>
#>
#> P value adjustment method: holm