1 Data structures
1.1 Vectors
Q: What are the six types of atomic vector? How does a list differ from an atomic vector?
A: The six types are logical, integer, double, character, complex and raw. The elements of a list don’t have to be of the same type.
Q: What makes is.vector() and is.numeric() fundamentally different to is.list() and is.character()?
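A: The first two tests don’t check for a specific type. is.vector() returns TRUE for atomic vectors and lists that have no attributes other than names, and is.numeric() returns TRUE for both integer and double vectors. is.list() and is.character() each test for exactly one type.
Q: Test your knowledge of vector coercion rules by predicting the output of the following uses of c():
c(1, FALSE)      # will be coerced to double -> 1 0
c("a", 1)        # will be coerced to character -> "a" "1"
c(list(1), "a")  # will be coerced to a list with two elements of type double and character
c(TRUE, 1L)      # will be coerced to integer -> 1 1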
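These predictions can be checked with typeof(), which reports the type of the result after coercion. The following is a minimal sketch; the variable names are only illustrative and are not part of the exercise.

```r
# c() coerces its arguments to the most flexible type that is present.
num_lgl <- c(1, FALSE)       # logical is promoted to double
chr_num <- c("a", 1)         # double is promoted to character
lst_chr <- c(list(1), "a")   # everything is wrapped into a list
lgl_int <- c(TRUE, 1L)       # logical is promoted to integer

typeof(num_lgl)
#> [1] "double"
typeof(chr_num)
#> [1] "character"
typeof(lst_chr)
#> [1] "list"
typeof(lgl_int)
#> [1] "integer"

# Inside the list, each element keeps its original type.
elem_types <- vapply(lst_chr, typeof, character(1))
elem_types
#> [1] "double"    "character"
```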
Q: Why do you need to use unlist() to convert a list to an atomic vector? Why doesn’t as.vector() work?
A: To get rid of (flatten) the nested structure. A list is already a vector, so as.vector() returns it unchanged.
Q: Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?
A: These operators are all functions that coerce their arguments to a common type before comparing them: character, double and character in these three cases. In the last case, 2 becomes "2", and "one" comes after "2" in ASCII.
Q: Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)
A: It is a practical choice. When you combine NAs in c() with other atomic types, they are coerced, just like TRUE and FALSE, to integer (NA_integer_), double (NA_real_), complex (NA_complex_) or character (NA_character_). Recall that in R there is a coercion hierarchy that goes logical -> integer -> double -> character. If NA were, for example, a character, including NA in a set of integers or logicals would coerce them to character, which would be undesirable. Making NA a logical means that including an NA in a dataset (which happens often) does not change the type of the other values.
1.2 Attributes
Q: An early draft used this code to illustrate structure():
structure(1:5, comment = "my attribute")
#> [1] 1 2 3 4 5
But when you print that object you don’t see the comment attribute. Why? Is the attribute missing, or is there something else special about it? (Hint: try using help.)
A: From the help of comment (?comment):
Contrary to other attributes, the comment is not printed (by print or print.default).
Also from the help of attributes (?attributes):
Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set.
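To confirm that the attribute is present and merely hidden by the print method, it can be retrieved explicitly. A minimal sketch; the name x is only illustrative.

```r
x <- structure(1:5, comment = "my attribute")

# Printing hides the comment ...
print(x)
#> [1] 1 2 3 4 5

# ... but the attribute is still stored on the object.
attributes(x)
#> $comment
#> [1] "my attribute"
comment(x)
#> [1] "my attribute"
```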
Q: What happens to a factor when you modify its levels?
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))
A: Both the entries of the factor and its levels are reversed. The underlying integer codes stay the same; only the labels attached to them change:
f1
#> [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
#> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
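One way to see why both change at once: a factor stores integer codes plus a levels attribute, and assigning to levels() only relabels the codes. The sketch below compares the codes before and after; codes_before and codes_after are illustrative names.

```r
f1 <- factor(letters)
codes_before <- as.integer(f1)

levels(f1) <- rev(levels(f1))
codes_after <- as.integer(f1)

# The integer codes are untouched; only the labels they map to have changed.
identical(codes_before, codes_after)
#> [1] TRUE
as.character(f1)[1:3]
#> [1] "z" "y" "x"
levels(f1)[1:3]
#> [1] "z" "y" "x"
```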
Q: What does this code do? How do f2 and f3 differ from f1?
f2 <- rev(factor(letters))                    # changes only the entries of the factor
f3 <- factor(letters, levels = rev(letters))  # changes only the levels of the factor
A: Unlike f1, f2 and f3 each change only one thing: f2 reverses the order of the entries and keeps the levels, while f3 reverses the order of the levels and keeps the entries.
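The difference can be made visible by looking at the entries and the levels separately. A small sketch that reuses the definitions from the question:

```r
f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))

# f2: entries are reversed, levels are still in alphabetical order.
as.character(f2)[1:3]
#> [1] "z" "y" "x"
levels(f2)[1:3]
#> [1] "a" "b" "c"

# f3: entries are in the original order, levels are reversed.
as.character(f3)[1:3]
#> [1] "a" "b" "c"
levels(f3)[1:3]
#> [1] "z" "y" "x"
```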
1.3 Matrices and arrays
Q: What does dim() return when applied to a vector?
A: NULL
Q: If is.matrix(x) is TRUE, what will is.array(x) return?
A: TRUE, as also documented in ?array:
A two-dimensional array is the same thing as a matrix.
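A quick check of this relationship, as a sketch with illustrative object names: every matrix is an array, but an array with more than two dimensions is not a matrix.

```r
m  <- matrix(1:6, nrow = 2)
a3 <- array(1:24, dim = c(2, 3, 4))

# A matrix is a two-dimensional array.
is.matrix(m)
#> [1] TRUE
is.array(m)
#> [1] TRUE

# The converse does not hold for higher-dimensional arrays.
is.array(a3)
#> [1] TRUE
is.matrix(a3)
#> [1] FALSE
```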
Q: How would you describe the following three objects? What makes them different to 1:5?
x1 <- array(1:5, c(1, 1, 5))  # 1 row, 1 column, 5 in third dimension
x2 <- array(1:5, c(1, 5, 1))  # 1 row, 5 columns, 1 in third dimension
x3 <- array(1:5, c(5, 1, 1))  # 5 rows, 1 column, 1 in third dimension
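A: They are of class array and so they have a dim attribute, which 1:5 lacks. All three hold the same five values; they differ only in which dimension has length 5.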
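The dim attribute can be inspected directly. In this sketch, the three arrays from the question are compared with the plain vector 1:5:

```r
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))

# Each array carries a dim attribute; the plain vector has none.
dim(x1)
#> [1] 1 1 5
dim(x2)
#> [1] 1 5 1
dim(x3)
#> [1] 5 1 1
dim(1:5)
#> NULL

# Dropping the dim attribute recovers the same underlying values.
identical(as.vector(x1), 1:5)
#> [1] TRUE
```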
1.4 Data frames
Q: What attributes does a data frame possess?
A: names, row.names and class.
Q: What does as.matrix() do when applied to a data frame with columns of different types?
A: From ?as.matrix:
The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g., all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give a integer matrix, etc.
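The rules quoted above can be tried on a few small data frames. This is only a sketch; the data frames and their column names are made up for illustration.

```r
# A character column forces the whole matrix to character.
df_mixed <- data.frame(n = 1:2, s = c("a", "b"))
typeof(as.matrix(df_mixed))
#> [1] "character"

# Without such a column, the usual coercion hierarchy applies.
df_lgl <- data.frame(a = c(TRUE, FALSE), b = c(FALSE, NA))
typeof(as.matrix(df_lgl))
#> [1] "logical"

df_lgl_int <- data.frame(l = c(TRUE, FALSE), i = 1:2)
typeof(as.matrix(df_lgl_int))
#> [1] "integer"

df_int_dbl <- data.frame(i = 1:2, d = c(1.5, 2.5))
typeof(as.matrix(df_int_dbl))
#> [1] "double"
```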
Q: Can you have a data frame with 0 rows? What about 0 columns?
A: Yes, you can create them easily. Both dimensions can also be 0:
# Here we use the recycling rules for logical subsetting, but you could
# also subset with 0, a negative index or a zero-length atomic (i.e.
# logical(0), character(0), integer(0), double(0)).
iris[FALSE, ]
#> [1] Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <0 rows> (or 0-length row.names)
iris[, FALSE] # or iris[FALSE]
#> data frame with 0 columns and 150 rows
iris[FALSE, FALSE] # or just data.frame()
#> data frame with 0 columns and 0 rows