# 1 Data structures

## 1.1 Vectors

1. Q: What are the six types of atomic vector? How does a list differ from an atomic vector?
A: The six types are logical, integer, double, character, complex and raw. The elements of a list don’t have to be of the same type.
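As a quick check of the answer above, the following sketch builds one value of each atomic type and asks typeof() for its type. The variable names (atomics, types, mixed) are made up for illustration.

```r
# One example value per atomic type; typeof() reports the underlying type.
atomics <- list(TRUE, 1L, 1.5, "a", 1i, as.raw(1))
types <- vapply(atomics, typeof, character(1))
print(types)

# An atomic vector holds a single type, so c() coerces mixed input ...
coerced <- c(1L, "a")
typeof(coerced)

# ... whereas a list keeps the type of each element.
mixed <- list(1L, "a", TRUE)
mixed_types <- vapply(mixed, typeof, character(1))
print(mixed_types)
```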

2. Q: What makes is.vector() and is.numeric() fundamentally different to is.list() and is.character()?
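A: The first two don’t test for a specific type. is.vector() returns TRUE for any vector (atomic or list) that has no attributes other than names, and is.numeric() returns TRUE for both integer and double vectors. is.list() and is.character() each test for exactly one type.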
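A small sketch of this behaviour, assuming nothing beyond base R. The attribute name "units" is a hypothetical example, not something the exercise uses.

```r
x <- c(a = 1, b = 2)
v_named <- is.vector(x)            # names are allowed
attr(x, "units") <- "cm"           # hypothetical extra attribute
v_attr <- is.vector(x)             # any other attribute makes this FALSE
v_list <- is.vector(list(1, "a"))  # lists are vectors too

n_int <- is.numeric(1L)            # integer counts as numeric
n_dbl <- is.numeric(1.5)           # so does double
n_fct <- is.numeric(factor("a"))   # factors do not, despite integer storage

print(c(v_named, v_attr, v_list, n_int, n_dbl, n_fct))
```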

3. Q: Test your knowledge of vector coercion rules by predicting the output of the following uses of c():

c(1, FALSE)      # will be coerced to double    -> 1 0
c("a", 1)        # will be coerced to character -> "a" "1"
c(list(1), "a")  # will be coerced to a list with two elements of type double and character
c(TRUE, 1L)      # will be coerced to integer   -> 1 1
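
4. Q: Why do you need to use unlist() to convert a list to an atomic vector? Why doesn’t as.vector() work?
A: A list is already a vector, so as.vector() returns it unchanged. unlist() is needed to get rid of (flatten) the possibly nested structure and produce a single atomic vector.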
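The difference can be seen on a small nested list. This is only an illustrative sketch; the object names are arbitrary.

```r
x <- list(1, list(2, 3))

# A list already counts as a vector, so as.vector() has nothing to do.
v <- as.vector(x)
typeof(v)
identical(v, x)

# unlist() flattens the nesting and returns an atomic vector.
u <- unlist(x)
typeof(u)
print(u)
```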

5. Q: Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?
A: These operators are functions that coerce their arguments to a common type before comparing them: in these three cases to character, double and character, respectively. To illustrate the last case: "one" is compared with "2" as a string, and "one" comes after "2" in ASCII.
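The coercions can be made explicit by converting the arguments by hand. This is a sketch of the equivalent comparisons, not of how R implements them internally. String ordering depends on the locale, but digits sort before letters in the common ones.

```r
r1 <- 1 == "1"       # compared as strings
r1_explicit <- as.character(1) == "1"

r2 <- -1 < FALSE     # FALSE becomes 0
r2_explicit <- -1 < as.double(FALSE)

r3 <- "one" < 2      # 2 becomes "2"
r3_explicit <- "one" < as.character(2)

print(c(r1, r2, r3))
```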

6. Q: Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)
A: The reason is practical. When you combine NA with other atomic types in c(), it is coerced just like TRUE and FALSE: to integer (NA_integer_), double (NA_real_), complex (NA_complex_) or character (NA_character_). Recall that R has a coercion hierarchy that goes logical -> integer -> double -> character. If NA were, for example, a character, including NA in a set of integers or logicals would coerce them all to character, which would be undesirable. Because logical sits at the bottom of the hierarchy, adding an NA to a vector (which happens often) never changes the type of the other elements.
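A short check of this argument, using only base R. The hint from the question is included as the last case.

```r
t_na   <- typeof(NA)                       # the default NA is logical
t_int  <- typeof(c(NA, 1L))                # NA adapts to integer
t_dbl  <- typeof(c(NA, 1.5))               # ... to double
t_chr  <- typeof(c(NA, "a"))               # ... to character

# A typed NA, by contrast, drags the other elements along with it.
t_hint <- typeof(c(FALSE, NA_character_))

print(c(t_na, t_int, t_dbl, t_chr, t_hint))
```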

## 1.2 Attributes

1. Q: An early draft used this code to illustrate structure():

structure(1:5, comment = "my attribute")
#> [1] 1 2 3 4 5

But when you print that object you don’t see the comment attribute. Why? Is the attribute missing, or is there something else special about it? (Hint: try using help.)

A: The attribute is not missing; it is just not printed. From the help of comment (?comment):

Contrary to other attributes, the comment is not printed (by print or print.default).

Also from the help of attributes (?attributes):

Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set.
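To confirm that the attribute is stored even though print() hides it, one can query it directly. A minimal sketch:

```r
x <- structure(1:5, comment = "my attribute")

print(x)                      # the comment is not shown here

via_attr <- attr(x, "comment")
via_comment <- comment(x)     # dedicated accessor, see ?comment
all_attrs <- attributes(x)

print(via_attr)
str(all_attrs)
```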

2. Q: What happens to a factor when you modify its levels?

f1 <- factor(letters)
levels(f1) <- rev(levels(f1))

A: Both the entries of the factor and its levels are reversed:

f1
#>  [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
#> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a

3. Q: What does this code do? How do f2 and f3 differ from f1?

f2 <- rev(factor(letters)) # changes only the entries of the factor
f3 <- factor(letters, levels = rev(letters)) # changes only the levels of the factor

A: Unlike f1, f2 and f3 each change only one thing: f2 reverses the order of the entries but keeps the levels, while f3 reverses the order of the levels but keeps the entries.
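One way to see the difference between the three factors is to look at the first entry, the first level and the first underlying integer code of each. The helper first() is hypothetical and exists only for this sketch.

```r
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))                 # relabels the levels, codes untouched
f2 <- rev(factor(letters))                    # reverses the entries only
f3 <- factor(letters, levels = rev(letters))  # reverses the levels only

# Hypothetical helper: first entry, first level, first integer code.
first <- function(f) {
  list(value = as.character(f)[1],
       level = levels(f)[1],
       code  = as.integer(f)[1])
}

s1 <- first(f1)
s2 <- first(f2)
s3 <- first(f3)
str(list(f1 = s1, f2 = s2, f3 = s3))
```

For f1 the codes still run from 1 to 26, which is why relabelling the levels also changes what the entries display.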

## 1.3 Matrices and arrays

1. Q: What does dim() return when applied to a vector?
A: NULL
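A brief illustration, which also shows the capitalised helpers NROW() and NCOL() that treat a plain vector as a one-column matrix:

```r
v <- 1:5

d <- dim(v)     # NULL: a plain vector has no dim attribute
r <- nrow(v)    # NULL as well, since nrow() reads dim()

R <- NROW(v)    # 5: vector treated as a single column
C <- NCOL(v)    # 1

print(list(dim = d, nrow = r, NROW = R, NCOL = C))
```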

2. Q: If is.matrix(x) is TRUE, what will is.array(x) return?
A: TRUE, as also documented in ?array:

A two-dimensional array is the same thing as a matrix.
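The relationship is one-directional, as this sketch with arbitrary example objects suggests: every matrix is an array, but an array with more than two dimensions is not a matrix.

```r
m <- matrix(1:6, nrow = 2)
a <- array(1:6, c(1, 2, 3))

m_is_matrix <- is.matrix(m)
m_is_array  <- is.array(m)   # a matrix is a two-dimensional array

a_is_array  <- is.array(a)
a_is_matrix <- is.matrix(a)  # three dimensions, so not a matrix

print(c(m_is_matrix, m_is_array, a_is_array, a_is_matrix))
```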

3. Q: How would you describe the following three objects? What makes them different to 1:5?

x1 <- array(1:5, c(1, 1, 5)) # 1 row, 1 column, 5 in third dimension
x2 <- array(1:5, c(1, 5, 1)) # 1 row, 5 columns, 1 in third dimension
x3 <- array(1:5, c(5, 1, 1)) # 5 rows, 1 column, 1 in third dimension

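A: They are three-dimensional arrays, so each has a dim attribute of length three that determines along which dimension the five values lie. 1:5 is a plain integer vector without a dim attribute.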
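The following sketch inspects the three objects and compares them with 1:5. It uses only base R functions.

```r
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))

dims <- list(dim(x1), dim(x2), dim(x3))
str(dims)

plain_attrs <- attributes(1:5)        # NULL: no dim attribute
x1_is_vector <- is.vector(x1)         # FALSE because of the dim attribute

# Dropping the attributes recovers the original vector.
same_data <- identical(as.vector(x1), 1:5)
```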

## 1.4 Data frames

1. Q: What attributes does a data frame possess?
A: names, row.names and class.
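This can be checked with attributes() on any data frame. The example data frame df below is made up for illustration.

```r
df <- data.frame(x = 1:2, y = c("a", "b"))

attr_names <- names(attributes(df))
print(attr_names)

col_names <- attr(df, "names")   # the column names
cls <- attr(df, "class")
```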

2. Q: What does as.matrix() do when applied to a data frame with columns of different types?
A: From ?as.matrix:

The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g., all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give a integer matrix, etc.
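The quoted rules can be tried out on a few small data frames. The data frames below are arbitrary examples chosen to hit different branches of the coercion hierarchy.

```r
df_int <- data.frame(a = c(TRUE, FALSE), b = 1:2)      # logical + integer
df_dbl <- data.frame(a = 1:2, b = c(1.5, 2.5))         # integer + double
df_chr <- data.frame(a = 1:2, b = c("x", "y"))         # integer + character

t_int <- typeof(as.matrix(df_int))
t_dbl <- typeof(as.matrix(df_dbl))
t_chr <- typeof(as.matrix(df_chr))

print(c(t_int, t_dbl, t_chr))
```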

3. Q: Can you have a data frame with 0 rows? What about 0 columns?
A: Yes, you can create them easily. Also both dimensions can be 0:

# here we use the recycling rules for logical subsetting, but you could
# also subset with 0, a negative index or a zero length atomic (i.e.
# logical(0), character(0), integer(0), double(0))

iris[FALSE,]
#> [1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species
#> <0 rows> (or 0-length row.names)

iris[ , FALSE] # or iris[FALSE]
#> data frame with 0 columns and 150 rows

iris[FALSE, FALSE] # or just data.frame()
#> data frame with 0 columns and 0 rows
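
The dimensions of these results can be verified with dim(). A minimal sketch using the built-in iris data set:

```r
d_rows  <- dim(iris[FALSE, ])        # no rows, all columns
d_cols  <- dim(iris[, FALSE])        # all rows, no columns
d_both  <- dim(iris[FALSE, FALSE])   # neither
d_empty <- dim(data.frame())         # an empty data frame from scratch

print(rbind(d_rows, d_cols, d_both, d_empty))
```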