The tapply function
The tapply function is very similar to the apply function, but it applies a function to groups of values defined by one or more factors. The following block of code shows the function syntax and a simplified description of each argument.
tapply(X,                # Object you can split (matrix, data frame, ...)
       INDEX,            # List of factors of the same length
       FUN,              # Function to be applied to factors (or NULL)
       ...,              # Additional arguments to be passed to FUN
       default = NA,     # If simplify = TRUE, is the array initialization value
       simplify = TRUE)  # If set to FALSE returns a list object
Note that the first three arguments are the most commonly used, and that it is common to omit argument names in the apply family of functions because of their simple syntax.
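As a minimal sketch of that convention, the following compares a positional call with a fully named one. The vectors x and g are hypothetical toy data, not part of this tutorial's data set.

```r
# Hypothetical toy data, used only for illustration
x <- c(10, 20, 30, 40)
g <- factor(c("a", "b", "a", "b"))

# Positional arguments (the usual style)
by_position <- tapply(x, g, sum)

# The same call with every argument named
by_name <- tapply(X = x, INDEX = g, FUN = sum)

# Both calls return the same result
identical(by_position, by_name)
by_position
```

Naming the arguments is mostly useful when you pass them in a different order or want the call to be self-documenting.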
How to use tapply in R?
The tapply function is very easy to use in R. First, consider the following example data set, which contains the price of some objects, their type and the store where they were sold.
set.seed(2)

data_set <- data.frame(price = round(rnorm(25, sd = 10, mean = 30)),
                       type = sample(1:4, size = 25, replace = TRUE),
                       store = sample(paste("Store", 1:4), size = 25, replace = TRUE))

head(data_set)
price type   store
   21    2 Store 2
   32    3 Store 3
   46    4 Store 4
   19    3 Store 4
   29    1 Store 4
   31    3 Store 4
Second, store the values as variables and convert the column named type to a factor.
price <- data_set$price
store <- data_set$store
type <- factor(data_set$type,
               labels = c("toy", "food", "electronics", "drinks"))
Finally, you can use the tapply function to calculate the mean price by type of object as follows:
# Mean price by product type
mean_prices <- tapply(price, type, mean)
mean_prices
        toy        food electronics      drinks
   39.50000    30.33333    32.20000    29.33333
Note that the X and INDEX arguments of tapply must have the same length. You can verify this with the length function. Note also that the default output is of class “array”.
class(mean_prices) # "array"
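The length requirement can be checked before calling tapply. The sketch below uses hypothetical toy vectors (values and groups) rather than the tutorial's data, and catches the error that tapply raises when the lengths differ.

```r
# Hypothetical toy data, used only for illustration
values <- c(5, 7, 9, 11)
groups <- factor(c("x", "y", "x", "y"))

# Check that the data and the grouping factor have the same length
same_length <- length(values) == length(groups)
same_length

# With matching lengths the call succeeds and returns an array
result <- tapply(values, groups, mean)
is.array(result)

# With mismatched lengths tapply stops with an error, captured here
bad <- tryCatch(tapply(values, groups[1:3], mean),
                error = function(e) e)
inherits(bad, "error")
```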
Hence, if needed, you can access each element of the output specifying the desired index in square brackets.
mean_prices[2] # 30.33333
However, you can change the output class to list if you set the simplify argument to FALSE.
# Mean price by product type
mean_prices_list <- tapply(price, type, mean, simplify = FALSE)
mean_prices_list
$toy
[1] 39.5

$food
[1] 30.33333

$electronics
[1] 32.2

$drinks
[1] 29.33333
In this case, you can access the output elements with the $ sign and the element name.
mean_prices_list$toy # 39.5
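Besides the $ sign, list elements can usually be reached with double square brackets, by name or by position. The following is a small sketch on hypothetical toy data (scores and grp are illustrative names, not part of the tutorial's data set).

```r
# Hypothetical toy data, used only for illustration
scores <- c(1, 2, 3, 4, 5, 6)
grp <- factor(c("a", "a", "b", "b", "c", "c"))

res_list <- tapply(scores, grp, mean, simplify = FALSE)

first  <- res_list$a       # access by name with $
second <- res_list[["b"]]  # access by name with [[ ]]
third  <- res_list[[3]]    # access by position

# Collapse the list back into a numeric vector if needed
flat <- unlist(res_list)
flat
```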
Additional arguments example: Ignore NA
Suppose that your data frame contains some NA values in its columns.
# Adding NA values to the data set
data_set[1, 1] <- NA
data_set[2, 3] <- NA

# Mean price by store
tapply(data_set$price, data_set$store, mean)
 Store 1  Store 2  Store 3  Store 4
32.00000       NA 39.25000 33.14286
With the tapply function you can specify additional arguments for the function you are applying after the FUN argument. In this case, the mean function allows you to specify the na.rm argument to remove NA values. Note that this argument defaults to FALSE.
tapply(data_set$price, data_set$store, mean, na.rm = TRUE)
 Store 1  Store 2  Store 3  Store 4
32.00000 33.50000 39.25000 33.14286
The previous is equivalent to the following:
f <- function(x) mean(x, na.rm = TRUE)
tapply(data_set$price, data_set$store, f)
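The two kinds of missing value in the example above behave differently: na.rm only affects NA values in the data being summarised, while observations whose grouping value is NA are left out of every group. The sketch below illustrates this on hypothetical toy vectors (sales and shop are illustrative names).

```r
# Hypothetical toy data, used only for illustration
sales <- c(10, NA, 30, 40)
shop  <- c("A", "A", "B", NA)  # the last observation has no group

# Extra argument passed through ... to mean
via_dots <- tapply(sales, shop, mean, na.rm = TRUE)

# The same computation with a wrapper function
via_wrapper <- tapply(sales, shop, function(x) mean(x, na.rm = TRUE))

identical(via_dots, via_wrapper)

# The value 40 belongs to no group, so only "A" and "B" appear
via_dots
```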
Tapply in R with multiple factors
You can apply the tapply function to multiple columns (or factor variables) by passing them through the list function. In this example, we apply the tapply function to the type and store factors to calculate the mean price of the objects by type and store.
# Mean price by product type and store
tapply(price, list(type, store), mean)
            Store 1  Store 2 Store 3  Store 4
toy              46 31.00000      49 36.66667
food             26 30.33333      39       NA
electronics      50 29.00000      32 25.00000
drinks           22 40.00000      20 36.00000
Note that, as no food was sold in Store 4, the corresponding cell contains an NA value. To override this behavior you can set the default argument to the value you want instead of NA. In this example we set it to 0.
# Mean price by product type and store, changing default argument
tapply(price, list(type, store), mean, default = 0)
            Store 1  Store 2 Store 3  Store 4
toy              46 31.00000      49 36.66667
food             26 30.33333      39  0.00000
electronics      50 29.00000      32 25.00000
drinks           22 40.00000      20 36.00000
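The effect of the default argument is easier to verify on a data set small enough to compute by hand. The following sketch assumes hypothetical toy vectors (amount, kind and shop) in which no food was sold in shop S2, and assumes an R version that supports the default argument.

```r
# Hypothetical toy data, used only for illustration
amount <- c(5, 15, 10, 30)
kind   <- factor(c("toy", "toy", "food", "toy"))
shop   <- factor(c("S1", "S1", "S1", "S2"))

# The empty food/S2 combination is filled with NA by default
with_na <- tapply(amount, list(kind, shop), mean)

# The same table with empty combinations filled with 0
with_zero <- tapply(amount, list(kind, shop), mean, default = 0)

# Cells can be accessed by row and column name
with_na["food", "S2"]
with_zero["food", "S2"]
```

Keep in mind that a 0 filled in this way cannot be told apart from a group whose mean really is 0, so NA may be the safer choice when that distinction matters.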