# tapply in R

What does `tapply` mean in R? The `tapply` function allows you to **create statistical summaries by group based on the levels of one or several factors**. In this tutorial **you will learn how to use tapply in R** in several scenarios with examples.

## The tapply function

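The R `tapply` function is similar to the `apply` function, but instead of iterating over the rows or columns of a matrix it applies a function to the subsets of a vector defined by one or more factors. In the following block of code we show the function **syntax and the simplified description of each argument**.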

```
tapply(X,                # Object that can be split (typically a vector)
       INDEX,            # Factor, or list of factors, each of the same length as X
       FUN,              # Function to be applied to each group (or NULL)
       ...,              # Additional arguments to be passed to FUN
       default = NA,     # If simplify = TRUE, is the array initialization value
       simplify = TRUE)  # If set to FALSE returns a list object
```

Note that the **first three arguments are the most commonly used** and that **it is common to omit the argument names** in the apply family functions due to their simple syntax.
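As a minimal sketch of the point above, the following compares a positional call with a fully named one. The vectors `scores` and `group` are hypothetical toy data, not part of the tutorial's data set.

```r
# Hypothetical toy data, used only for illustration
scores <- c(10, 20, 30, 40)
group  <- factor(c("a", "b", "a", "b"))

# Positional arguments, as is common with the apply family
by_position <- tapply(scores, group, sum)

# The same call with every argument named
by_name <- tapply(X = scores, INDEX = group, FUN = sum)

# Both calls produce the same result
identical(by_position, by_name)
by_position
```

Naming the arguments is mostly useful when you also pass `default` or `simplify`, since those come after `...` and must always be named.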

## How to use tapply in R?

The `tapply` function is very easy to use in R. First, **consider the following example dataset**, which represents the price of some objects, their type and the store where they were sold.

```
set.seed(2)
data_set <- data.frame(price = round(rnorm(25, sd = 10, mean = 30)),
                       type = sample(1:4, size = 25, replace = TRUE),
                       store = sample(paste("Store", 1:4),
                                      size = 25, replace = TRUE))
head(data_set)
```

```
  price type   store
1    21    2 Store 2
2    32    3 Store 3
3    46    4 Store 4
4    19    3 Store 4
5    29    1 Store 4
6    31    3 Store 4
```

Second, **store the values as variables** and convert the column named `type` to a factor.

```
price <- data_set$price
store <- data_set$store
type <- factor(data_set$type,
               labels = c("toy", "food", "electronics", "drinks"))
```

Finally, you can use the `tapply` function to calculate the mean price for each type of object as follows:

```
# Mean price by product type
mean_prices <- tapply(price, type, mean)
mean_prices
```

```
        toy        food electronics      drinks
   39.50000    30.33333    32.20000    29.33333
```

Note that the vectors passed to `tapply` (the data and each grouping factor) must have the same length. You can verify this with the `length` function. Also note that the default output is of class `"array"`.

`class(mean_prices) # "array"`

Hence, if needed, you can access each element of the output by specifying the desired index in square brackets.

`mean_prices[2] # 30.33333`
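The length requirement and the indexing described above can be sketched as follows. The vectors `values` and `groups` are hypothetical toy data chosen so that the group means are easy to check by hand.

```r
# Hypothetical toy data, used only for illustration
values <- c(4, 8, 15, 16, 23, 42)
groups <- factor(c("x", "y", "x", "y", "x", "y"))

# The data and the grouping factor need one entry per observation
length(values) == length(groups)

group_means <- tapply(values, groups, mean)

group_means[2]            # access by position
group_means["y"]          # access by name
unname(group_means["y"])  # same value without the name

# A grouping factor of a different length makes tapply stop with an error,
# captured here as a message instead of halting the script
mismatch <- tryCatch(
  tapply(values, groups[1:3], mean),
  error = function(e) conditionMessage(e)
)
mismatch
```

Indexing by name tends to be more robust than indexing by position, because it does not depend on the order of the factor levels.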

However, you can change the output class to `list` if you set the `simplify` argument to `FALSE`.

```
# Mean price by product type
mean_prices_list <- tapply(price, type, mean, simplify = FALSE)
mean_prices_list
```

```
$toy
[1] 39.5
$food
[1] 30.33333
$electronics
[1] 32.2
$drinks
[1] 29.33333
```

In this case, you can access the output elements with the `$` operator and the element name.

`mean_prices_list$toy # 39.5`
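A list output is also what you get, even with the default `simplify = TRUE`, whenever the applied function returns more than one value per group. The sketch below illustrates this with `range`; `values` and `groups` are hypothetical toy data.

```r
# Hypothetical toy data, used only for illustration
values <- c(4, 8, 15, 16, 23, 42)
groups <- factor(c("x", "y", "x", "y", "x", "y"))

# One value per group, but kept as a list on request
means_list <- tapply(values, groups, mean, simplify = FALSE)
means_list[["x"]]  # double brackets work like the $ operator

# range() returns two values per group (minimum and maximum),
# so the result cannot be simplified and stays a list
ranges <- tapply(values, groups, range)
ranges[["y"]]
```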

### Additional arguments example: Ignore NA

Suppose that your data frame contains some `NA` values in its columns.

```
# Adding NA values to the data set
data_set[1, 1] <- NA
data_set[2, 3] <- NA
# Mean price by store
tapply(data_set$price, data_set$store, mean)
```

```
 Store 1  Store 2  Store 3  Store 4
32.00000       NA 39.25000 33.14286
```

Within the `tapply` function you can pass additional arguments to the function you are applying, after the `FUN` argument. In this case, the `mean` function accepts the `na.rm` argument to remove `NA` values. Note that this argument defaults to `FALSE`.

`tapply(data_set$price, data_set$store, mean, na.rm = TRUE)`

```
 Store 1  Store 2  Store 3  Store 4
32.00000 33.50000 39.25000 33.14286
```

The previous call is equivalent to the following:

```
f <- function(x) mean(x, na.rm = TRUE)
tapply(data_set$price, data_set$store, f)
```
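The same pattern applies to other summary functions and to anonymous functions. The following is a small sketch with hypothetical toy vectors `values` and `groups`, each group containing one missing value.

```r
# Hypothetical toy data: group "a" is 1, 2, NA, 4 and group "b" is 10, 20, 30, NA
values <- c(1, 2, NA, 4, 10, 20, 30, NA)
groups <- factor(rep(c("a", "b"), each = 4))

# Extra arguments after FUN are forwarded to the applied function
medians <- tapply(values, groups, median, na.rm = TRUE)

# An anonymous function gives the same control without naming a helper
means <- tapply(values, groups, function(x) mean(x, na.rm = TRUE))

# Count the missing values in each group
na_counts <- tapply(values, groups, function(x) sum(is.na(x)))

medians
means
na_counts
```

Counting the missing values per group before removing them helps you judge how much data each summary is actually based on.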

## Tapply in R with multiple factors

You can apply the `tapply` function to multiple columns (or factor variables) by passing them in a `list`. In this example, we are going to apply the `tapply` function to the `type` and `store` factors to calculate the mean price of the objects by type and store.

```
# Mean price by product type and store
tapply(price, list(type, store), mean)
```

```
            Store 1  Store 2 Store 3  Store 4
toy              46 31.00000      49 36.66667
food             26 30.33333      39       NA
electronics      50 29.00000      32 25.00000
drinks           22 40.00000      20 36.00000
```

Note that, as no food was sold in Store 4, the corresponding cell contains an `NA` value. To override this behavior you can set the `default` argument to the value you want instead of `NA`. In this example we decided to set it to 0.

```
# Mean price by product type and store, changing default argument
tapply(price, list(type, store), mean, default = 0)
```

```
            Store 1  Store 2 Store 3  Store 4
toy              46 31.00000      49 36.66667
food             26 30.33333      39  0.00000
electronics      50 29.00000      32 25.00000
drinks           22 40.00000      20 36.00000
```
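With two factors the result is a matrix, so individual cells can be read with row and column names. The sketch below uses hypothetical toy vectors (`toy_price`, `toy_type` and `toy_store`, named differently to avoid overwriting the tutorial's variables) and also shows a case where a `default` of 0 is a natural choice: counting observations.

```r
# Hypothetical toy data, used only for illustration
toy_price <- c(10, 20, 30, 40, 50)
toy_type  <- factor(c("toy", "toy", "food", "food", "toy"))
toy_store <- factor(c("S1", "S2", "S1", "S1", "S2"))

# Mean price for every combination of type and store
by_both <- tapply(toy_price, list(toy_type, toy_store), mean)

by_both["toy", "S2"]   # one cell, selected by row and column name
by_both["food", ]      # one row: every store for a single type
is.na(by_both)         # TRUE marks combinations with no observations

# Number of observations per combination; empty cells become 0
counts <- tapply(toy_price, list(toy_type, toy_store), length, default = 0)
counts
```

For means, keeping `NA` in empty cells may be preferable to 0, since 0 could be mistaken for an observed average. For counts, 0 is the true value.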