apply in R
The apply family of functions in R is a well-known set of functions that let you apply a function over arrays, matrices, lists and data frames without writing explicit for loops. In this tutorial you will learn how to use apply in R through several examples and use cases.
apply() function in R
The apply command in R allows you to apply a function across an array, matrix or data frame. You can do this in several ways, depending on the value you specify to the MARGIN argument, which is usually set to 1, 2 or c(1, 2).
apply(X, # Array, matrix or data frame
MARGIN, # 1: rows, 2: columns, c(1, 2): rows and columns
FUN, # Function to be applied
...) # Additional arguments to FUN
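To make the "no explicit loop" idea concrete, here is a minimal sketch of roughly what apply(X, 1, FUN) does for a matrix, written as a plain for loop. row_apply_loop() is a hypothetical helper for illustration only, not part of base R, and unlike apply() it only handles functions that return a single value per row. apply() itself also loops in R code internally, so its main benefit is concise code rather than raw speed.

```r
# Hypothetical helper (not part of base R): an explicit-loop version of
# apply(X, 1, FUN) that assumes FUN returns a single value per row
row_apply_loop <- function(X, FUN, ...) {
  X <- as.matrix(X)                  # apply() also coerces data frames to a matrix
  out <- vector("list", nrow(X))     # one slot per row
  for (i in seq_len(nrow(X))) {
    out[[i]] <- FUN(X[i, ], ...)     # pass row i plus any extra arguments
  }
  unlist(out)                        # simplify the list of results to a vector
}

m <- matrix(1:6, nrow = 2)
row_apply_loop(m, sum)   # 9 12
apply(m, 1, sum)         # 9 12
```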
Throughout this tutorial we are going to use the following example data, so make sure you have it loaded in your workspace. Note that we are going to use a data frame, but it could also be a matrix or an array instead; apply() converts a data frame to a matrix with as.matrix() before applying the function.
# Data frame
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)
df
  x y  z
1 1 5 10
2 2 6 11
3 3 7 12
4 4 8 13
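Because apply() works on the matrix returned by as.matrix(), the sketch below checks that a data frame and its matrix version give the same result, and shows the usual pitfall with mixed column types. df_mixed and its id column are made-up data for illustration.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

# A data frame is converted with as.matrix() first, so both calls agree
m <- as.matrix(df)
identical(apply(df, 1, sum), apply(m, 1, sum))   # TRUE

# Hypothetical data frame with a text column: as.matrix() now yields a
# character matrix, so select the numeric columns before summing
df_mixed <- data.frame(id = c("a", "b", "c", "d"), df)
is.character(as.matrix(df_mixed))               # TRUE
apply(df_mixed[, c("x", "y", "z")], 1, sum)     # 16 19 22 25
```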
MARGIN = 1 applies the function to rows while MARGIN = 2 applies the function to columns.
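A quick way to see what each call of FUN receives under each MARGIN is to apply length(), as in this small sketch using the example data frame.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

# MARGIN = 1: FUN gets one row at a time (3 values: x, y and z)
apply(df, 1, length)   # 3 3 3 3

# MARGIN = 2: FUN gets one column at a time (4 values each)
apply(df, 2, length)   # x = 4, y = 4, z = 4
```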
Applying a function to each row
You can apply a function to every row of an array in R by setting 1 as the value of the MARGIN argument. For this first example we are going to apply the sum function over the rows of the data frame.
apply(X = df, MARGIN = 1, FUN = sum)
Because apply() has only a few main arguments, it is common to omit their names, but then you must pass them in the correct order: X, MARGIN and FUN.
# Sum by rows
apply(df, 1, sum)
16 19 22 25
You can also apply the function to specific rows only by subsetting the data frame first.
# Sum by rows to a subset of data
apply(df[c(1, 2), ], 1, sum)
16 19
Note that the output is a vector containing the corresponding sum of each row.
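Each row reaches FUN as a vector named after the columns, so an anonymous function can pick values by name. The following sketch multiplies the x and z values of every row; the choice of columns is just an illustration.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

# row is a named vector such as c(x = 1, y = 5, z = 10)
apply(df, 1, function(row) row[["x"]] * row[["z"]])   # 10 22 36 52
```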
Applying a function to each column
Setting MARGIN = 2 will apply the function you specify to each column of the array you are working with.
# Sum by columns
apply(df, 2, sum)
x y z
10 26 46
In this case, the output is a vector containing the sum of each column of the sample data frame. You can also apply the function to specific columns only by subsetting the data.
# Sum by columns to a subset of data
apply(df[, c(1, 3)], 2, sum)
x z
10 46
Note that this is more efficient than applying the function to the whole data frame and then subsetting the output, since the function is only evaluated on the columns you need.
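The following sketch checks that both approaches return the same values; subsetting first simply skips the work on column y.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

subset_first <- apply(df[, c(1, 3)], 2, sum)   # sums only x and z
apply_first  <- apply(df, 2, sum)[c(1, 3)]     # sums x, y and z, then drops y

identical(subset_first, apply_first)   # TRUE
```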
The previous examples are for educational purposes. It is more efficient to use the colSums and rowSums functions to calculate the sum of columns and rows, respectively.
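A minimal comparison with the base functions, assuming the same example data frame. rowSums() and colSums() return doubles while apply() keeps integers here, so the check below compares values only.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

rowSums(df)   # 16 19 22 25
colSums(df)   # x = 10, y = 26, z = 46

# Same values as the apply() calls shown above
all.equal(as.numeric(rowSums(df)), as.numeric(apply(df, 1, sum)))   # TRUE
all.equal(as.numeric(colSums(df)), as.numeric(apply(df, 2, sum)))   # TRUE
```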
Applying a function to every element of a data frame
You can set the MARGIN argument to c(1, 2) or, equivalently, to 1:2 to apply the function to each value of the data frame.
apply(df, c(1, 2), sum)
     x y  z
[1,] 1 5 10
[2,] 2 6 11
[3,] 3 7 12
[4,] 4 8 13
If you set MARGIN = c(2, 1) instead of c(1, 2), the output will be the same matrix but transposed.
If you set MARGIN = c(2, 1)
instead of c(1, 2)
the output will be the same matrix but transposed.
apply(df, c(2, 1), sum)
  [,1] [,2] [,3] [,4]
x    1    2    3    4
y    5    6    7    8
z   10   11   12   13
The output is of class “matrix” instead of “data.frame”.
Note that, in this case, the elements of the output are the elements of the data frame itself, as it is calculating the sum of each individual cell.
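The following sketch checks the shapes of both results and confirms that swapping the order of MARGIN just transposes the matrix.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

by_cells   <- apply(df, c(1, 2), sum)   # rows of df stay as rows
by_cells_t <- apply(df, c(2, 1), sum)   # rows and columns swapped

dim(by_cells)                        # 4 3
dim(by_cells_t)                      # 3 4
all.equal(t(by_cells), by_cells_t)   # TRUE
is.matrix(by_cells)                  # TRUE, not a data frame
```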
In the section on applying custom functions below, we show one example where applying a function by rows and columns gives the same output as applying it by columns only, and another where all three outputs differ, to better understand the purpose of applying a function over rows and columns at the same time.
Additional arguments of the apply R function
The mean function has an additional argument (na.rm) to specify whether to remove NA values or not. If you need to pass arguments to the function you are applying, add them after FUN, separated by commas, as follows:
# Apply the mean by rows removing NA values
apply(df, 1, mean, na.rm = TRUE)
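The example data frame has no missing values, so na.rm makes no difference there. The sketch below uses a hypothetical copy, df_na, with one NA inserted to show the effect.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

# Hypothetical copy of df with a missing value in row 2
df_na <- df
df_na$x[2] <- NA

apply(df_na, 1, mean)                # 5.333333       NA 7.333333 8.333333
apply(df_na, 1, mean, na.rm = TRUE)  # 5.333333 8.500000 7.333333 8.333333
```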
Applying a custom function
The function you pass to the FUN argument doesn't need to be a base R function: it can be any R function, including one you write yourself. In this example we are going to create a function named fun that calculates the square of a number and converts the output to character if the character argument is set to TRUE.
fun <- function(x, character = FALSE) {
  if (!character) {
    x ^ 2                 # numeric square
  } else {
    as.character(x ^ 2)   # square converted to character
  }
}
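First, if you apply the function by rows, the output will be a matrix in which each column contains the squares of the corresponding row of the data frame, because apply() returns the result for each row as a column.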
apply(df, 1, fun)
  [,1] [,2] [,3] [,4]
x    1    4    9   16
y   25   36   49   64
z  100  121  144  169
If you specify character = TRUE, each element of the matrix will be converted to character. Note that as.character() drops the names, so the rows of the result are no longer labeled x, y and z.
apply(df, 1, fun, character = TRUE)
     [,1]  [,2]  [,3]  [,4]
[1,] "1"   "4"   "9"   "16"
[2,] "25"  "36"  "49"  "64"
[3,] "100" "121" "144" "169"
Second, if you apply the function by columns, you will obtain the following result, which is the transpose of the matrix obtained when applying the function by rows.
apply(df, 2, fun)
      x  y   z
[1,]  1 25 100
[2,]  4 36 121
[3,]  9 49 144
[4,] 16 64 169
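The sketch below checks the transpose relationship between the by-row and by-column results. fun is redefined so the block runs on its own; unname() drops the row and column labels before the values are compared.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)
fun <- function(x, character = FALSE) {
  if (character) as.character(x ^ 2) else x ^ 2
}

by_row <- apply(df, 1, fun)   # 3 x 4: one column per row of df
by_col <- apply(df, 2, fun)   # 4 x 3: one column per column of df

all.equal(unname(t(by_row)), unname(by_col))   # TRUE
```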
Last, if you apply the function to each cell, you will obtain the following result:
apply(df, c(1, 2), fun)
      x  y   z
[1,]  1 25 100
[2,]  4 36 121
[3,]  9 49 144
[4,] 16 64 169
Note that in this case, applying the function by rows and columns is the same as applying it by columns, because fun squares each element individually, so squaring one cell at a time gives the same values as squaring a whole column at once. Next, we will show you an example where the three outputs are different. Consider, for instance, the following function:
f <- function(x) sum(exp(x))
This function calculates the sum of the exponentials of the elements of a number or vector. So if you apply the function by rows, you will obtain the following:
apply(df, 1, f)
22177.60 60284.96 163871.51 445448.95
As an example, the first element of the output (22177.60) can be obtained with sum(exp(c(1, 5, 10))), that is, exp(1) + exp(5) + exp(10). Now, if you apply the function by columns, the output will be completely different.
apply(df, 2, f)
x y z
84.79102 4629.43310 687068.79094
Finally, if you apply the function by rows and columns, the output will be a matrix containing the exponential of each element.
apply(df, 1:2, f)
x y z
[1,]  2.718282  148.4132  22026.47
[2,]  7.389056  403.4288  59874.14
[3,] 20.085537 1096.6332 162754.79
[4,] 54.598150 2980.9580 442413.39
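Since each call now receives a single value, sum(exp(x)) reduces to exp(x). The following sketch confirms that the vectorized exp() applied to the whole matrix gives the same result without apply().

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)
f <- function(x) sum(exp(x))

by_cells <- apply(df, 1:2, f)

# Vectorized alternative: exp() works element-wise on the whole matrix
all.equal(by_cells, exp(as.matrix(df)))   # TRUE
```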
More examples of the R apply() function
Below are more examples of using the apply function in R, including one in which a function is applied to a multidimensional array.
apply(df, 2, min) # Minimum value of each column
apply(df, 2, range) # Range (min and max values) by column
apply(df, 1, summary) # Summary for each row
apply(df, 2, summary) # Summary for each column
# Applying the sum function to a multidimensional array
ar <- array(data = 1:18, dim = c(3, 2, 3))
apply(ar, 3, sum)
The last line sums all the elements of each of the three 3 x 2 matrices (slices) along the third dimension of the array, returning 21 57 93.
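The following sketch makes the shapes of these results explicit. When FUN returns several values, as range() and summary() do, apply() stacks them as the columns of a matrix; with an array, MARGIN can also keep two dimensions and collapse the third.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

rng <- apply(df, 2, range)   # 2 x 3: row 1 holds minimums, row 2 maximums
rng[1, ]                     # 1 5 10
rng[2, ]                     # 4 8 13

smry <- apply(df, 2, summary)
dim(smry)                    # 6 3
rownames(smry)               # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

ar <- array(data = 1:18, dim = c(3, 2, 3))
apply(ar, 3, sum)            # 21 57 93: one sum per 3 x 2 slice ar[, , k]
apply(ar, c(1, 2), sum)      # 3 x 2 matrix: each cell summed across the 3 slices
```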