
apply in R


The apply family of functions in R is a well-known set of functions that let you apply a function repeatedly over arrays, matrices, lists and data frames, avoiding explicit for loops. In this tutorial you will learn how to use apply() in R through several examples and use cases.
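To make the "avoiding for loops" idea concrete, here is a minimal sketch comparing an explicit loop with apply() on a small hypothetical matrix named m. Both compute the sum of each row. Keep in mind that apply() is itself written in R and still loops internally, so the main benefit is usually shorter, clearer code rather than speed.

```r
# A small example matrix (hypothetical data, for illustration only)
m <- matrix(1:6, nrow = 2)

# Row sums with an explicit for loop
loop_sums <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  loop_sums[i] <- sum(m[i, ])
}

# The same computation with apply(): MARGIN = 1 means "by rows"
apply_sums <- apply(m, 1, sum)

loop_sums   # [1]  9 12
apply_sums  # [1]  9 12
```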

apply() function in R

The apply() function in R lets you apply a function across an array, matrix or data frame. How it does so depends on the value you pass to the MARGIN argument, which is usually 1, 2 or c(1, 2).

apply(X,       # Array, matrix or data frame
      MARGIN,  # 1: rows, 2: columns, c(1, 2): rows and columns
      FUN,     # Function to be applied
      ...)     # Additional arguments to FUN

Throughout this tutorial we are going to use the following example data, so make sure you have it loaded in your workspace. Note that we use a data frame, but it could also be a matrix or an array.

# Data frame
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)
df
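As a sketch of what happens behind the scenes: apply() works on matrices and arrays, so when it receives a data frame it first coerces it with as.matrix(). The example below (the names m and df_mixed are hypothetical) shows that passing the matrix yourself gives the same result, and that a data frame mixing character and numeric columns becomes a character matrix, on which numeric functions such as sum() would fail.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

# apply() coerces a data frame with as.matrix() before looping,
# so passing the matrix directly gives the same result
m <- as.matrix(df)
apply(m, 1, sum)   # [1] 16 19 22 25

# With mixed column types, as.matrix() returns a character matrix,
# so calling sum() on its rows would raise an error
df_mixed <- data.frame(id = c("a", "b"), value = c(1, 2))
class(as.matrix(df_mixed)[1, 1])   # [1] "character"
```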

MARGIN = 1 applies the function to rows while MARGIN = 2 applies the function to columns.

Applying a function to each row

You can apply a function to every row of an array in R by setting MARGIN = 1. For this first example we are going to apply the sum function over each row of the data frame.

apply(X = df, MARGIN = 1, FUN = sum)

Because apply() has so few arguments, it is common to omit the argument names. If you do, remember that they are matched by position, in the order X, MARGIN, FUN.

# Sum by rows
apply(df, 1, sum)

16 19 22 25

You can also apply the function to specific rows by subsetting the data frame first.

# Sum by rows to a subset of data
apply(df[c(1, 2), ], 1, sum)

16 19

Note that the output is a vector containing the corresponding sum of each row.

Applying a function to each column

Setting MARGIN = 2 will apply the function you specify to each column of the array you are working with.

# Sum by columns
apply(df, 2, sum)

 x  y  z
10 26 46

In this case, the output is a vector containing the sum of each column of the sample data frame. You can also apply the function to specific columns by subsetting the data.

# Sum by columns to a subset of data
apply(df[, c(1, 3)], 2, sum)

 x  z
10 46

Note that this is more efficient than applying the function to the whole data frame and then subsetting the output, because the function is only evaluated on the columns you need.

The previous examples are for educational purposes only. To calculate row and column sums it is more efficient to use the dedicated rowSums() and colSums() functions.
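A quick sketch of the dedicated functions applied to the same example data, checking that they agree with the apply() versions. rowSums() and colSums() return doubles while sum() on integers returns integers, so the comparison uses all.equal() rather than identical().

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

rowSums(df)   # Sum of each row
colSums(df)   # Sum of each column

# Same values as the apply() versions (up to integer vs double storage)
all.equal(unname(rowSums(df)), as.numeric(apply(df, 1, sum)))  # [1] TRUE
all.equal(colSums(df), apply(df, 2, sum))                      # [1] TRUE
```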

Applying a function to each cell of a data frame

You can set the MARGIN argument to c(1, 2) or, equivalently, to 1:2 to apply the function to each value of the data frame.

apply(df, c(1, 2), sum)

     x y  z
[1,] 1 5 10
[2,] 2 6 11
[3,] 3 7 12
[4,] 4 8 13

If you set MARGIN = c(2, 1) instead of c(1, 2), the output will be the same matrix, but transposed.

apply(df, c(2, 1), sum)

  [,1] [,2] [,3] [,4]
x    1    2    3    4
y    5    6    7    8
z   10   11   12   13

The output is of class “matrix” instead of “data.frame”.

Note that, in this case, the elements of the output are the elements of the data frame itself, because the sum of a single cell is just the value of that cell.
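The two claims above (the result is a matrix, and swapping the margins transposes it) can be checked directly. This is a small sketch; by_cells and by_cells_t are illustrative names only.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

by_cells   <- apply(df, c(1, 2), sum)  # 4 x 3 matrix
by_cells_t <- apply(df, c(2, 1), sum)  # 3 x 4 matrix

is.matrix(by_cells)                 # [1] TRUE: a matrix, not a data frame
identical(t(by_cells), by_cells_t)  # [1] TRUE: swapping margins transposes
```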

In the section on applying custom functions, we will show one example where applying a function by rows and columns gives the same output as applying it by columns only, and another where all three outputs differ, to better understand the purpose of applying a function over rows and columns at the same time.

Additional arguments of the apply R function

The mean function has an additional argument, na.rm, that specifies whether NA values should be removed. To set arguments of the function you are applying, pass them to apply() after FUN, separated by commas; they are forwarded to FUN through the ... argument:

# Apply the mean by rows removing NA values
apply(df, 1, mean, na.rm = TRUE)
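The example data has no missing values, so na.rm = TRUE makes no difference there. The sketch below uses a hypothetical data frame df_na with one NA to show the effect of forwarding the argument to mean().

```r
# Hypothetical data with a missing value in the second row
df_na <- data.frame(x = c(1, NA, 3), y = c(4, 5, 6))

means_keep <- apply(df_na, 1, mean)                # NA propagates
means_drop <- apply(df_na, 1, mean, na.rm = TRUE)  # na.rm is passed to mean()

means_keep  # [1] 2.5  NA 4.5
means_drop  # [1] 2.5 5.0 4.5
```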

Applying a custom function

The function you pass to the FUN argument doesn't need to be a base R function: any R function works, including one you write yourself. In this example we are going to create a function named fun that calculates the square of its input and converts the output to character if the character argument is set to TRUE.

fun <- function(x, character = FALSE) {
  if (!character) {
    x^2                 # Square of the input
  } else {
    as.character(x^2)   # Square of the input, as character strings
  }
}
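You do not even need to give the function a name: an anonymous function can be written directly inside the call. As a minimal sketch (the ratio computed here is just an illustrative choice), this returns the ratio between the largest and smallest value of each row. In R 4.1.0 and later, function(row) can also be written with the shorter \(row) syntax.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

# Anonymous function: ratio of the maximum to the minimum of each row
ratios <- apply(df, 1, function(row) max(row) / min(row))

ratios  # [1] 10.00  5.50  4.00  3.25
```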

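First, if you apply the function by rows, fun receives each row (a vector of length 3) and returns its squares, also a vector of length 3. apply() stores each of these results as a column, so the output is a 3 x 4 matrix in which column i contains the squared elements of row i of the data frame.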

apply(df, 1, fun)

  [,1] [,2] [,3] [,4]
x    1    4    9   16
y   25   36   49   64
z  100  121  144  169

If you specify character = TRUE, the result is a character matrix. Its rows are labeled [1,], [2,] and [3,] instead of x, y and z because as.character() drops the names of its input.

apply(df, 1, fun, character = TRUE)

     [,1]  [,2]  [,3]  [,4]
[1,] "1"   "4"   "9"   "16"
[2,] "25"  "36"  "49"  "64"
[3,] "100" "121" "144" "169"

Second, if you apply the function by columns, you will obtain the following result, which is the transpose of the matrix obtained when applying the function by rows.

apply(df, 2, fun)

      x  y   z
[1,]  1 25 100
[2,]  4 36 121
[3,]  9 49 144
[4,] 16 64 169
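To see the transpose relationship explicitly, the sketch below (with illustrative names by_rows and by_cols) compares the dimensions of both results and checks that one is the transpose of the other.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

fun <- function(x, character = FALSE) {
  if (!character) {
    x^2
  } else {
    as.character(x^2)
  }
}

by_rows <- apply(df, 1, fun)  # One column per row of df
by_cols <- apply(df, 2, fun)  # One column per column of df

dim(by_rows)                     # [1] 3 4
dim(by_cols)                     # [1] 4 3
all.equal(by_rows, t(by_cols))   # [1] TRUE
```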

Last, if you apply the function to each cell, you will obtain the following result:

apply(df, c(1, 2), fun)

      x  y   z
[1,]  1 25 100
[2,]  4 36 121
[3,]  9 49 144
[4,] 16 64 169

Note that in this case applying the function by rows and columns gives the same result as applying it by columns, because fun squares its input element by element: squaring each cell individually yields the same values as squaring a whole column at once. Next, we will show you an example where the three outputs are different. Consider, for instance, the following function:

f <- function(x) sum(exp(x))

This function calculates the sum of the exponentials of the elements of a vector (for a single number, simply its exponential). So if you apply the function by rows, you will obtain the following:

apply(df, 1, f)

22177.60  60284.96  163871.51  445448.95

As an example, the first element of the output (22177.60) can be obtained with sum(exp(c(1, 5, 10))), that is, exp(1) + exp(5) + exp(10). Now, if you apply the function by columns, the output will be completely different.

apply(df, 2, f)

           x            y            z
    84.79102   4629.43310 687068.79094

Finally, if you apply the function by rows and columns, the output will be a matrix containing the exponential of each element.

apply(df, 1:2, f)

             x         y         z
[1,]  2.718282  148.4132  22026.47
[2,]  7.389056  403.4288  59874.14
[3,] 20.085537 1096.6332 162754.79
[4,] 54.598150 2980.9580 442413.39
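Both observations above can be verified in a few lines: applied cell by cell, f() reduces to exp() of each value, and the first row result equals exp(1) + exp(5) + exp(10). This is just a sanity-check sketch; the name cells is illustrative.

```r
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)
f <- function(x) sum(exp(x))

# Cell by cell, f() is just exp() of the single value
cells <- apply(df, 1:2, f)
all.equal(cells, exp(as.matrix(df)))                   # [1] TRUE

# By rows, the first element is exp(1) + exp(5) + exp(10)
all.equal(apply(df, 1, f)[1], sum(exp(c(1, 5, 10))))   # [1] TRUE
```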

More examples of the R apply() function

Below are more examples of using the apply function in R, including one in which a function is applied to a multidimensional array.

apply(df, 2, min)     # Minimum value of each column

apply(df, 2, range)   # Range (min and max values) by column

apply(df, 1, summary) # Summary for each row

apply(df, 2, summary) # Summary for each column

# Applying the sum function to a multidimensional array
ar <- array(data = 1:18, dim = c(3, 2, 3))
apply(ar, 3, sum)

The last line sums each 3 x 2 slice of the array along its third dimension (ar[, , 1], ar[, , 2] and ar[, , 3]), returning the vector 21 57 93.
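To make the meaning of MARGIN = 3 explicit, the sketch below computes the same slice sums by indexing each slice ar[, , k] directly with sapply(); both approaches return the same integer vector.

```r
ar <- array(data = 1:18, dim = c(3, 2, 3))

# MARGIN = 3: one result per slice along the third dimension
slice_sums <- apply(ar, 3, sum)
slice_sums   # [1] 21 57 93

# The same result, summing each 3 x 2 slice explicitly
manual_sums <- sapply(seq_len(dim(ar)[3]), function(k) sum(ar[, , k]))
manual_sums  # [1] 21 57 93
```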