# apply in R

The apply family in R is a well-known set of functions that let you perform repetitive tasks over arrays without writing explicit for loops. In this tutorial you will learn **how to use apply in R** through several examples and use cases.

## apply() function in R

The **apply command in R** allows you to apply a function across an array, matrix or data frame. You can do this in several ways, depending on the value you specify for the `MARGIN` argument, which is usually set to `1`, `2` or `c(1, 2)`.

```
apply(X,       # Array, matrix or data frame
      MARGIN,  # 1: rows, 2: columns, c(1, 2): rows and columns
      FUN,     # Function to be applied
      ...)     # Additional arguments to FUN
```

Throughout this tutorial **we are going to use the following example data**, so make sure you have it loaded in your workspace. Note that we are going to use a data frame, but it could also be a matrix or an array instead.

```
# Data frame
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)
df
```

```
  x y  z
1 1 5 10
2 2 6 11
3 3 7 12
4 4 8 13
```

`MARGIN = 1` applies the function to rows, while `MARGIN = 2` applies it to columns.
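One detail worth keeping in mind when the input is a data frame: `apply` works on a matrix, so a data frame is first converted with `as.matrix`. The sketch below uses a hypothetical data frame `df_mixed` (not part of the tutorial data) to illustrate that mixing character and numeric columns turns every value into a character string before the function is called.

```r
# Hypothetical data frame mixing a character and a numeric column
df_mixed <- data.frame(id = c("a", "b"), value = c(1.5, 2.5))

# apply() operates on as.matrix(df_mixed), which is a character matrix
m <- as.matrix(df_mixed)
typeof(m)

# FUN therefore sees every column as character
col_classes <- apply(df_mixed, 2, class)
col_classes

# Keeping only the numeric columns avoids the coercion
num_sums <- apply(df_mixed[, "value", drop = FALSE], 2, sum)
num_sums
```

The example data frame `df` used in this tutorial contains only numeric columns, so this coercion does not affect the results shown here.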

### Applying a function to each row

You can **apply a function to every row** of an array in R by setting the `MARGIN` argument to `1`. For this first example we are going to apply the `sum` function over the data frame.

`apply(X = df, MARGIN = 1, FUN = sum)`

Note that **it is common to omit the argument names** when calling this function because the call is so simple, but then the arguments must be passed in the order `X`, `MARGIN`, `FUN`.

```
# Sum by rows
apply(df, 1, sum)
```

`16 19 22 25`

You can also apply the function to specific rows by subsetting the data frame first.

```
# Sum by rows to a subset of data
apply(df[c(1, 2), ], 1, sum)
```

`16 19`

Note that **the output is a vector** containing the corresponding sum of each row.

### Applying a function to each column

Setting `MARGIN = 2` will **apply the function** you specify **to each column** of the array you are working with.

```
# Sum by columns
apply(df, 2, sum)
```

```
x y z
10 26 46
```

In this case, the output is a vector containing the sum of each column of the sample data frame. You can also use the **apply function to specific columns** if you subset the data.

```
# Sum by columns to a subset of data
apply(df[, c(1, 3)], 2, sum)
```

```
x z
10 46
```

Note that this is more efficient than applying the function to the whole data frame and then subsetting the output, because the function is never evaluated on the columns you discard.
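As a quick check of the two approaches just described, the following sketch compares subsetting before and after the call. The variable names `subset_first` and `subset_after` are only illustrative.

```r
# Same example data frame as in the tutorial
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

# Subset the columns first, then apply (sum is evaluated on 2 columns)
subset_first <- apply(df[, c(1, 3)], 2, sum)

# Apply to every column, then subset the output (sum is evaluated on 3 columns)
subset_after <- apply(df, 2, sum)[c(1, 3)]

# Both approaches give the same values
all.equal(subset_first, subset_after)
```

With only three columns the difference in work is negligible, but it grows with the number of columns that are discarded.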

The previous examples are for educational purposes. To calculate the sums of columns and rows it is more efficient to use the `colSums` and `rowSums` functions, respectively.
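A minimal sketch of that alternative, using the same example data, might look as follows. The comparison uses `all.equal` because `rowSums` and `colSums` return doubles, while `sum` on integer data returns integers.

```r
# Same example data frame as in the tutorial
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

row_totals <- rowSums(df)  # Dedicated function for row sums
col_totals <- colSums(df)  # Dedicated function for column sums

# Compare the values with the apply() versions (names are dropped
# so that only the numbers are compared)
all.equal(unname(row_totals), unname(apply(df, 1, sum)))
all.equal(unname(col_totals), unname(apply(df, 2, sum)))
```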

## Applying a function to every element of a data frame

You can set the `MARGIN` argument to `c(1, 2)` or, equivalently, to `1:2` to apply the function to each value of the data frame.

`apply(df, c(1, 2), sum)`

```
     x y  z
[1,] 1 5 10
[2,] 2 6 11
[3,] 3 7 12
[4,] 4 8 13
```

If you set `MARGIN = c(2, 1)` instead of `c(1, 2)`, the output will be the same matrix but transposed.

`apply(df, c(2, 1), sum)`

```
  [,1] [,2] [,3] [,4]
x    1    2    3    4
y    5    6    7    8
z   10   11   12   13

The output is of class “matrix” instead of “data.frame”.

Note that, **in this case, the elements of the output are the elements of the data frame itself**, as it is calculating the sum of each individual cell.
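Both remarks can be verified with a short sketch. The conversion back with `as.data.frame` is shown only as one possible way to recover a data frame; the names `res` and `res_df` are illustrative.

```r
# Same example data frame as in the tutorial
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)

res <- apply(df, c(1, 2), sum)

is.matrix(res)      # The result is a matrix ...
is.data.frame(res)  # ... and not a data frame

# The sum of a single cell is the cell itself
all(res == as.matrix(df))

# Convert back if a data frame is needed downstream
res_df <- as.data.frame(res)
is.data.frame(res_df)
```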

In the section on applying custom functions with `apply` we will show an **example** where the output of applying by rows and columns is the same as applying the function only by columns and another **where the outputs are different in all cases**, to better illustrate the purpose of applying a function over rows and columns at the same time.

## Additional arguments of the apply R function

The `mean` function has an additional argument (`na.rm`) to specify whether to remove `NA` values or not. If you need to specify arguments of the function you are applying, you can simply pass them after `FUN`, separated by commas, as follows:

```
# Apply the mean by rows removing NA values
apply(df, 1, mean, na.rm = TRUE)
```
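Since the example data frame contains no missing values, the effect of `na.rm` is easier to see on a variant that does. The sketch below uses a hypothetical copy of the data, `df_na`, with one `NA` in the second row.

```r
# Hypothetical variant of the example data with a missing value in row 2
df_na <- data.frame(x = c(1, NA, 3, 4), y = 5:8, z = 10:13)

# Without na.rm the mean of any row containing NA is NA
with_na <- apply(df_na, 1, mean)
with_na

# With na.rm = TRUE the second row averages only 6 and 11
without_na <- apply(df_na, 1, mean, na.rm = TRUE)
without_na
```

Any other argument accepted by the applied function can be forwarded in the same way.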

## Applying a custom function

The function you pass to the `FUN` argument doesn’t need to be a base R function. **You can also apply a custom R function**. In this example we are going to create a function named `fun` that calculates the square of a number and converts the output to character if the `character` argument is set to `TRUE`.

```
fun <- function(x, character = FALSE) {
  if (!character) {
    x ^ 2                # Numeric squares
  } else {
    as.character(x ^ 2)  # Squares converted to text
  }
}
```

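First, if you **apply the function by rows**, the **output will be a matrix** containing the squares of the elements. Note that the result for each row is stored as a column of the output, so the matrix is 3 × 4 rather than 4 × 3.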

`apply(df, 1, fun)`

```
  [,1] [,2] [,3] [,4]
x    1    4    9   16
y   25   36   49   64
z  100  121  144  169

If you specify `character = TRUE`, each element of the matrix will be converted to a character string.

`apply(df, 1, fun, character = TRUE)`

```
     [,1]  [,2]  [,3]  [,4]
[1,] "1"   "4"   "9"   "16"
[2,] "25"  "36"  "49"  "64"
[3,] "100" "121" "144" "169"

Second, if you **apply the function by columns**, you will obtain the following result, which is the transpose of the matrix obtained when applying the function by rows.

`apply(df, 2, fun)`

```
      x  y   z
[1,]  1 25 100
[2,]  4 36 121
[3,]  9 49 144
[4,] 16 64 169
```
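The transpose relationship between the row-wise and column-wise results can be checked directly. This is a small sketch that redefines the data and `fun` so that it runs on its own; `by_rows` and `by_cols` are illustrative names.

```r
# Same example data frame and custom function as in the tutorial
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)
fun <- function(x, character = FALSE) {
  if (!character) x ^ 2 else as.character(x ^ 2)
}

by_rows <- apply(df, 1, fun)  # One column of output per row of df
by_cols <- apply(df, 2, fun)  # One column of output per column of df

dim(by_rows)  # 3 rows, 4 columns
dim(by_cols)  # 4 rows, 3 columns

# Transposing the row-wise result gives the same values as the column-wise one
all(t(by_rows) == by_cols)
```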

Last, if you apply the function to each cell, you will obtain the following result:

`apply(df, c(1, 2), fun)`

```
      x  y   z
[1,]  1 25 100
[2,]  4 36 121
[3,]  9 49 144
[4,] 16 64 169
```

Note that in this case, applying the function by rows and columns gives the same result as applying it by columns, because the function squares each element individually, so the result does not depend on how the values are grouped. Next, we will show you **an example where the three outputs are different**. Consider, for instance, the following function:

`f <- function(x) sum(exp(x))`

This function **calculates the sum of the exponential of a number or vector**. So if you **apply the function by rows**, you will obtain the following:

`apply(df, 1, f)`

`22177.60 60284.96 163871.51 445448.95`

As an example, the first element of the output (22177.60) can be obtained with `sum(exp(c(1, 5, 10)))`. Now, if you **apply the function by columns**, the output will be completely different.

`apply(df, 2, f)`

```
x y z
84.79102 4629.43310 687068.79094
```

Finally, if you **apply the function by rows and columns**, the output will be a matrix containing the exponential of each element.

`apply(df, 1:2, f)`

```
             x         y         z
[1,]  2.718282  148.4132  22026.47
[2,]  7.389056  403.4288  59874.14
[3,] 20.085537 1096.6332 162754.79
[4,] 54.598150 2980.9580 442413.39
```
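To see how the three results relate to each other, the following sketch recomputes individual entries by hand. It assumes nothing beyond the data and the function `f` defined above; the other variable names are illustrative.

```r
# Same example data frame and function as in the tutorial
df <- data.frame(x = 1:4, y = 5:8, z = 10:13)
f <- function(x) sum(exp(x))

by_rows <- apply(df, 1, f)    # One value per row
by_cols <- apply(df, 2, f)    # One value per column
by_cell <- apply(df, 1:2, f)  # One value per cell

# First row: exp(1) + exp(5) + exp(10)
all.equal(unname(by_rows[1]), sum(exp(c(1, 5, 10))))

# First column: exp(1) + exp(2) + exp(3) + exp(4)
all.equal(unname(by_cols["x"]), sum(exp(1:4)))

# Cell by cell, the sum of a single value is just its exponential
all.equal(by_cell, exp(as.matrix(df)), check.attributes = FALSE)

# All three groupings add up to the same grand total
all.equal(sum(by_rows), sum(by_cols))
all.equal(sum(by_rows), sum(by_cell))
```

The grouping chosen with `MARGIN` only changes how the exponentials are added together, not the exponentials themselves.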

## More examples of the R apply() function

Below are more examples of using the `apply` function in R, including one in which a function is applied to a multidimensional array.

```
apply(df, 2, min)      # Minimum value of each column
apply(df, 2, range)    # Range (min and max values) of each column
apply(df, 1, summary)  # Summary for each row
apply(df, 2, summary)  # Summary for each column

# Applying the sum function to a multidimensional array
ar <- array(data = 1:18, dim = c(3, 2, 3))
apply(ar, 3, sum)
```

The last line returns one value per slice along the third dimension: the sum of all the elements of each 3 × 2 matrix `ar[, , k]`, which gives `21 57 93`.
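The sketch below reproduces that result slice by slice and also shows, as an additional illustration, what happens when two margins are kept in a three-dimensional array. The names `slice_sums` and `row_slice_sums` are illustrative.

```r
ar <- array(data = 1:18, dim = c(3, 2, 3))

# One sum per slice along the third dimension
slice_sums <- apply(ar, 3, sum)
slice_sums

# The first value is the sum of the first 3 x 2 slice (the numbers 1 to 6)
sum(ar[, , 1])

# Keeping dimensions 1 and 3 collapses only the second dimension,
# so the result is a 3 x 3 matrix
row_slice_sums <- apply(ar, c(1, 3), sum)
dim(row_slice_sums)

# Entry [1, 1] adds ar[1, 1, 1] and ar[1, 2, 1], that is 1 + 4
row_slice_sums[1, 1]
```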