sapply function in R
What is sapply in R? The sapply function in R is a member of the apply family that lets you iterate over a list or vector without writing an explicit for loop. It calls the supplied function once per element, so it is a more compact way of writing the loop rather than a truly vectorized operation. In this tutorial we will show you how to work with the R sapply function through several examples.
sapply() function
The sapply function in R applies a function to each element of a vector or list and, by default, simplifies the result to a vector, a matrix or an array when possible (otherwise it returns a list). The function has the following syntax:
sapply(X,                 # Vector, list or expression object
       FUN,               # Function to be applied
       ...,               # Additional arguments to be passed to FUN
       simplify = TRUE,   # If FALSE returns a list. If "array" returns an array if possible
       USE.NAMES = TRUE)  # If TRUE and X is a character vector without names, uses X as the names of the result
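As a small illustration of the USE.NAMES argument, the following sketch applies nchar to a character vector with and without names. The fruits vector and the variable names are made up for this example.

```r
# Hypothetical character vector used only for illustration
fruits <- c("apple", "kiwi", "banana")

# USE.NAMES = TRUE (the default): the strings themselves become the names
with_names <- sapply(fruits, nchar)

# USE.NAMES = FALSE: a plain, unnamed vector is returned
without_names <- sapply(fruits, nchar, USE.NAMES = FALSE)

print(with_names)
print(without_names)
```

The named version is convenient for interactive inspection, while the unnamed one is often easier to use in further calculations.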
In the following sections we will review how to use it with several examples.
The examples in this tutorial are for illustrative purposes only, to show how the sapply function works, as there are better ways to obtain the calculated results.
How to use sapply in R?
In order to use the sapply function in R you need to pass the list or vector you want to iterate over as the first argument and the function you want to apply to each of its elements as the second. Note that you can use a function from any package or a custom function:
sapply(1:4, sqrt)
# 1.000000 1.414214 1.732051 2.000000
# Equivalent to:
sapply(1:4, function(i) sqrt(i))
# Also equivalent to:
my_fun <- function(i) {
  sqrt(i)
}
sapply(1:4, my_fun)
Iterate over a vector
Consider, for instance, that you want to calculate the square of the elements of a vector. Using a for loop you will need to type the following code:
out <- numeric(10)
for (i in 1:10) {
  out[i] <- i ^ 2
}
out
1 4 9 16 25 36 49 64 81 100
However, with the sapply function you can obtain the same output in a single line of code:
sapply(1:10, function(i) i ^ 2)
1 4 9 16 25 36 49 64 81 100
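For this particular task R's arithmetic operators already work element by element, so the sapply call can be checked against a direct expression. The sketch below is only a sanity check of that equivalence; the variable names are chosen for illustration.

```r
# Squares computed one element at a time through sapply
squares_sapply <- sapply(1:10, function(i) i ^ 2)

# The same squares computed with the vectorized ^ operator
squares_direct <- (1:10) ^ 2

# Both approaches should give exactly the same numeric vector
print(identical(squares_sapply, squares_direct))
```

When a vectorized operator exists it is usually the simpler choice; sapply becomes useful when the function being applied only accepts one element at a time.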
Iterate over a list
If you have a list instead of a vector the steps are analogous, but note that the function will be applied to each element of the list. In the following example we calculate the number of components of each element of the list with the length function.
List <- list(A = 1:5, B = 6:20, C = 1)
sapply(List, length)
 A  B  C
 5 15  1
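Because a data frame is stored as a list of columns, the same idea applies to it directly. The following sketch uses the built-in trees data set to query every column at once; the variable names are illustrative.

```r
# Each column of a data frame is one element of the underlying list,
# so FUN is applied column by column
col_classes <- sapply(trees, class)
col_means   <- sapply(trees, mean)

print(col_classes)
print(col_means)
```

Both results are named vectors with one entry per column of the data frame.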
sapply vs lapply
The difference between the lapply and sapply functions is that sapply is a wrapper around lapply that, by default, simplifies the result to a vector, a matrix or an array instead of returning a list.
Consider that you want to calculate the exponential of three numbers. In this case, if you use the sapply function you will get a vector as output:
sapply(c(3, 5, 7), exp)
20.08554 148.41316 1096.63316
But if you use the lapply function, you will get a list where each element corresponds to one component of the previous vector.
lapply(c(3, 5, 7), exp)
[[1]]
[1] 20.08554
[[2]]
[1] 148.4132
[[3]]
[1] 1096.633
However, if you set the simplify argument of the sapply function to FALSE you will get the same output as with the lapply function. Note that, in this example, wrapping the simplified result in the as.list function gives the same list:
sapply(c(3, 5, 7), exp, simplify = FALSE)
as.list(sapply(c(3, 5, 7), exp)) # Equivalent
[[1]]
[1] 20.08554
[[2]]
[1] 148.4132
[[3]]
[1] 1096.633
Conversely, you can convert the output of the lapply function to the same type of output as the sapply function with the simplify2array or unlist functions:
simplify2array(lapply(c(3, 5, 7), exp))
unlist(lapply(c(3, 5, 7), exp)) # Equivalent
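The conversions described in this section can be checked with identical. This is a minimal sketch for the three numbers used above; the variable names are chosen for illustration.

```r
x <- c(3, 5, 7)

from_sapply <- sapply(x, exp)   # numeric vector
from_lapply <- lapply(x, exp)   # list with three elements

# sapply without simplification behaves like lapply
same_as_lapply <- identical(sapply(x, exp, simplify = FALSE), from_lapply)

# Converting the simplified result back to a list
list_round_trip <- identical(as.list(from_sapply), from_lapply)

# Converting the lapply output to a vector
via_simplify2array <- identical(simplify2array(from_lapply), from_sapply)
via_unlist         <- identical(unlist(from_lapply), from_sapply)

print(c(same_as_lapply, list_round_trip, via_simplify2array, via_unlist))
```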
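To sum up, the sapply and lapply functions do the same work but differ in the class of the output: lapply always returns a list, while sapply simplifies the result when it can.
It is more efficient to call the function that returns the structure you need than to transform the output afterwards.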
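One consequence of the simplification step is that the class of the sapply output depends on what the applied function returns. The sketch below, with illustrative variable names, shows sapply returning a vector when every result has length one and falling back to a list when the results have different lengths.

```r
# Every call returns a single number, so the result is a numeric vector
same_length <- sapply(1:3, function(i) i * 2)

# seq_len(i) returns vectors of length 1, 2 and 3, which cannot be
# combined into a vector or matrix, so sapply returns a list
diff_length <- sapply(1:3, seq_len)

print(class(same_length))
print(class(diff_length))
```

If later code relies on a particular output type, it may be safer to use lapply, or to check the class of the result, rather than to assume that the simplification succeeded.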
sapply function with additional arguments
The sapply function in R allows you to pass additional arguments to the function you are applying by listing them after it. Consider the following list with one NA value:
my_list <- list(A = c(1, 4, 6), B = c(8, NA, 9, 5))
If you apply the sum function to each element of the list it will return the sum of the components of each element, but as the second element contains an NA value its sum is also NA.
sapply(my_list, sum)
 A  B
11 NA
As the sum function has an additional argument named na.rm, you can set it to TRUE as follows to remove NA values:
sapply(my_list, sum, na.rm = TRUE)
As a consequence, the NA value is not taken into account and the function returns the sum of the non-missing values.
 A  B
11 22
It should be noted that if the function you are applying has more additional arguments you can specify them the same way, one after another.
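As a sketch of passing more than one additional argument, the following example applies the quantile function from the stats package with both probs and na.rm. The choice of quantile and of the two probabilities is only for illustration.

```r
my_list <- list(A = c(1, 4, 6), B = c(8, NA, 9, 5))

# probs and na.rm are both forwarded to quantile() for every element
quartiles <- sapply(my_list, quantile, probs = c(0.25, 0.75), na.rm = TRUE)

print(quartiles)
```

Since quantile returns two values per element, the result is a matrix with one column per list element and one row per requested probability.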
Return a matrix or an array
The output of the sapply function in R can also be a matrix or an array. If the function you are applying returns vectors of the same length (greater than one), sapply will output a matrix in which each column is one of those vectors. If the function returns a matrix, sapply will, by default, treat each matrix as a vector and create a new matrix in which each column holds the elements of one of the matrices.
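The first case, a function that returns vectors of the same length, can be sketched with the range function, which always returns two values. The list is the one from the list example earlier in the tutorial, redefined here so that the snippet runs on its own.

```r
List <- list(A = 1:5, B = 6:20, C = 1)

# range() returns c(min, max) for every element, so the results are
# bound together as the columns of a 2-row matrix
ranges <- sapply(List, range)

print(ranges)
print(dim(ranges))
```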
Consider, as an example, that you want to create matrices of three rows and three columns in which all elements have the same value. In order to create one you can type the following:
matrix(1, ncol = 3, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
However, if you try to use the sapply function to iterate over a vector and create several matrices, the output will not be as expected because, as we pointed out before, the function treats each matrix as a vector by default.
sapply(1:3, function(i) matrix(i, ncol = 3, nrow = 3))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
[4,]    1    2    3
[5,]    1    2    3
[6,]    1    2    3
[7,]    1    2    3
[8,]    1    2    3
[9,]    1    2    3
In order to solve this issue you can set the simplify argument to "array", so that each slice of the resulting array contains the desired matrix:
sapply(1:3, function(i) matrix(i, ncol = 3, nrow = 3), simplify = "array")
, , 1
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
, , 2
     [,1] [,2] [,3]
[1,]    2    2    2
[2,]    2    2    2
[3,]    2    2    2
, , 3
     [,1] [,2] [,3]
[1,]    3    3    3
[2,]    3    3    3
[3,]    3    3    3
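Once the result is an array, each matrix can be recovered by indexing the third dimension. The following is a minimal sketch of that step; arr and second are illustrative names.

```r
arr <- sapply(1:3, function(i) matrix(i, ncol = 3, nrow = 3),
              simplify = "array")

# The array has three 3 x 3 slices, one per element of 1:3
print(dim(arr))

# Leaving the first two indices empty extracts a whole slice as a matrix
second <- arr[, , 2]
print(second)
```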
It is worth mentioning that if you set simplify to FALSE the output is a list in which each element contains the corresponding matrix. Note that this is what the lapply function returns by default.
sapply(1:3, function(i) matrix(i, ncol = 3, nrow = 3), simplify = FALSE)
[[1]]
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
[[2]]
     [,1] [,2] [,3]
[1,]    2    2    2
[2,]    2    2    2
[3,]    2    2    2
[[3]]
     [,1] [,2] [,3]
[1,]    3    3    3
[2,]    3    3    3
[3,]    3    3    3
Multiple sapply: Nesting the sapply function
You can nest multiple sapply functions in R. Suppose that you want to iterate over the columns and rows of a data frame and multiply each element by two. For that purpose, using a for loop you could type:
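df <- trees
# Preallocate the result instead of growing it inside the loop
res <- matrix(NA_real_, nrow = nrow(df), ncol = ncol(df))
for (i in seq_len(ncol(df))) {
  for (j in seq_len(nrow(df))) {
    res[j, i] <- df[j, i] * 2
  }
}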
Nonetheless, using the sapply function you can avoid writing explicit loops. Write the following to obtain the same values as a matrix:
sapply(seq_len(ncol(df)), function(i) {
  sapply(seq_len(nrow(df)), function(j) {
    df[j, i] * 2
  })
})
This example is only for educational purposes, as you could achieve the same result just with df * 2.
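Between the nested version and df * 2 there is an intermediate option: a single sapply over the columns, relying on the fact that multiplication already works on a whole column. The sketch below assumes the built-in trees data set; doubled and column are illustrative names.

```r
df <- trees

# One call per column; each call doubles the whole column at once
doubled <- sapply(df, function(column) column * 2)

print(head(doubled, 3))
print(dim(doubled))
```

Unlike the index-based nested version, iterating over df itself keeps the column names in the resulting matrix.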
sapply function example: creating plots
Sometimes the number of lines or plots you want to display depends on something else (such as the number of variables of a data frame, for instance). In this case, you have to iterate over some list to show the final result. For that purpose you could use a for loop:
plot(rnorm(10), ylim = c(-6, 6))
nlines <- 5
for (i in 1:nlines) {
  lines(-i:i, col = i, lwd = 3)
}
Nevertheless, if you want to avoid using R for loops you can use the sapply function. Note that, as the lines graphics function returns NULL, sapply returns a list of NULL values, and wrapping the call in the invisible function prevents that list from being printed.
plot(rnorm(10), ylim = c(-6, 6))
nlines <- 5
invisible(sapply(1:nlines, function(i) lines(-i:i, col = i, lwd = 3)))
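To see what invisible is hiding, the return value of sapply can be stored and inspected. The sketch below draws to a null PDF device, pdf(NULL), so that it also runs in a session without a display; that device choice and the variable res are assumptions made for this example only.

```r
# Open a null device: no file is written and no window is opened
pdf(NULL)

plot(rnorm(10), ylim = c(-6, 6))
nlines <- 5

# Store the result instead of discarding it
res <- sapply(1:nlines, function(i) lines(-i:i, col = i, lwd = 3))

# Close the device quietly
invisible(dev.off())

# res holds one NULL per iteration
print(class(res))
print(length(res))
```

When the return value is of no interest, as here, a plain for loop or invisible(lapply(...)) expresses the same intent.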
This is very useful when creating functions with S3 classes in R packages to draw graphs.