lapply function in R
The lapply function is part of the apply family of functions in R. It applies a function over a list or a vector and returns a list. In this tutorial we will review how to use the lapply function in R with several examples.
The lapply() function in R
The lapply function applies a function to a list or a vector, returning a list of the same length as the input. The syntax of the function is as follows:
lapply(X,    # List or vector
       FUN,  # Function to be applied
       ...)  # Additional arguments to be passed to FUN
How to use lapply in R?
Using the lapply function is straightforward: pass the list or vector and specify the function you want to apply to each of its elements.
Iterate over a list
Consider, for instance, the following list with two elements named A and B.
a <- list(A = c(8, 9, 7, 5),
          B = data.frame(x = 1:5, y = c(5, 1, 0, 2, 3)))
a
$A
[1] 8 9 7 5
$B
  x y
1 1 5
2 2 1
3 3 0
4 4 2
5 5 3
If you apply the sum function to the previous list, you will obtain the sum of each of its elements (the sum of the elements of the vector and the sum of the elements of the data frame).
lapply(a, sum)
$A
[1] 29
$B
[1] 26
Iterate over a vector
If you have a vector, the lapply function will apply a function to all elements of the vector. As an example, consider the vector b and calculate the square root of each element:
b <- c(12, 18, 6)
lapply(b, sqrt)
[[1]]
[1] 3.464102
[[2]]
[1] 4.242641
[[3]]
[1] 2.44949
If you pass a list to lapply, the corresponding function will be applied to all the elements of the list. If you pass a vector, the function will be applied to each element of the vector.
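A data frame is itself a list of columns, so passing one to lapply applies the function to each column. The following is a minimal sketch of that behavior; the names scores and col_means are made up for this illustration.

```r
# Hypothetical data frame used only for this illustration
scores <- data.frame(math = c(7, 9, 8), art = c(6, 10, 5))

# lapply treats the data frame as a list of columns,
# so mean is computed once per column
col_means <- lapply(scores, mean)
col_means
```

The result is a list with one element per column, and it keeps the column names.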
lapply with multiple arguments
Note that if the function you pass to the FUN argument has additional arguments, you can pass them after the function, separated by commas, as in the following example, where we set the probs argument of the quantile function:
c <- list(A = c(56, 12, 57, 24), B = c(89, 12, 64, 18, 65, 76))
lapply(c,                           # List
       quantile,                    # Applied function
       probs = c(0.25, 0.5, 0.75))  # Additional argument of the quantile function
$A
  25%   50%   75%
21.00 40.00 56.25
$B
  25%   50%   75%
29.50 64.50 73.25
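Several additional arguments can be passed in the same way, each one after FUN. As a sketch, the following call forwards both na.rm and trim to mean; the list named values and the result name trimmed are made up for this example.

```r
# Hypothetical input list; A contains a missing value and an outlier
values <- list(A = c(1, 2, 3, NA, 100), B = c(10, 20, 30, 40))

# na.rm = TRUE drops the missing value; trim = 0.25 discards
# 25% of the observations at each end before averaging
trimmed <- lapply(values, mean, na.rm = TRUE, trim = 0.25)
trimmed
```

Every argument after FUN is handed to the applied function unchanged for each element of the list.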
lapply with a custom function
You can also apply a custom function with lapply. For that purpose, you can create a function and pass its name to the FUN argument, or write it directly inside the lapply call, as in the following block of code.
d <- 1:3
# Function to calculate the second power
fun <- function(x) {
  x ^ 2
}
# Applying our own function
lapply(d, fun)
lapply(d, FUN = function(x) x ^ 2) # Equivalent
lapply(d, function(x) x ^ 2) # Equivalent
[[1]]
[1] 1
[[2]]
[1] 4
[[3]]
[1] 9
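A custom function can also take extra arguments of its own, which are passed through lapply after the function name. A small sketch follows, where power and its argument p are hypothetical names chosen for this example.

```r
d <- 1:3

# Hypothetical helper: raises x to the power p
power <- function(x, p) {
  x ^ p
}

# p = 3 is forwarded to power() for every element of d
cubes <- lapply(d, power, p = 3)
cubes
```

This keeps the helper reusable, since the exponent is chosen at the call site rather than fixed inside the function.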
lapply vs for loop
The lapply function can be used to avoid for loops, which are known to be slow in R when not used properly. Suppose you want to return a list containing the third power of the even numbers of a vector and the fourth power of the odd numbers of that vector. In that case you could type:
# Empty list with 5 elements
x <- vector("list", 5)
# Vector
vec <- 1:5
for(i in vec) {
  if(i %% 2 == 0) { # Check if the element 'i' is even or odd
    x[[i]] <- i ^ 3
  } else {
    x[[i]] <- i ^ 4
  }
}
x
An alternative is to use the lapply function as follows:
fun <- function(i) {
  if(i %% 2 == 0) {
    i ^ 3
  } else {
    i ^ 4
  }
}
lapply(vec, fun)
The output in both cases will be the same:
[[1]]
[1] 1 # <- Fourth power of 1
[[2]]
[1] 8 # <- Third power of 2
[[3]]
[1] 81 # <- Fourth power of 3
[[4]]
[1] 64 # <- Third power of 4
[[5]]
[1] 625 # <- Fourth power of 5
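One way to confirm that the two approaches agree is to build both results and compare them with identical. The sketch below condenses the code above into a self-contained form; from_loop and from_lapply are names introduced here for illustration.

```r
vec <- 1:5

# Result built with a for loop over a preallocated list
from_loop <- vector("list", length(vec))
for (i in vec) {
  from_loop[[i]] <- if (i %% 2 == 0) i ^ 3 else i ^ 4
}

# Result built with lapply and an anonymous function
from_lapply <- lapply(vec, function(i) if (i %% 2 == 0) i ^ 3 else i ^ 4)

# TRUE when both lists have the same values, types and structure
same_result <- identical(from_loop, from_lapply)
same_result
```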
You can only use the lapply function in place of a for loop if you want to return a list of the same length as the vector or list you are iterating over.
lapply vs sapply in R
The lapply and sapply functions are very similar, as the second is a wrapper of the first. The main difference between them is that lapply always returns a list, whereas sapply tries to simplify the result to a vector or matrix. However, if you pass simplify = FALSE to the sapply function, both will return a list.
To clarify, if you apply the sqrt function to a vector with the lapply function, you will get a list of the same length as the input vector, where each element of the list is the square root of the corresponding element of the vector:
lapply(c(4, 9, 16), FUN = sqrt)
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 4
However, if you use the sapply function instead, you will get the same values, but returned as a vector.
sapply(c(4, 9, 16), FUN = sqrt)
[1] 2 3 4
Note that you can also return a list as output with the sapply function, setting the simplify argument to FALSE or wrapping the call with the as.list function.
sapply(c(4, 9, 16), FUN = sqrt, simplify = FALSE)
as.list(sapply(c(4, 9, 16), sqrt)) # Equivalent
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 4
Conversely, you can return a vector with the lapply function using the unlist or simplify2array functions as follows:
unlist(lapply(c(4, 9, 16), sqrt))
simplify2array(lapply(c(4, 9, 16), sqrt)) # Equivalent
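The simplification performed by sapply depends on the length of each result. As a sketch, when every call returns a vector of the same length greater than one, sapply arranges the results in a matrix while lapply still returns a list. The names samples, as_list and as_matrix are made up for this example.

```r
samples <- list(A = c(56, 12, 57, 24), B = c(89, 12, 64, 18, 65, 76))

# lapply: a list with one vector of quartiles per element
as_list <- lapply(samples, quantile, probs = c(0.25, 0.5, 0.75))

# sapply: the same quartiles arranged as a matrix,
# one row per quantile and one column per list element
as_matrix <- sapply(samples, quantile, probs = c(0.25, 0.5, 0.75))

is.list(as_list)
is.matrix(as_matrix)
dim(as_matrix)
```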
More lapply examples
Using lapply on certain columns of an R data frame
Suppose you have a data frame and you want to multiply the elements of the first column by one, the elements of the second by two and so on.
On the one hand, for all columns you could write:
df <- data.frame(x = c(6, 2), y = c(3, 6), z = c(2, 3))
# Function applied to all columns
lapply(1:ncol(df), function(i) df[, i] * i)
[[1]]
[1] 6 2
[[2]]
[1] 6 12
[[3]]
[1] 6 9
On the other hand, if you want to apply the lapply function to certain columns of the data frame, you could type:
# Function applied to the first and third columns
lapply(c(1, 3), function(i) df[, i] * i)
[[1]]
[1] 6 2
[[2]]
[1] 6 9
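Columns can also be selected by name. Subsetting a data frame with a character vector returns a data frame with just those columns, so lapply iterates over them and keeps their names. A minimal sketch, doubling the x and z columns (the factor 2 and the name doubled are arbitrary choices for this example):

```r
df <- data.frame(x = c(6, 2), y = c(3, 6), z = c(2, 3))

# df[c("x", "z")] keeps only the selected columns;
# the anonymous function receives each column as a vector
doubled <- lapply(df[c("x", "z")], function(col) col * 2)
doubled
```

Unlike the index-based version, the function here receives the column values directly instead of a column position.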
Nested lapply functions
If needed, you can nest multiple lapply functions. Suppose you want to iterate over the columns and rows of a data frame and apply a function to each cell. For that purpose, and supposing that you want to multiply each cell by four, you could type something like the following:
df <- data.frame(x = c(6, 2), y = c(3, 6))
# Empty list
res <- vector("list", 2)
for(i in 1:ncol(df)) {
  for (j in 1:nrow(df)) {
    res[[j]][i] <- df[j, i] * 4
  }
}
res
[[1]] # <- First row by four
[1] 24 12
[[2]] # <- Second row by four
[1] 8 24
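You can get the same values by nesting two lapply functions, placing one lapply inside the FUN argument of the other. Because the outer call below iterates over the columns, the values are grouped by column rather than by row: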
lapply(1:ncol(df), function(i) {
  unlist(lapply(1:nrow(df), function(j) {
    df[j, i] * 4
  }))
})
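To reproduce the row-wise layout of the for loop result exactly, the outer call can iterate over the rows instead. A minimal sketch that redefines the same df so it runs on its own; by_row is a name chosen here for illustration.

```r
df <- data.frame(x = c(6, 2), y = c(3, 6))

# Outer call: one list element per row
# Inner call: one value per column of that row
by_row <- lapply(1:nrow(df), function(j) {
  unlist(lapply(1:ncol(df), function(i) df[j, i] * 4))
})
by_row
```

Swapping the two index ranges is the only change; which range goes in the outer call determines how the values are grouped.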
As you may have noticed, this example is just for educational purposes, as you could simply type df * 4 to obtain the same values.