lapply function in R

Learn how to use the lapply function in the R programming language

The lapply function is part of the apply family of functions in R. It applies a function over a list or a vector and returns a list. In this tutorial we will review how to use the lapply function in R with several examples.

The lapply() function in R

The lapply function applies a function to a list or a vector, returning a list of the same length as the input. The syntax of the function is as follows:

lapply(X,   # List or vector
       FUN, # Function to be applied
       ...) # Additional arguments to be passed to FUN

How to use lapply in R?

Using the lapply function is very straightforward: you just need to pass the list or vector and specify the function you want to apply to each of its elements.

Iterate over a list

Consider, for instance, the following list with two elements named A and B.

a <- list(A = c(8, 9, 7, 5),
          B = data.frame(x = 1:5, y = c(5, 1, 0, 2, 3)))
a
$A
[1] 8 9 7 5

$B
  x y
1 1 5
2 2 1
3 3 0
4 4 2
5 5 3

If you apply the function sum to the previous list you will obtain the sum of each of its elements (the sum of the elements of the vector and the sum of the elements of the data frame).

lapply(a, sum)
$A
[1] 29

$B
[1] 26
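Since the input list is named, lapply carries those names over to the result, so each value can be accessed by name. A minimal sketch reusing the list above (the object name res is only illustrative):

```r
a <- list(A = c(8, 9, 7, 5),
          B = data.frame(x = 1:5, y = c(5, 1, 0, 2, 3)))

res <- lapply(a, sum)

names(res) # Names are carried over from the input list
res$A      # Access a single result by name
res[["B"]] # Equivalent way of accessing an element
```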

Iterate over a vector

If you have a vector, the lapply function will apply a function to each element of the vector. As an example, consider the vector b and calculate the square root of each element:

b <- c(12, 18, 6)

lapply(b, sqrt)
[[1]]
[1] 3.464102

[[2]]
[1] 4.242641

[[3]]
[1] 2.44949

If you pass a list to lapply, the corresponding function will be applied to all the elements of the list. If you pass a vector, the function will be applied to each element of the vector.
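The same applies to named vectors: the output is still a list of the same length, and the names of the vector are kept. A small sketch, where the names first, second and third are made up for illustration:

```r
b <- c(first = 12, second = 18, third = 6)

res <- lapply(b, sqrt)

is.list(res)                # The result is a list, not a vector
length(res) == length(b)    # Same length as the input
names(res)                  # Names of the vector are preserved
```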

lapply with multiple arguments

Note that if the function you are passing to the FUN argument has additional arguments, you can pass them after the function, separated by commas, as in the following example, where we set the probs argument of the quantile function:

c <- list(A = c(56, 12, 57, 24), B = c(89, 12, 64, 18, 65, 76))

lapply(c,                           # List
       quantile,                    # Applied function
       probs = c(0.25, 0.5, 0.75))  # Additional argument of the quantile function
$A
  25%   50%   75% 
21.00 40.00 56.25 

$B
  25%   50%   75% 
29.50 64.50 73.25
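Another common case is passing na.rm = TRUE to a summary function so that missing values are ignored. A short sketch, assuming a hypothetical list vals that contains NA values:

```r
# Hypothetical list with missing values
vals <- list(A = c(56, NA, 57, 24), B = c(89, 12, NA, 18))

# Without na.rm the mean of each element is NA
lapply(vals, mean)

# na.rm = TRUE is forwarded to mean through the ... argument
res <- lapply(vals, mean, na.rm = TRUE)
res
```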

lapply with a custom function

You can also apply a custom function with lapply. For that purpose you can create a function and pass its name to the FUN argument, or just write it inside the lapply call, as in the examples of the following block of code.

d <- 1:3

# Function to calculate the second power
fun <- function(x) {
    x ^ 2
}

# Applying our own function
lapply(d, fun)
lapply(d, FUN = function(x) x ^ 2) # Equivalent
lapply(d, function(x) x ^ 2) # Equivalent
[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9
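Custom functions can also take additional arguments, which are passed after the function name just as with built-in functions. The following is a sketch, where the function name power and its argument exponent are made up for illustration:

```r
d <- 1:3

# Hypothetical function with an extra argument and a default value
power <- function(x, exponent = 2) {
    x ^ exponent
}

lapply(d, power)               # Uses the default exponent (second power)
lapply(d, power, exponent = 3) # Third power of each element
```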

lapply vs for loop

The lapply function can be used to avoid for loops, which are known to be slow in R when not used properly. Suppose you want to return a list containing the third power of the even numbers of a vector and the fourth power of the odd numbers of that vector. In that case you could type:

# Empty list with 5 elements
x <- vector("list", 5)

# Vector
vec <- 1:5

for(i in vec) {
    if(i %% 2 == 0) { # Check if the element 'i' is even or odd
        x[[i]] <- i ^ 3
    } else {
        x[[i]] <- i ^ 4
    }
}
x

An alternative is to use the lapply function as follows:

fun <- function(i) {
    if(i %% 2 == 0) {
        i ^ 3
    } else {
        i ^ 4
    }
}

lapply(vec, fun)

The output in both cases will be the same:

[[1]]
[1] 1   # <- Fourth power of 1

[[2]]
[1] 8   # <- Third power of 2

[[3]]
[1] 81  # <- Fourth power of 3

[[4]]
[1] 64  # <- Third power of 4

[[5]]
[1] 625 # <- Fourth power of 5
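If you want to check the equivalence programmatically rather than by eye, you can compare both results with identical. A self-contained sketch, where the object names x and y are only illustrative:

```r
vec <- 1:5

# for loop version
x <- vector("list", length(vec))
for (i in vec) {
    x[[i]] <- if (i %% 2 == 0) i ^ 3 else i ^ 4
}

# lapply version
y <- lapply(vec, function(i) if (i %% 2 == 0) i ^ 3 else i ^ 4)

identical(x, y) # TRUE when both lists match exactly
```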

You can only replace a for loop with the lapply function if you want to return a list of the same length as the vector or list you are iterating over.

lapply vs sapply in R

The lapply and sapply functions are very similar, as the second is a wrapper of the first. The main difference between the functions is that lapply always returns a list, whereas sapply tries to simplify the result to a vector or an array. However, if you set simplify = FALSE in the sapply function, both will return a list.

To clarify, if you apply the sqrt function to a vector with the lapply function you will get a list of the same length as the input vector, where each element of the list is the square root of the corresponding element of the vector:

lapply(c(4, 9, 16), FUN = sqrt)
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 4

However, if you use the sapply function instead, you will get the same values, but returned as a vector.

sapply(c(4, 9, 16), FUN = sqrt)
[1] 2 3 4

Note that you can also return a list as output with the sapply function, setting the argument simplify to FALSE or wrapping the call with the as.list function.

sapply(c(4, 9, 16), FUN = sqrt, simplify = FALSE)
as.list(sapply(c(4, 9, 16), sqrt)) # Equivalent
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 4

Conversely, you can return a vector with the lapply function using the unlist or simplify2array functions as follows:

unlist(lapply(c(4, 9, 16), sqrt)) 
simplify2array(lapply(c(4, 9, 16), sqrt)) # Equivalent
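The difference between both functions is more visible when the applied function returns more than one value per element. In that case sapply simplifies the result to a matrix, while lapply still returns a list. A small sketch, where the names root and square are made up for illustration:

```r
v <- c(4, 9, 16)

# Function returning two values per element
f <- function(x) c(root = sqrt(x), square = x ^ 2)

m <- sapply(v, f) # Matrix with 2 rows and 3 columns
l <- lapply(v, f) # List with 3 elements, each a vector of length 2

dim(m)
length(l)
```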

More lapply examples

Using lapply on certain columns of an R data frame

Consider that you have a data frame and you want to multiply the elements of the first column by one, the elements of the second by two and so on.

On the one hand, for all columns you could write:

df <- data.frame(x = c(6, 2), y = c(3, 6), z = c(2, 3))

# Function applied to all columns
lapply(1:ncol(df), function(i) df[, i] * i)
[[1]]
[1] 6 2

[[2]]
[1]  6 12

[[3]]
[1] 6 9

On the other hand, if you want to apply the function only to certain columns of the data frame you could type:

# Function applied to the first and third columns
lapply(c(1, 3), function(i) df[, i] * i)
[[1]]
[1] 6 2

[[2]]
[1] 6 9
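Since a data frame is a list of columns, you can also pass a subset of columns directly to lapply and assign the result back to the same columns. The following is a sketch under that assumption, selecting the columns by name and doubling them (the object name cols is only illustrative):

```r
df <- data.frame(x = c(6, 2), y = c(3, 6), z = c(2, 3))

# Columns to be modified, selected by name
cols <- c("x", "z")

# Apply the function to the selected columns and write them back
df[cols] <- lapply(df[cols], function(col) col * 2)
df
```

With this approach the untouched columns (here y) keep their original values and the output remains a data frame.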

Nested lapply functions

If needed, you can nest multiple lapply functions. Suppose you want to iterate over the columns and rows of a data frame and apply a function to each cell. Supposing that you want to multiply each cell by four, you could first write it with two nested for loops like the following:

df <- data.frame(x = c(6, 2), y = c(3, 6))

# Empty list
res <- vector("list", 2)

for(i in 1:ncol(df)) {
    for (j in 1:nrow(df)) {
        res[[j]][i] <- df[j, i] * 4
    }
}

res
[[1]]        # <- First row by four
[1] 24 12

[[2]]        # <- Second row by four
[1]  8 24

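You can get the same values nesting two lapply functions, applying a lapply inside the FUN argument of the first. Note that in this version the outer lapply iterates over the columns, so each element of the returned list holds one column multiplied by four rather than one row: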

lapply(1:ncol(df), function(i) {
       unlist(lapply(1:nrow(df), function(j) {
              df[j, i] * 4
       }))
})

As you may have noticed, this example is just for educational purposes, as you could simply type df * 4 to achieve the same values as the output.
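As a quick check, the following sketch compares the nested lapply result with df * 4 column by column (the object names nested and direct are only illustrative):

```r
df <- data.frame(x = c(6, 2), y = c(3, 6))

# Nested lapply: outer call over columns, inner call over rows
nested <- lapply(seq_len(ncol(df)), function(i) {
    unlist(lapply(seq_len(nrow(df)), function(j) df[j, i] * 4))
})

# Vectorized alternative
direct <- df * 4

# Each list element matches the corresponding column of df * 4
all.equal(nested[[1]], direct$x)
all.equal(nested[[2]], direct$y)
```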