split function in R
The split function allows you to divide data into groups based on factor levels. In this tutorial we are going to show you how to split data in R with different examples, reviewing the arguments of the function.
The split() function syntax
The split function divides the input data (x) into different groups (f). The following block summarizes the function arguments and their descriptions.
split(x, # Vector or data frame
f, # Groups of class factor, vector or list
drop = FALSE, # Whether to drop unused levels or not
sep = ".", # Character string to separate groups when f is a list
lex.order = FALSE, # Whether the factor concatenation should be lexically ordered or not
...) # Additional arguments
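The drop argument also matters when f is a single factor, not only when it is a list. As a minimal sketch (the vector x and the factor f below are made-up illustration data), a factor with an unused level produces an empty list element unless you set drop = TRUE:

```r
# Hypothetical data: three values and a factor with an unused level "c"
x <- c(10, 20, 30)
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# By default every level of f gets an element, even if it is empty
s <- split(x, f)
names(s)  # "a" "b" "c"
s$c       # numeric(0)

# With drop = TRUE the unused level is removed from the output
names(split(x, f, drop = TRUE))  # "a" "b"
```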
Split vector in R
Suppose you have a named vector, where the name of each element corresponds to the group the element belongs to. In that case, you can split the vector into two vectors, each containing the elements of one group, by passing the names of the vector (obtained with the names function) to the argument f.
a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
a
x y x x y
3 5 1 4 3
split(a, f = names(a))
$`x`
x x x
3 1 4
$y
y y
5 3
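Since split returns a list, a common next step is to apply a function to each group. A minimal sketch that sums the elements of each group of the vector a with sapply:

```r
a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)

# Split by name, then sum the elements of each group
by_name <- split(a, f = names(a))
totals <- sapply(by_name, sum)
totals
# x y
# 8 8
```

For a single summary like this one, tapply(a, names(a), sum) returns the same numbers in one call.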
In addition, you can pass a character vector to the argument f to indicate the group of each element, or pass a factor object directly.
groups <- c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2")
split(a, f = groups)
split(a, f = factor(groups)) # Equivalent
$`Group 1`
x y x
3 5 4
$`Group 2`
x y
1 3
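One practical difference between the two calls: split converts a character vector internally with as.factor, so the groups appear in sorted order, whereas an explicit factor lets you choose the order of the output list through its levels. A short sketch, rebuilding the example objects so the block runs on its own:

```r
a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
groups <- c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2")

# Character input: groups follow the sorted unique values
names(split(a, f = groups))
# "Group 1" "Group 2"

# Factor input: the output follows the order of the levels
by_group <- split(a, f = factor(groups, levels = c("Group 2", "Group 1")))
names(by_group)
# "Group 2" "Group 1"
```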
Moreover, you can split your data by several grouping variables at once, generating the interactions (all the combinations) of the groups. For that purpose, the input of the argument f must be a list.
# New group
groups_2 <- c("Type 1", "Type 1", "Type 1", "Type 2", "Type 1")
# Split "a" by two groups
split(a, f = list(groups, groups_2))
# Equivalent to:
f1 <- factor(c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2"),
levels = c("Group 1", "Group 2"))
f2 <- factor(c("Type 1", "Type 1", "Type 1", "Type 2", "Type 1"),
levels = c("Type 1", "Type 2"))
split(a, f = list(f1, f2))
$`Group 1.Type 1`
x y
3 5
$`Group 2.Type 1`
x y
1 3
$`Group 1.Type 2`
x
4
$`Group 2.Type 2`
named numeric(0)
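When f is a list, split combines the grouping factors with the interaction function before dividing the data, which is where the dot-separated group names come from. A quick check of this equivalence, rebuilding the example objects so the block is self-contained:

```r
a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
f1 <- factor(c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2"))
f2 <- factor(c("Type 1", "Type 1", "Type 1", "Type 2", "Type 1"))

# The list form and an explicit interaction() give the same result
by_list <- split(a, f = list(f1, f2))
by_interaction <- split(a, f = interaction(f1, f2))
identical(by_list, by_interaction)
# TRUE

# The levels of the interaction are the names of the output list
levels(interaction(f1, f2))
# "Group 1.Type 1" "Group 2.Type 1" "Group 1.Type 2" "Group 2.Type 2"
```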
Note that, by default, the group interactions are separated with a dot and the output contains all possible groups, even when some of them have no observations. You can customize these behaviors with the sep and drop arguments, respectively.
# Remove the empty elements and change the separator
vec_split <- split(a, f = list(f1, f2), drop = TRUE, sep = ": ")
vec_split
$`Group 1: Type 1`
x y
3 5
$`Group 2: Type 1`
x y
1 3
$`Group 1: Type 2`
x
4
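The lex.order argument, listed in the syntax section, only changes the order of the combined groups. In this sketch (example objects rebuilt so the block runs on its own), setting lex.order = TRUE makes the levels of the first factor vary slowest, so all the Group 1 subsets come first:

```r
a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
f1 <- factor(c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2"))
f2 <- factor(c("Type 1", "Type 1", "Type 1", "Type 2", "Type 1"))

# Default: the first factor varies fastest
names(split(a, f = list(f1, f2)))
# "Group 1.Type 1" "Group 2.Type 1" "Group 1.Type 2" "Group 2.Type 2"

# lex.order = TRUE: the first factor varies slowest
lex_split <- split(a, f = list(f1, f2), lex.order = TRUE)
names(lex_split)
# "Group 1.Type 1" "Group 1.Type 2" "Group 2.Type 1" "Group 2.Type 2"
```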
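You can recover the original vector with the unsplit function, but the element names are lost: they are replaced with NA, as the output shows. Since vec_split was created with drop = TRUE, pass the same value to unsplit so that each element of the list is matched with the right group.
unsplit(vec_split, list(f1, f2), drop = TRUE)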
<NA> <NA> <NA> <NA> <NA>
3 5 1 4 3
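If you need the names back, one option is to unsplit the names in the same way as the values. The following sketch does that for the vec_split example (objects rebuilt so the block runs on its own); it passes drop = TRUE to unsplit because the split dropped the empty group:

```r
a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
f1 <- factor(c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2"))
f2 <- factor(c("Type 1", "Type 1", "Type 1", "Type 2", "Type 1"))
vec_split <- split(a, f = list(f1, f2), drop = TRUE, sep = ": ")

# Recover the values (the names come back as NA)
restored <- unsplit(vec_split, list(f1, f2), drop = TRUE)

# Recover the names by unsplitting them with the same groups
names(restored) <- unsplit(lapply(vec_split, names), list(f1, f2), drop = TRUE)
identical(restored, a)
# TRUE
```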
Split data frame in R
You can split a data set into subsets based on one or more variables that represent groups of the data. Consider the following data frame, a random sample of 10 rows of the CO2 data set:
set.seed(3)
df <- CO2[sample(1:nrow(CO2), 10), ]
head(df)
Plant Type Treatment conc uptake
15 Qn3 Quebec nonchilled 95 16.2
68 Mc1 Mississippi chilled 500 19.5
32 Qc2 Quebec chilled 350 38.8
27 Qc1 Quebec chilled 675 35.4
49 Mn1 Mississippi nonchilled 1000 35.5
48 Mn1 Mississippi nonchilled 675 32.4
For example, you can use the split function to divide the data frame into groups based on the Treatment variable.
split(df, f = df$Treatment)
$`nonchilled`
Plant Type Treatment conc uptake
15 Qn3 Quebec nonchilled 95 16.2
49 Mn1 Mississippi nonchilled 1000 35.5
48 Mn1 Mississippi nonchilled 675 32.4
10 Qn2 Quebec nonchilled 250 37.1
44 Mn1 Mississippi nonchilled 175 19.2
$chilled
Plant Type Treatment conc uptake
68 Mc1 Mississippi chilled 500 19.5
32 Qc2 Quebec chilled 350 38.8
27 Qc1 Quebec chilled 675 35.4
23 Qc1 Quebec chilled 175 24.1
79 Mc3 Mississippi chilled 175 18.0
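Each element of the result is a regular data frame, so you can process the groups with lapply or sapply. As a sketch, this uses the full CO2 data set instead of the random sample so that the group sizes are known in advance (84 rows, 42 per treatment):

```r
# Split the complete CO2 data set by treatment
by_treatment <- split(CO2, f = CO2$Treatment)

# Number of rows in each subset
sapply(by_treatment, nrow)
# nonchilled    chilled
#         42         42

# Mean CO2 uptake of each subset
sapply(by_treatment, function(d) mean(d$uptake))
```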
As explained in the vectors section, you can also divide a data frame into subsets defined by combinations of several groups at the same time. For example, you can split the sample data frame by the Type and Treatment columns. This creates four subsets, one for each combination of the groups: the total number of subsets is the product of the numbers of levels of each grouping variable (here, 2 × 2 = 4), unless you set drop = TRUE and some combinations are empty.
dfs <- split(df, f = list(df$Type, df$Treatment))
dfs
$`Quebec.nonchilled`
Plant Type Treatment conc uptake
15 Qn3 Quebec nonchilled 95 16.2
10 Qn2 Quebec nonchilled 250 37.1
$Mississippi.nonchilled
Plant Type Treatment conc uptake
49 Mn1 Mississippi nonchilled 1000 35.5
48 Mn1 Mississippi nonchilled 675 32.4
44 Mn1 Mississippi nonchilled 175 19.2
$Quebec.chilled
Plant Type Treatment conc uptake
32 Qc2 Quebec chilled 350 38.8
27 Qc1 Quebec chilled 675 35.4
23 Qc1 Quebec chilled 175 24.1
$Mississippi.chilled
Plant Type Treatment conc uptake
68 Mc1 Mississippi chilled 500 19.5
79 Mc3 Mississippi chilled 175 18.0
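You can check the number of subsets against the product of the numbers of levels. A sketch with the full CO2 data set, where each of the four Type and Treatment combinations holds three plants measured at seven concentrations, that is, 21 rows per subset:

```r
groups <- list(CO2$Type, CO2$Treatment)
parts <- split(CO2, f = groups)

# Number of subsets equals the product of the numbers of levels (2 x 2)
length(parts) == nlevels(CO2$Type) * nlevels(CO2$Treatment)
# TRUE

# Rows in each combination
sapply(parts, nrow)
```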
Remember that you can recover the original data frame with the unsplit function, passing the list of data frames and the group or groups you used to create the split.
unsplit(dfs, f = list(df$Type, df$Treatment))
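To verify the round trip, you can compare columns of the reconstructed data frame with the original ones. A minimal sketch on the full CO2 data set:

```r
groups <- list(CO2$Type, CO2$Treatment)
parts <- split(CO2, f = groups)

# Put the rows back in their original positions
restored <- unsplit(parts, f = groups)

# The values return in the original row order
identical(restored$uptake, CO2$uptake)
# TRUE
identical(restored$conc, CO2$conc)
# TRUE
```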