# split function in R

## The split() function syntax

The split function divides the input data (x) into groups defined by a grouping variable (f). The following block summarizes the function's arguments and their descriptions.

split(x,                 # Vector or data frame
      f,                 # Groups of class factor, vector or list
      drop = FALSE,      # Whether to drop unused levels or not
      sep = ".",         # Character string to separate groups when f is a list
      lex.order = FALSE, # Whether the factor concatenation should be lexically ordered or not
      ...)               # Additional arguments
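
As a minimal sketch of what the drop argument does, consider a factor that declares a level with no observations. The vector `v` and the factor `g` below are made-up illustration data, not part of the examples in this tutorial.

```r
# Hypothetical data: four values and a factor with an unused level "c"
v <- c(10, 20, 30, 40)
g <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))

# By default the empty group "c" is kept as a zero-length vector
keep_all <- split(v, f = g)
names(keep_all)    # "a" "b" "c"
lengths(keep_all)  # a = 2, b = 2, c = 0

# drop = TRUE removes the levels that do not occur in the data
dropped <- split(v, f = g, drop = TRUE)
names(dropped)     # "a" "b"
```

Keeping empty groups is usually convenient when several splits have to line up by position; dropping them is handier when you only iterate over groups that contain data.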

### Split vector in R

Suppose you have a named vector in which the name of each element identifies the group that element belongs to. You can split it into one vector per group by passing the names of the vector, obtained with the names function, to the argument f.

a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
a
x y x x y
3 5 1 4 3
split(a, f = names(a))
$x
x x x
3 1 4

$y
y y
5 3
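
Since the result of split is an ordinary named list, a common next step is to access one group by name or to apply a function to every group. The following is a small sketch that reuses the vector `a` defined in this section; the names `by_name` and `group_sums` are chosen here for illustration only.

```r
a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
by_name <- split(a, f = names(a))

# Access a single group by its name
by_name$x        # the elements named "x": 3 1 4
by_name[["y"]]   # the elements named "y": 5 3

# Apply a function to each group, here the sum of its elements
group_sums <- sapply(by_name, sum)
group_sums       # x = 8, y = 8
```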

In addition, you can pass a character vector to the argument f to indicate the group of each element, or pass a factor object directly.

groups <- c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2")

split(a, f = groups)
split(a, f = factor(groups)) # Equivalent
$`Group 1`
x y x
3 5 4

$`Group 2`
x y
1 3
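
The equivalence between the character vector and the factor can be checked directly. The sketch below also shows one reason to prefer an explicit factor: its level order controls the order of the groups in the output. The object names `same` and `reordered` are illustrative.

```r
a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
groups <- c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2")

# A character vector is converted to a factor internally,
# so both calls return the same list
same <- identical(split(a, f = groups), split(a, f = factor(groups)))
same              # TRUE

# With an explicit factor, the level order sets the order of the groups
reordered <- split(a, f = factor(groups, levels = c("Group 2", "Group 1")))
names(reordered)  # "Group 2" "Group 1"
```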

Moreover, you can split your data by multiple groups, generating interactions of groups. For that purpose, the input of the argument f must be a list.

groups_2 <- c("Type 1", "Type 1", "Type 1", "Type 2", "Type 1")

split(a, f = list(groups, groups_2))

# Equivalent
f1 <- factor(c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2"),
             levels = c("Group 1", "Group 2"))
f2 <- factor(c("Type 1", "Type 1", "Type 1", "Type 2", "Type 1"),
             levels = c("Type 1", "Type 2"))

split(a, f = list(f1, f2))
$`Group 1.Type 1`
x y
3 5

$`Group 2.Type 1`
x y
1 3

$`Group 1.Type 2`
x
4

$`Group 2.Type 2`
named numeric(0)
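
When f is a list, base R groups by the interaction of its factors, so the same result can be obtained by calling interaction yourself. The following sketch, using the objects from this section, compares both approaches and shows how lex.order changes the order of the groups. The names `via_list`, `via_interaction` and `lex` are illustrative.

```r
a <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
f1 <- factor(c("Group 1", "Group 1", "Group 2", "Group 1", "Group 2"))
f2 <- factor(c("Type 1", "Type 1", "Type 1", "Type 2", "Type 1"))

# Splitting by a list of factors is the same as splitting by their interaction
via_list <- split(a, f = list(f1, f2))
via_interaction <- split(a, f = interaction(f1, f2))
identical(via_list, via_interaction)  # TRUE

# By default the levels of the first factor vary fastest
names(via_list)

# lex.order = TRUE sorts the combinations by the first factor, then the second
lex <- split(a, f = list(f1, f2), lex.order = TRUE)
names(lex)
```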

Note that, by default, the group interactions are separated with a dot and that the output contains all possible groups even when there are no observations in some of them. However, you can customize this with the sep and drop arguments, respectively.

vec_split <- split(a, f = list(f1, f2), drop = TRUE, sep = ": ")
vec_split
$`Group 1: Type 1`
x y
3 5

$`Group 2: Type 1`
x y
1 3

$`Group 1: Type 2`
x
4

Note that the unsplit function recovers the original vector, but the names are lost.

unsplit(vec_split, list(f1, f2))
<NA> <NA> <NA> <NA> <NA>
   3    5    1    4    3

### Split data frame in R

You can split a data set into subsets based on one or more variables that represent groups of the data. Consider the following data frame:

set.seed(3)
df <- CO2[sample(1:nrow(CO2), 10), ]
head(df)
   Plant        Type  Treatment conc uptake
15   Qn3      Quebec nonchilled   95   16.2
68   Mc1 Mississippi    chilled  500   19.5
32   Qc2      Quebec    chilled  350   38.8
27   Qc1      Quebec    chilled  675   35.4
49   Mn1 Mississippi nonchilled 1000   35.5
48   Mn1 Mississippi nonchilled  675   32.4

You can use the split function to divide the data frame into groups based on, for example, the Treatment variable.

split(df, f = df$Treatment)
$nonchilled
   Plant        Type  Treatment conc uptake
15   Qn3      Quebec nonchilled   95   16.2
49   Mn1 Mississippi nonchilled 1000   35.5
48   Mn1 Mississippi nonchilled  675   32.4
10   Qn2      Quebec nonchilled  250   37.1
44   Mn1 Mississippi nonchilled  175   19.2

$chilled
   Plant        Type Treatment conc uptake
68   Mc1 Mississippi   chilled  500   19.5
32   Qc2      Quebec   chilled  350   38.8
27   Qc1      Quebec   chilled  675   35.4
23   Qc1      Quebec   chilled  175   24.1
79   Mc3 Mississippi   chilled  175   18.0
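
A typical reason to split a data frame is to compute something for each subset. The sketch below uses the complete built-in CO2 data set rather than the random sample above, so it does not depend on the seed; the names `parts` and `mean_uptake` are chosen for illustration.

```r
# Split the full CO2 data set by Treatment
parts <- split(CO2, f = CO2$Treatment)

# Each element of the list is a data frame with the rows of one group
names(parts)
sapply(parts, nrow)

# Compute a per-group summary, here the mean of the uptake column
mean_uptake <- sapply(parts, function(d) mean(d$uptake))
mean_uptake
```

Functions such as tapply or aggregate can produce the same kind of summary in one call; splitting first is mainly useful when each subset needs further processing as a data frame.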

As explained in the vector section, you can also divide a data frame into subsets defined by combinations of several grouping variables. For example, you can split the sample data frame by the Type and Treatment columns, which creates four subsets, one for each combination of groups. Note that the total number of splits is the product of the number of levels of each grouping variable.

dfs <- split(df, f = list(df$Type, df$Treatment))
dfs
$Quebec.nonchilled
   Plant   Type  Treatment conc uptake
15   Qn3 Quebec nonchilled   95   16.2
10   Qn2 Quebec nonchilled  250   37.1

$Mississippi.nonchilled
   Plant        Type  Treatment conc uptake
49   Mn1 Mississippi nonchilled 1000   35.5
48   Mn1 Mississippi nonchilled  675   32.4
44   Mn1 Mississippi nonchilled  175   19.2

$Quebec.chilled
   Plant   Type Treatment conc uptake
32   Qc2 Quebec   chilled  350   38.8
27   Qc1 Quebec   chilled  675   35.4
23   Qc1 Quebec   chilled  175   24.1

$Mississippi.chilled
   Plant        Type Treatment conc uptake
68   Mc1 Mississippi   chilled  500   19.5
79   Mc3 Mississippi   chilled  175   18.0
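
The rule that the number of splits equals the product of the numbers of levels can be verified on the complete CO2 data set, as in the sketch below. It also builds a subset, named `sub` here purely for illustration, in which one combination has no rows, to show that the empty combination is still returned unless drop = TRUE is set.

```r
# All combinations of Type and Treatment in the full CO2 data set
full <- split(CO2, f = list(CO2$Type, CO2$Treatment))
length(full) == nlevels(CO2$Type) * nlevels(CO2$Treatment)  # TRUE

# Illustrative subset without any Mississippi / nonchilled rows
sub <- CO2[CO2$Type == "Quebec" | CO2$Treatment == "chilled", ]

# The empty combination is kept as a data frame with zero rows
with_empty <- split(sub, f = list(sub$Type, sub$Treatment))
sapply(with_empty, nrow)

# drop = TRUE removes the combinations that have no observations
without_empty <- split(sub, f = list(sub$Type, sub$Treatment), drop = TRUE)
names(without_empty)
```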

Remember that you can recover the original data frame with the unsplit function, passing the divided data frame and the group or groups you used to create the split.

unsplit(dfs, f = list(df$Type, df$Treatment))
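
To check that unsplit puts the rows back in their original positions, you can compare the restored data frame with the original column by column. The following sketch uses a small made-up data frame, `toy`, instead of the CO2 sample so that the result does not depend on the random seed.

```r
# Hypothetical data frame with a grouping factor
toy <- data.frame(
  id  = 1:6,
  grp = factor(c("a", "b", "a", "c", "b", "a")),
  val = c(2.5, 1.0, 3.5, 4.0, 0.5, 6.0)
)

# Split by group, then rebuild using the same grouping factor
parts <- split(toy, f = toy$grp)
restored <- unsplit(parts, f = toy$grp)

# The rows come back in the original order
all(restored$id == toy$id)
all(restored$val == toy$val)
all(as.character(restored$grp) == as.character(toy$grp))
```

The grouping passed to unsplit must be the one used for the split, in the original row order; otherwise the rows cannot be matched to their positions.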