Filter rows in R with dplyr
The filter function from dplyr subsets rows of a data frame based on a single or multiple conditions. In this tutorial you will learn how to select rows using comparison and logical operators and how to filter by row number with slice.
Sample data
The examples inside this tutorial will use the women data set provided by R. This data set contains two numeric columns: height and weight.
# install.packages("dplyr")
library(dplyr)
# Sample data frame
df <- as_tibble(women)
df
# A tibble: 15 × 2
height weight
<dbl> <dbl>
1 58 115
2 59 117
3 60 120
4 61 123
5 62 126
6 63 129
7 64 132
8 65 135
9 66 139
10 67 142
11 68 146
12 69 150
13 70 154
14 71 159
15 72 164
Filtering rows based on a single condition
The filter function allows to subset rows of a data frame based on a condition. You can filter the values equal to, not equal to, lower than or greater than a value by specifying the desired condition within the function.
The following table contains the comparison operators in R and their descriptions.
| Comparison operator | Description |
|---|---|
| > | Greater than |
| < | Lower than |
| >= | Greater or equal than |
| <= | Lower or equal than |
| == | Equal to |
| != | Not equal to |
For example, if you want to filter the rows where the height column is greater than 68 you can write the following:
library(dplyr)
df <- as_tibble(women)
# Rows with 'height' greater than 68
df_2 <- df %>%
filter(height > 68)
df_2
height weight
<dbl> <dbl>
1 69 150
2 70 154
3 71 159
4 72 164

The filtering can be based on a function. The following example selects the rows of the data frame where the height is equal or lower to the mean of the column.
library(dplyr)
df <- as_tibble(women)
# Rows with 'height' lower or equal than the mean
df_2 <- df %>%
filter(height <= mean(height))
df_2
# A tibble: 8 × 2
height weight
<dbl> <dbl>
1 58 115
2 59 117
3 60 120
4 61 123
5 62 126
6 63 129
7 64 132
8 65 135

It is also possible to filter rows using logical operators or functions that return TRUE or FALSE or a combination of them. The most common are shown in the table below.
| Operator/function | Description |
|---|---|
| ! | Logical negation ‘NOT’ |
| %in% | In the set |
| !(x %in% y) | x not in y |
| is.na() | Is NA |
| !is.na() | Is not NA |
| grepl() | Contains a pattern |
| !grepl() | Does not contain a pattern |
Consider that you want to filter the rows in which the height column takes the value 65, 70 and 72. For this you can use the %in% operator and filter the rows by a vector.
library(dplyr)
df <- as_tibble(women)
# Rows with 'height' equal to 65, 70 and 72
df_2 <- df %>%
filter(height %in% c(65, 70, 72))
df_2
# A tibble: 3 × 2
height weight
<dbl> <dbl>
1 65 135
2 70 154
3 72 164

The opposite of a condition can be selected with the logical negation operator !. The example below shows how to select the opposite of the filtering made on the previous code.
library(dplyr)
df <- as_tibble(women)
# Rows with 'height' NOT equal to 65, 70 and 72
df_2 <- df %>%
filter(!(height %in% c(65, 70, 72)))
df_2
# A tibble: 12 × 2
height weight
<dbl> <dbl>
1 58 115
2 59 117
3 60 120
4 61 123
5 62 126
6 63 129
7 64 132
8 66 139
9 67 142
10 68 146
11 69 150
12 71 159

To filter rows containing a specific string you can use grepl or str_detect. The following example filters the rows containing a specific pattern (e.g. rows of height containing a 5).
library(dplyr)
df <- as_tibble(women)
# Rows that contain a 5 inside the values of 'height'
df_2 <- df %>%
filter(grepl("5", height))
df_2
# A tibble: 3 × 2
height weight
<dbl> <dbl>
1 58 115
2 59 117
3 65 135

Multiple conditions
Row filtering can also be based on multiple conditions to filter, for instance, rows where a value is in a specific range or to filter between dates. For this you will need to use logical operators, such as & to specify one AND another condition, and | to specify one OR another condition.
| Logical operator | Description |
|---|---|
| & | Elementwise logical ‘AND’ |
| | | Elementwise logical ‘OR’ |
| xor() | Elementwise exclusive ‘OR’. Equivalent to !(x | y) |
The example below selects rows whose values in the height column are greater than 65 and lower than 68.
library(dplyr)
df <- as_tibble(women)
# Rows with 'height' greater than 65 AND lower than to 68
df_2 <- df %>%
filter(height > 65 & height < 68)
df_2
# A tibble: 2 × 2
height weight
<dbl> <dbl>
1 66 139
2 67 142

The multiple conditions can be based on multiple columns. In the following block of code we are selecting the rows whose values in height are greater than 65 and whose values in weight are lower or equal to 150.
library(dplyr)
df <- as_tibble(women)
# Rows with 'height' greater than 65 AND weight lower or equal to 150
df_2 <- df %>%
filter(height > 65 & weight <= 150)
df_2
# A tibble: 4 × 2
height weight
<dbl> <dbl>
1 66 139
2 67 142
3 68 146
4 69 150

In case you need to subset rows based on a condition OR on another you can use |. The example below filters the rows whose values in height area greater than 65 or whose values in weight are greater or equal to 150.
library(dplyr)
df <- as_tibble(women)
# Rows with 'height' greater than 65 OR 'weight' greater or equal to 150
df_2 <- df %>%
filter(height > 65 | weight >= 150)
df_2
# A tibble: 7 × 2
height weight
<dbl> <dbl>
1 66 139
2 67 142
3 68 146
4 69 150
5 70 154
6 71 159
7 72 164

Filter by row number with slice
A similar function related to filter is slice, which allows to filter rows based on its index/position. The function takes a sequence or vector of indices (integers) as input, as shown below.
library(dplyr)
df <- as_tibble(women)
# First three rows
df_2 <- df %>%
slice(1:3)
df_2
# A tibble: 3 × 2
height weight
<dbl> <dbl>
1 58 115
2 59 117
3 60 120

In addition, the slice_head function allows to select the first row of the data frame. This function provides an argument named n to select the n first rows.
library(dplyr)
df <- as_tibble(women)
# First row
df_2 <- df %>%
slice_head()
df_2
# A tibble: 1 × 2
height weight
<dbl> <dbl>
1 58 115

Finally, if you need to select the last row you can use slice_tail. This function also provides an argument named n to select the last n rows of the data frame.
library(dplyr)
df <- as_tibble(women)
# Last row
df_2 <- df %>%
slice_tail()
df_2
# A tibble: 1 × 2
height weight
<dbl> <dbl>
1 72 164

slice_sample selects rows randomly and slice_min and slice_max selects the rows with the lowest or highest values of a variable, respectively.