Pattern matching in R with grepl() and grep()
The grepl and grep functions allow you to search for matches of a pattern inside a character vector. In this tutorial you will learn how they differ and how to use them in several common cases.
Syntax and differences
Both grepl and grep search for matches of a pattern inside a character vector. The difference is that grepl returns TRUE or FALSE for each element of the vector, depending on whether that element matches, while grep returns the indices of the elements that match the specified pattern.
The basic syntax of these functions is the same:
# Returns TRUE or FALSE for each element of x, depending on whether it matches pattern
grepl(pattern, x)
# Returns the indices of the elements of x that match pattern
grep(pattern, x)
Both functions also accept the arguments ignore.case, perl, fixed and useBytes. In addition, grep provides two arguments of its own, value and invert. Remember that you can type ?grepl or ?grep for additional information.
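To make the difference concrete, here is a minimal sketch that runs both functions on the same input. The vector x and the variable names are illustrative choices, not part of the functions themselves.

```r
x <- c("sand", "sea", "turtle")

# grepl(): logical vector with one value per element of x
matches_lgl <- grepl("a", x)
matches_lgl                      # TRUE TRUE FALSE

# grep(): integer vector with the positions of the matching elements
matches_idx <- grep("a", x)
matches_idx                      # 1 2

# invert is specific to grep(): positions of the elements that do NOT match
non_matches <- grep("a", x, invert = TRUE)
non_matches                      # 3
```

The logical result always has the same length as x, whereas the length of the index vector depends on how many elements match.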
The grepl function
Suppose you want to check whether a string contains the letter a. To do so, pass "a" as pattern and the string or vector you want to search as x (the word "sand" in this case):
grepl(pattern = "a", x = "sand")
TRUE
The previous example returns TRUE because there is an "a" inside "sand". Note that you can also search for longer substrings:
grepl("nd", "sand")
TRUE
The function is case sensitive by default, so if the pattern is in uppercase and the string is in lowercase, or vice versa, the function will return FALSE.
grepl("S", "sand")
FALSE
If you want the match to ignore case, set ignore.case = TRUE.
grepl("S", "sand", ignore.case = TRUE)
TRUE
Note that the function is vectorized over x, so if you pass a vector to x the function will return a logical vector of the same length as x indicating whether there was a match in the corresponding element.
grepl("a", c("sand", "sea", "turtle"))
TRUE TRUE FALSE
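Because the result has one logical value per element, it can be used directly to subset or summarize the original vector. The sketch below assumes the same example vector as above; the name has_a is only illustrative.

```r
x <- c("sand", "sea", "turtle")
has_a <- grepl("a", x)

x[has_a]      # "sand" "sea"   (elements that contain "a")
x[!has_a]     # "turtle"       (elements that do not)
sum(has_a)    # 2              (number of matching elements)
any(has_a)    # TRUE           (is there at least one match?)
```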
Using regular expressions
The pattern is interpreted as a regular expression by default, so you can match much more than literal strings. The following example matches the elements of the vector that contain at least one digit. If you want the pattern to be matched 'as is', without any regular expression interpretation, set fixed = TRUE.
grepl("[0-9]", c("one", "2", "three3"))
FALSE TRUE TRUE
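The effect of fixed = TRUE is easiest to see with a pattern that has a special meaning in regular expressions. The following sketch uses a dot, which as a regular expression matches any single character; the vector x is an illustrative assumption.

```r
x <- c("a.b", "ab")

# As a regular expression, "." matches any single character
grepl(".", x)                  # TRUE TRUE

# With fixed = TRUE, "." is matched as a literal dot
grepl(".", x, fixed = TRUE)    # TRUE FALSE

# Escaping the dot inside the regular expression gives the same result
grepl("\\.", x)                # TRUE FALSE
```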
Match multiple patterns
Finally, if you want to match any of several patterns you can separate them with |. Do not add spaces around it, as they would become part of the patterns. The example below will return TRUE if the corresponding element contains the letter "d" or the letter "t".
# Match "d" or "t"
grepl("d|t", c("sand", "sea", "turtle"))
TRUE FALSE TRUE
It is also possible to build the pattern programmatically by collapsing a vector of patterns with the paste() or paste0() functions:
# Desired matching patterns
patterns <- c("and", "ea", "w")
patterns <- paste0(patterns, collapse = "|") # "and|ea|w"
# Search for matches
grepl(patterns, c("sand", "sea", "turtle"))
TRUE TRUE FALSE
The grep function
If you want to know the indices of the elements of x that match the pattern you can use grep. In the example below we find which elements of x contain the string "The".
grep(pattern = "The", x = c("The sand", "water", "The crab"))
# Equivalent to:
# which(grepl(pattern = "The", x = c("The sand", "water", "The crab")))
1 3
The output is 1 and 3 because the string "The" is inside the first and the third elements of the character vector.
If there are no matches, the function will return integer(0), as in the example below.
grep(pattern = "x", x = c("The sand", "water", "The crab"))
integer(0)
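An empty result needs some care when it is used downstream. The sketch below shows two ways to test for "no match" and a common pitfall with negative indexing; the variable name idx is illustrative.

```r
x <- c("The sand", "water", "The crab")
idx <- grep("x", x)

length(idx) == 0      # TRUE: no element matched
any(grepl("x", x))    # FALSE: the same check with grepl()

# Pitfall: negative indexing with an empty index drops everything
x[-idx]               # character(0)

# A logical index keeps all three elements, as intended
x[!grepl("x", x)]     # "The sand" "water" "The crab"
```

For removing matching elements, the logical form with grepl is therefore the safer choice when the pattern may not match anything.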
The function interprets the pattern the same way as grepl, so you can also search for multiple patterns by separating them with |.
grep(pattern = "The|crab", x = c("The sand", "water", "The crab"))
1 3
If you need to ignore case you can also set ignore.case = TRUE. Type ?grep to get more information about the function and its arguments.
grep(pattern = "b", x = c("A", "B", "C"), ignore.case = TRUE)
2
Finally, if you set value = TRUE the function will return the matching elements of x instead of their indices.
grep(pattern = "r", x = c("airplane", "boat", "car"), value = TRUE)
"airplane" "car"
R provides other functions for pattern matching. If you want to know the start position and the length of the first match in each element you can use regexpr, and if you want to know the position of every match you can use gregexpr. See also the regexec and gregexec functions, which return similar outputs.
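As a brief illustration of how regexpr and gregexpr differ from grep, the sketch below searches for "an" inside a single string. The input "banana" and the variable names are illustrative assumptions.

```r
x <- "banana"

# regexpr(): start position and length of the FIRST match in each element
first <- regexpr("an", x)
as.integer(first)               # 2
attr(first, "match.length")     # 2

# gregexpr(): start positions of ALL matches, as a list with one entry per element
all_matches <- gregexpr("an", x)
as.integer(all_matches[[1]])    # 2 4
```

Both functions return -1 as the position when an element has no match.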