Extract and replace substrings in R with substring() and substr()

The substring() and substr() functions in R

Both the substring and substr functions in R allow you to extract or replace parts of a text string. However, the argument names differ: substr requires both the start and end positions of the characters to be extracted or replaced, while substring requires only the first position and takes the last one optionally.

Syntax of substring and substr

These functions can extract or replace substrings and have a similar syntax. Both take a string or character vector as input (x in substr, text in substring), the position of the first character to be extracted or replaced (start or first) and the position of the last character to be extracted or replaced (stop or last). The difference is that substring provides a default value for last, so it is an optional argument.

# To extract values:
substr(x, start, stop)
substring(text, first, last = 1000000L)

# To replace values:
substr(x, start, stop) <- value
substring(text, first, last = 1000000L) <- value

The outputs of these functions also differ in length: substr returns a character vector of the same length as x, whereas substring returns a vector as long as the longest of text, first and last.
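
The length difference can be checked directly. The following is a minimal sketch in which the variable names (x, substr_out, substring_out) are illustrative only: the same vector of positions is passed to both functions for a single input string.

```r
x <- "String"

# substr(): x has length 1, so the result has length 1
# and only the first start position is used
substr_out <- substr(x, start = 1:3, stop = 3)

# substring(): the longest argument (first, of length 3)
# determines the length of the result
substring_out <- substring(x, first = 1:3, last = 3)

length(substr_out)     # 1
length(substring_out)  # 3
```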

Extracting substrings

Suppose you have a string, e.g. "String", and you want to obtain its first three characters. For that purpose you can use either substr or substring, specifying the desired start and end positions as follows:

# With substr()
substr("String", start = 1, stop = 3)

# With substring()
substring("String", first = 1, last = 3)

However, if you want to extract everything from a given position to the last character, you need to specify stop with substr, whereas with substring you can omit last. Note that you can use the nchar() function to get the number of characters of a string in R.

# With substr()
substr("String", start = 4, stop = nchar("String"))

# With substring() (no need to specify 'last')
substring("String", first = 4)
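
The same idea with nchar() can be used to take the last n characters of a string. The sketch below wraps it in a helper called last_chars, which is a hypothetical name used here for illustration and not a base R function.

```r
# Hypothetical helper (not part of base R): last n characters of each string
last_chars <- function(x, n) {
  len <- nchar(x)
  substr(x, start = len - n + 1, stop = len)
}

last_chars("String", 3)                   # "ing"
last_chars(c("String", "substring"), 3)   # "ing" "ing"
```

Because substr is vectorized over x, the helper also works on character vectors whose elements have different lengths.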

As stated before, both start and stop are required, and hence an error will arise if stop is not specified within substr.

substr("String", start = 4) # Error
Error in substr("String", start = 4) : 
  argument "stop" is missing, with no default

Extracting several substrings at once

Another difference between substr and substring is that substr returns a vector of the same length as x, while the length of the output of substring also depends on the length of the specified positions.

# Characters 1 to 2 of the first string and 2 to 4 of the second
substr(c("String", "String"), start = c(1, 2), stop = c(2, 4))
"St" "tri"

The following example returns six substrings: from the first character to the last, from the second character to the last, and so on.

# From first to last, from second to last, ..., from sixth to last
substring("String", first = 1:6)
"String" "tring"  "ring"   "ing"    "ng"     "g" 

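Since substring is vectorized over both first and last, it can also split a string into pieces. The following is a rough sketch under the assumption that you want single characters or fixed-width chunks of two characters; the variable names are illustrative.

```r
x <- "String"

# One substring per character: first and last move together
positions <- seq_len(nchar(x))
chars <- substring(x, first = positions, last = positions)
chars   # "S" "t" "r" "i" "n" "g"

# Chunks of width 2: start at positions 1, 3, 5
starts <- seq(1, nchar(x), by = 2)
chunks <- substring(x, first = starts, last = starts + 1)
chunks  # "St" "ri" "ng"
```
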
Replacing substrings

Substrings can also be replaced with another string, normally one with the same number of characters as the substring being replaced. To do this, store your string in a variable and then assign a value to the substring.

The following is an example using substr:

# Target string
string <- "String"

# Replace from the first to second with "AB"
substr(string, start = 1, stop = 2) <- "AB"

# View result
string
"ABring"

The same applies to substring:

# Target string
string <- "String"

# Replace from the third element to last with "1234"
substring(string, first = 3) <- "1234"

# View result
string
"St1234"

Note that if the replacement has more characters than the substring being replaced, the extra characters of the replacement are ignored.
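
A short sketch of this behaviour, using illustrative variable names (long_repl, short_repl). It also covers the opposite case, where the replacement is shorter than the selected range and only as many characters as the replacement contains are overwritten.

```r
# Replacement longer than the range: only "AB" is used, "CD" is dropped
long_repl <- "String"
substr(long_repl, start = 1, stop = 2) <- "ABCD"
long_repl   # "ABring"

# Replacement shorter than the range: only the first two characters change
short_repl <- "String"
substr(short_repl, start = 1, stop = 4) <- "AB"
short_repl  # "ABring"
```

In both cases the total number of characters of the string stays the same, which is why these functions cannot be used to delete part of a string.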

If you want to remove a substring, use the gsub() or sub() functions:

# Target string
string <- "String"

# Remove "Str" from "String"
gsub("Str", "", string)
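
The two functions differ in how many matches they handle: sub only replaces the first match, while gsub replaces all of them. The sketch below uses made-up inputs ("banana", "a.b.c") to show this, together with the fixed = TRUE argument, which makes the pattern a literal string instead of a regular expression.

```r
fruit <- "banana"

# sub() removes only the first occurrence of "an"
first_only <- sub("an", "", fruit)
first_only    # "bana"

# gsub() removes every occurrence of "an"
all_matches <- gsub("an", "", fruit)
all_matches   # "ba"

# fixed = TRUE treats "." as a literal dot, not as "any character"
no_dots <- gsub(".", "", "a.b.c", fixed = TRUE)
no_dots       # "abc"
```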

R version 4.3.2 (2023-10-31 ucrt)