Remove leading and trailing whitespaces in R with trimws()


Raw data often contains unwanted whitespace, such as spaces, tabs, line breaks or carriage returns, at the beginning or end of strings. In R, the trimws function makes it easy to remove this leading and trailing whitespace. It also lets you supply your own regular expression, so you can remove other unwanted characters as well.

Syntax of trimws

The trimws function has three arguments: the input character vector (x), the side to trim (which) and a regular expression that matches the whitespace (whitespace):

trimws(x,                                  # Character vector
       which = c("both", "left", "right"), # Which side to remove. Defaults to "both"
       whitespace = "[ \t\r\n]")           # Regular expression to match white spaces

The function returns a character vector of the same length as x, with the matching leading and/or trailing characters removed.
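
The R documentation (?trimws) states that the function works internally via sub(re, "", x, perl = TRUE), so whitespace is treated as a PCRE regular expression. Printing trimws at the console in R 4.3.2 shows that the pattern is anchored with ^ or $ and followed by +. The following is a minimal sketch of an equivalent function built on that assumption; my_trimws is a hypothetical name used only for illustration and is not part of base R.

```r
# Hypothetical re-implementation of trimws(), for illustration only
my_trimws <- function(x, which = c("both", "left", "right"),
                      whitespace = "[ \t\r\n]") {
  which <- match.arg(which)
  left_re  <- paste0("^", whitespace, "+")  # one or more matches at the start
  right_re <- paste0(whitespace, "+$")      # one or more matches at the end
  switch(which,
         left  = sub(left_re, "", x, perl = TRUE),
         right = sub(right_re, "", x, perl = TRUE),
         both  = sub(right_re, "", sub(left_re, "", x, perl = TRUE), perl = TRUE))
}

x <- c("  text   ", "\ttabbed\n", "none")
my_trimws(x)
# returns c("text", "tabbed", "none")
identical(my_trimws(x), trimws(x))
# TRUE
```

Seeing the patterns spelled out this way helps explain why the examples below remove whole runs of characters, not just a single one.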

Remove leading and trailing spaces or characters

By default, the function removes both leading and trailing whitespace from a string, as illustrated below:

# String with leading and trailing spaces
x <- "  text   "

# Remove spaces
x <- trimws(x)
x
# "text"

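trimws is vectorized, and since it relies on sub(), missing values pass through unchanged. A quick check on a vector mixing a normal string, NA, an empty string and a string made only of spaces:

```r
# Vector with a regular string, a missing value, an empty string
# and a string made only of spaces
x <- c("  a  ", NA, "", "   ")
trimws(x)
# returns c("a", NA, "", ""): NA is kept and the spaces-only string becomes ""
```
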
The function can also be applied to a vector or data frame column:

# Data frame with a column with spaces
df <- data.frame(x = 1:3, y = c(" cat ", "  dog ", "    horse "))

# Remove spaces
df$y <- trimws(df$y)
df$y
# "cat"   "dog"   "horse"

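When a data frame has several text columns, you can trim all of them at once. The sketch below selects the character columns with vapply() and applies trimws to each of them with lapply(). The extra column z and its values are made up for illustration, and stringsAsFactors = FALSE is set explicitly so the text columns are character vectors regardless of the R version.

```r
# Data frame with several character columns (values are illustrative)
df <- data.frame(x = 1:3,
                 y = c(" cat ", "  dog ", "    horse "),
                 z = c("black ", " white", " brown "),
                 stringsAsFactors = FALSE)

# Find the character columns and trim each of them
chr_cols <- vapply(df, is.character, logical(1))
df[chr_cols] <- lapply(df[chr_cols], trimws)

df$y  # "cat" "dog" "horse"
df$z  # "black" "white" "brown"
```

Non-character columns such as x are left untouched.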
Keep in mind that, by default, trimws removes not only spaces but also line breaks (\n), carriage returns (\r) and tabs (\t):

# String with other type of spaces
x <- " \r text\t\n"

# Remove spaces
x <- trimws(x)
x
# "text"

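The default class [ \t\r\n] does not cover every kind of whitespace. For example, text copied from web pages often contains non-breaking spaces (U+00A0), which are kept. The R help page for trimws suggests the PCRE class [\h\v] as a generalization that matches all Unicode horizontal and vertical whitespace. The sketch below assumes a UTF-8 session:

```r
# A non-breaking space (U+00A0) is not in the default class [ \t\r\n]
x <- "\u00a0text\u00a0"
nchar(trimws(x))
# 6: the non-breaking spaces are kept

# [\h\v] (PCRE) matches horizontal and vertical whitespace, including U+00A0
nchar(trimws(x, whitespace = "[\\h\\v]"))
# 4
```
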
The whitespace argument lets you specify, as a regular expression, which characters to remove. The following example uses it to remove leading and trailing occurrences of the character "E":

# String with leading and trailing characters
x <- "EEEOther TextEE"

# Remove leading and trailing "E" characters
x <- trimws(x, whitespace = "E")
x
# "Other Text"

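Because whitespace is a regular expression, a character class lets you remove several different characters in a single call. The strings and the class [ "'*] below are only illustrative; adapt the class to the characters present in your data.

```r
# Strip surrounding quotes, asterisks and spaces in one call
x <- c(' "apple" ', "**pear**", "'plum'  ")
trimws(x, whitespace = "[ \"'*]")
# returns c("apple", "pear", "plum")
```
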
As mentioned earlier, the whitespace argument accepts a regular expression. In the following example, trimws removes leading and trailing digits (0-9) from the string:

# String with leading and trailing numbers
x <- "2352Text with numbers43213"

# Remove leading and trailing digits
x <- trimws(x, whitespace = "[0-9]")
x
# "Text with numbers"

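One caveat when passing a multi-character pattern: printing trimws in R 4.3.2 shows that the function builds its patterns as paste0("^", whitespace, "+") and paste0(whitespace, "+$"). The + therefore applies only to the last element of your pattern unless you wrap the pattern in parentheses. The string below is made up to show the difference:

```r
x <- "abababTextab"

# Without a group, "+" only applies to "b": the patterns are ^ab+ and ab+$
trimws(x, whitespace = "ab")
# "ababText"

# With a group, whole repetitions of "ab" are removed: ^(ab)+ and (ab)+$
trimws(x, whitespace = "(ab)")
# "Text"
```
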
Remove leading spaces

By default, trimws removes both leading and trailing whitespace. However, the which argument lets you remove only leading or only trailing whitespace. The following example removes the leading spaces and keeps the trailing ones by setting which = "left":

# String
x <- "   this is a text   "

# Remove leading spaces
x <- trimws(x, which = "left")
x
# "this is a text   "

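A practical use of which = "left" is removing indentation from lines of text, such as lines of code, while leaving whatever follows them untouched. The lines below are illustrative:

```r
# Lines of code with different indentation levels
code <- c("    if (x > 0) {", "      print(x)", "    }  ")
trimws(code, which = "left")
# returns c("if (x > 0) {", "print(x)", "}  "): trailing spaces are kept
```
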
Remove trailing spaces

To remove only trailing spaces, set which = "right". Only the whitespace at the end of the string is removed, and any spaces at the beginning are left intact:

# String
x <- "   this is a text   "

# Remove trailing spaces
x <- trimws(x, which = "right")
x
# "   this is a text"

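Printing trimws at the console shows that the which argument is resolved with match.arg(). As a consequence, unambiguous abbreviations such as "r" or "l" are accepted, and any other value raises an error. The sketch below catches that error with tryCatch() and returns the placeholder string "error" so the result can be inspected:

```r
x <- "   this is a text   "

# match.arg() accepts unambiguous abbreviations of "both", "left" or "right"
identical(trimws(x, which = "r"), trimws(x, which = "right"))
# TRUE

# Any other value is an error
res <- tryCatch(trimws(x, which = "middle"), error = function(e) "error")
res
# "error"
```

Spelling out the full value is still clearer for readers of your code.
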
R version 4.3.2 (2023-10-31 ucrt)