- 1 What is a factor in R programming?
- 2 The factor function
- 3 Convert character to factor in R
- 4 Convert numeric to factor in R
- 5 Change factor levels
- 6 Difference between levels and labels in R
- 7 Relevel and reorder factor levels
- 8 Convert factor in R to numeric
- 9 Convert factor to string
- 10 Convert factor to date
What is a factor in R programming?
A factor in R is a data structure that represents a vector as categorical data: it can take only a bounded set of distinct values, called levels. Factors are very useful when working with character columns of data frames, when creating barplots, and when computing statistical summaries of categorical variables.
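As a minimal sketch of this idea, the following block builds a small factor and inspects its two parts: one integer code per observation and the set of levels. The vector `sizes` and the other names are made up for illustration and are not part of the tutorial's data.

```r
# Hypothetical example data: clothing sizes recorded as text
sizes <- c("small", "large", "medium", "small", "large", "small")
size_factor <- factor(sizes)

# A factor stores one integer code per observation...
codes <- as.integer(size_factor)
# ...plus the distinct values (levels), sorted alphabetically by default
lv <- levels(size_factor)

# Frequency of each category
counts <- table(size_factor)

print(codes)
print(lv)
print(counts)
```

Each code is simply the position of the observation's value within the levels vector.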
The factor function
The factor function allows you to create factors in R. The following block shows the arguments of the function with a brief description of each.
factor(x = character(),         # Input vector data
       levels,                  # Input of unique x values (optional)
       labels = levels,         # Output labels for the levels (optional)
       exclude = NA,            # Values to be excluded from levels
       ordered = is.ordered(x), # Whether to create an ordered factor, with the levels ordered as given
       nmax = NA)               # Maximum number of levels
You can get a more detailed description of the function and its arguments by calling ?factor or help(factor).
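The exclude and ordered arguments are easier to grasp with a concrete case. The sketch below uses a hypothetical vector of survey answers (`answers`, `no_unknown` and `rating` are names chosen for this example only).

```r
# Hypothetical survey answers, including a value we want to drop
answers <- c("low", "high", "medium", "unknown", "low", "high")

# 'exclude' removes a value from the levels; matching observations become NA
no_unknown <- factor(answers, exclude = "unknown")

# 'levels' plus 'ordered = TRUE' creates an ordered factor: low < medium < high
# ("unknown" is not listed in 'levels', so it becomes NA here as well)
rating <- factor(answers, levels = c("low", "medium", "high"), ordered = TRUE)

print(levels(no_unknown))
print(is.na(no_unknown))
print(rating < "high")
```

Comparisons such as `rating < "high"` are only meaningful for ordered factors; on an unordered factor they return NA with a warning.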
Convert character to factor in R
Now we will review an example where our input is a character vector. Suppose, for instance, that you have a vector containing the days of the week on which some event happened. You can convert this character vector to a factor with the factor function.
days <- c("Friday", "Tuesday", "Thursday", "Monday", "Wednesday", "Monday",
          "Wednesday", "Monday", "Monday", "Wednesday", "Sunday", "Saturday")
# Levels in alphabetical order
my_factor <- factor(days)
my_factor
Friday Tuesday Thursday Monday Wednesday Monday Wednesday Monday Monday Wednesday Sunday Saturday
Levels: Friday Monday Saturday Sunday Thursday Tuesday Wednesday
If you want to preserve the order of the levels as they appear in the input data, pass the following to the levels argument:
factor(days, levels = unique(days))
Friday Tuesday Thursday Monday Wednesday Monday Wednesday Monday Monday Wednesday Sunday Saturday
Levels: Friday Tuesday Thursday Monday Wednesday Sunday Saturday
Note that you can return the factor levels as a character vector with the levels function:
levels(my_factor)
"Friday" "Monday" "Saturday" "Sunday" "Thursday" "Tuesday" "Wednesday"
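Neither alphabetical order nor order of first appearance matches the calendar. As a sketch of a third option, the block below passes an explicit vector of week days to levels and then counts the events per day; `week`, `day_factor`, `n_days` and `day_counts` are illustrative names.

```r
days <- c("Friday", "Tuesday", "Thursday", "Monday", "Wednesday", "Monday",
          "Wednesday", "Monday", "Monday", "Wednesday", "Sunday", "Saturday")

# Calendar order instead of alphabetical or first-appearance order
week <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
day_factor <- factor(days, levels = week)

n_days <- nlevels(day_factor)    # number of distinct levels
day_counts <- table(day_factor)  # counts, reported in level order

print(n_days)
print(day_counts)
```

Because table follows the level order, the counts come out from Monday to Sunday, which is usually what a barplot of week days needs.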
Convert numeric to factor in R
Suppose you have registered the birth city of six individuals with the following codification:
- 1: Dublin.
- 2: London.
- 3: Sofia.
- 4: Pontevedra.
Hence, you will have something like the following data stored in a numeric vector:
city <- c(3, 2, 1, 4, 3, 2)
Now, you can call the factor function to convert the data into a factor and categorize it for further analysis.
my_factor <- factor(city)
my_factor
The output will have the following structure:
3 2 1 4 3 2
Levels: 1 2 3 4
Change factor levels
If the input vector is numeric, as in the previous section, the corresponding label (the city) is not reflected. In order to solve this issue, you can store the data in a factor object using the
factor function and indicate the corresponding labels of the levels in the
labels argument, in order to rename the factor levels.
# Setting the labels in the corresponding order
factor_cities <- factor(city, labels = c("Dublin", "London", "Sofia", "Pontevedra"))
# Print the result
factor_cities
Sofia London Dublin Pontevedra Sofia London
Levels: Dublin London Sofia Pontevedra # <- Dublin: 1, London: 2, Sofia: 3, Pontevedra: 4
In the previous code block you can see the final output. As you can observe, now the data is categorized using the cities as labels.
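Supplying only labels relies on every code being present in the data. A more defensive variant, sketched below, pairs each code with its label by also passing levels. The second sample `city_2` is hypothetical, as are the names `city_labels`, `cities_1` and `cities_2`.

```r
city <- c(3, 2, 1, 4, 3, 2)
city_labels <- c("Dublin", "London", "Sofia", "Pontevedra")

# Pair each code with its label explicitly: 1 = Dublin, ..., 4 = Pontevedra
cities_1 <- factor(city, levels = 1:4, labels = city_labels)

# Hypothetical second sample in which nobody was born in Dublin (code 1).
# Without 'levels', four labels would not match the three codes present
# and factor() would stop with an error.
city_2 <- c(3, 2, 4, 3)
cities_2 <- factor(city_2, levels = 1:4, labels = city_labels)

print(cities_1)
print(cities_2)
```

With this approach the unused level 'Dublin' is kept in `cities_2`, so both factors share the same set of levels.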
Difference between levels and labels in R
It is common to confuse the labels and levels arguments of the R factor function. Consider the following vector with a single group and create a factor from it with the default arguments:
gender <- c("female", "female", "female", "female")
factor(gender)
female female female female
Levels: female
On the one hand, the labels argument allows you to change the names of the factor levels, so it is related to the output. Note that the vector passed to the labels argument must have one element per level (or be a single string, which R then uses as a prefix and numbers sequentially).
factor(gender, labels = c("f"))
f f f f Levels: f
On the other hand, the
levels argument is related to input. This argument allows you to specify how the levels are coded. Moreover, this argument allows you to add new levels to the factor:
factor(gender, levels = c("male", "female"))
female female female female Levels: male female
Note that the levels you specify must include every value that appears in the input vector; any value not listed in levels is converted to NA:
factor(gender, levels = c("male", "f"))
<NA> <NA> <NA> <NA> Levels: male f
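The two arguments can also be combined. The following sketch, with a slightly extended hypothetical `gender` vector, uses levels to declare the expected input values and labels to rename them; `gender_factor` and `with_typo` are names invented for the example.

```r
gender <- c("female", "female", "male", "female")

# 'levels' says which input values to expect and in what order;
# 'labels' says what each of those levels is called in the result
gender_factor <- factor(gender,
                        levels = c("male", "female"),
                        labels = c("M", "F"))
print(gender_factor)

# Input values that are not listed in 'levels' become NA
with_typo <- factor(c("female", "femal"), levels = c("male", "female"))
print(is.na(with_typo))
```

Checking for NA values after the conversion is a simple way to detect misspelled or unexpected categories in the input.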
Relevel and reorder factor levels
You may be wondering how to change the levels order (which can be important, for instance, in some graphical representations). The factor levels order can be changed in various ways, described in the following subsections.
Custom order of factor levels
If you want a custom order for the levels, create a vector with the desired order and pass it to the levels argument of the factor function. Use levels rather than labels here: labels renames the levels position by position, which would change the city assigned to each observation.
# Create a vector with the desired order
order <- c("London", "Sofia", "Dublin", "Pontevedra")
# Indicate the order in the 'levels' argument
factor(factor_cities, levels = order)
Sofia London Dublin Pontevedra Sofia London
Levels: London Sofia Dublin Pontevedra # <- Ordered as specified
In addition, you can order the levels of the factor alphabetically by making use of the sort function:
# Alphabetical order
factor(factor_cities, levels = sort(levels(factor_cities)))
Sofia London Dublin Pontevedra Sofia London
Levels: Dublin London Pontevedra Sofia # <- Alphabetical order
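When changing the order of the levels it is worth checking that the observations themselves have not changed. The sketch below compares two ways of applying a new order, through levels and through labels; `cities`, `new_order`, `reordered` and `relabelled` are names chosen here for illustration.

```r
city <- c(3, 2, 1, 4, 3, 2)
cities <- factor(city, labels = c("Dublin", "London", "Sofia", "Pontevedra"))

new_order <- c("London", "Sofia", "Dublin", "Pontevedra")

# Reordering through 'levels' keeps every observation attached to its city
reordered <- factor(cities, levels = new_order)

# Passing the same vector as 'labels' renames the levels position by position,
# so the observations end up pointing at different cities
relabelled <- factor(cities, labels = new_order)

print(identical(as.character(reordered), as.character(cities)))
print(identical(as.character(relabelled), as.character(cities)))
```

Only the first comparison is expected to hold, which is why levels is the argument to use when the goal is to reorder without altering the data.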
Reorder factor levels
The reorder function is designed to order the levels of a factor based on a statistical measure of another variable. To demonstrate, consider a data frame where each row represents an individual, the ‘city’ column represents the city where they were born and the ‘salary’ column represents their current annual wage in thousands of dollars.
# Note: sample() changed in R 3.6.0, so newer versions draw different salaries for this seed
set.seed(1)
df <- data.frame(city = factor_cities, salary = sample(20:50, 6))
df
        city salary
1      Sofia     28
2     London     31
3     Dublin     36
4 Pontevedra     45
5      Sofia     25
6     London     43
You can reorder the factor based, for example, on the mean wage of the individuals using the
reorder function as follows:
reorder(df$city, df$salary, mean)
Sofia London Dublin Pontevedra Sofia London
attr(,"scores")
    Dublin     London      Sofia Pontevedra
      36.0       37.0       26.5       45.0
Levels: Sofia Dublin London Pontevedra # <- Ordered from lower to higher salary
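To make the result reproducible regardless of the random number generator, the following sketch types in the salaries from the printed data frame directly. It also shows one possible way to obtain a descending order, by negating the values; `city_means`, `ascending` and `descending` are illustrative names.

```r
# Salaries copied from the printed data frame, so no random sampling is needed
df <- data.frame(
  city = factor(c("Sofia", "London", "Dublin", "Pontevedra", "Sofia", "London")),
  salary = c(28, 31, 36, 45, 25, 43)
)

# Mean salary per city, the statistic that reorder() sorts by
city_means <- tapply(df$salary, df$city, mean)

# Ascending order of mean salary (the default)
ascending <- reorder(df$city, df$salary, FUN = mean)

# Descending order: negate the values so the largest mean comes first
descending <- reorder(df$city, -df$salary, FUN = mean)

print(city_means)
print(levels(ascending))
print(levels(descending))
```

Only the level order changes; the observations in the city column stay attached to the same rows.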
Reverse order of levels
Recall that you can use the levels function to obtain the levels of a factor. The levels of factor_cities, as created in the 'Change factor levels' section, are the following:
levels(factor_cities)
"Dublin" "London" "Sofia" "Pontevedra"
With this in mind, you can reverse the order of the levels of a factor with the rev function, passing the result to the levels argument:
factor(factor_cities, levels = rev(levels(factor_cities)))
Sofia London Dublin Pontevedra Sofia London
Levels: Pontevedra Sofia London Dublin # <- Reversed order
Moreover, if you want to move just one level to the first position, you can use the relevel function. For example, if you want the level ‘London’ to appear first while keeping the order of the others, you can use:
# Setting the level 'London' first
factor_cities <- relevel(factor_cities, "London")
factor_cities
Sofia London Dublin Pontevedra Sofia London
Levels: London Dublin Sofia Pontevedra
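As a further sketch of relevel, the block below starts from a factor with the default alphabetical levels and moves a different level to the front using the named ref argument. The names `cities` and `sofia_first` are assumptions made for this example.

```r
cities <- factor(c("Sofia", "London", "Dublin", "Pontevedra", "Sofia", "London"))
print(levels(cities))  # alphabetical by default

# Move 'Sofia' to the front; the remaining levels keep their relative order
sofia_first <- relevel(cities, ref = "Sofia")

print(levels(sofia_first))
print(sofia_first)
```

This is mostly useful in modelling, where the default treatment contrasts take the first level as the reference category.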
In the following sections we will review how to convert factors to other data types in an efficient way.
Convert factor in R to numeric
If you have a factor in R that you want to convert to numeric, the most efficient way is illustrated in the following code block, which uses the as.numeric and levels functions to convert the levels to numbers and then indexes them by the factor.
my_data <- c(0, 2, 0, 5, 1, 9, 9, 4)
my_factor <- factor(my_data)
as.numeric(levels(my_factor))[my_factor]
0 2 0 5 1 9 9 4
Note that you should not call as.numeric(my_factor) directly, as it returns the internal integer codes of the factor (here 1 3 1 5 2 6 6 4) rather than the original values.
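The three possible conversions can be compared side by side. In this sketch `codes`, `via_levels` and `via_character` are illustrative names; the last variant goes through as.character, which gives the same result and is often easier to read.

```r
my_data <- c(0, 2, 0, 5, 1, 9, 9, 4)
my_factor <- factor(my_data)

# Internal integer codes: positions within the levels, not the original values
codes <- as.numeric(my_factor)

# Convert the (few) levels once, then index them by the factor
via_levels <- as.numeric(levels(my_factor))[my_factor]

# Convert every element to text first, then to numeric
via_character <- as.numeric(as.character(my_factor))

print(codes)
print(via_levels)
print(via_character)
```

The levels-based form only converts each distinct value once, which is where its efficiency advantage on long vectors comes from.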
Convert factor to string
You may need to convert a factor to a string. For that purpose, you can make use of the as.character function.
my_factor_2 <- factor(c("June", "July", "January", "June"))
as.character(my_factor_2)
"June" "July" "January" "June"
Note that if you use the levels function instead, the output will be a character vector with the unique strings ordered alphabetically, as shown in one of the previous sections.
levels(my_factor_2)
"January" "July" "June"
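The same conversion is often needed for a column of a data frame. The sketch below uses a hypothetical data frame `sales`, created with stringsAsFactors = TRUE so that its text column starts out as a factor.

```r
# Hypothetical data frame whose text column is stored as a factor
sales <- data.frame(month = c("June", "July", "January", "June"),
                    units = c(10, 12, 7, 9),
                    stringsAsFactors = TRUE)

is_factor_before <- is.factor(sales$month)

# Replace the factor column with its character representation
sales$month <- as.character(sales$month)

print(is_factor_before)
print(class(sales$month))
print(sales$month)
```

After the replacement the column keeps the original order of the observations, unlike the output of levels.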
Convert factor to date
Also, if you need to change your factor object to date, you can use the
as.Date function, specifying in the
format argument the date format you are working with.
my_date_factor <- factor(c("03/21/2020", "03/22/2020", "03/23/2020"))
as.Date(my_date_factor, format = "%m/%d/%Y")
"2020-03-21" "2020-03-22" "2020-03-23"
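As a short sketch of what the conversion gives you, the block below checks that the result is a Date vector, does some date arithmetic, and shows what happens when the format does not match the text. The names `dates`, `day_gaps` and `wrong_format` are chosen for the example; going through as.character first is optional and only makes the intent explicit.

```r
my_date_factor <- factor(c("03/21/2020", "03/22/2020", "03/23/2020"))

# Convert the factor to text and parse it with the matching format
dates <- as.Date(as.character(my_date_factor), format = "%m/%d/%Y")

# The result is a Date vector, so date arithmetic works
day_gaps <- as.numeric(diff(dates))

# A format that does not match the text yields NA rather than an error
wrong_format <- as.Date(as.character(my_date_factor), format = "%Y-%m-%d")

print(class(dates))
print(day_gaps)
print(wrong_format)
```

Checking for NA values after the conversion is a quick way to confirm that the format string matches the data.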