Mode estimation in R

Calculate the mode in R for both discrete and continuous data

The mode is a measure of location that can be defined as the most probable outcome of a random variable or as the most frequent value in a set of observations. It is a robust measure that coincides with the mean and the median in symmetric unimodal distributions. In this tutorial we will review how to calculate the mode in R for both discrete and continuous one-dimensional variables.

Discrete unimodal estimation

Consider the following vector x:

x <- c(1, 5, 1, 6, 2, 1, 6, 7, 1)

The mode can be calculated as the most repeated value within the variable, which in this case is 1. A simple way of calculating the mode in R in this case is to use the following function:

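# Note: this masks base::mode(), which returns the storage mode of an object
mode <- function(x) {
   # table() counts each value; which.max() picks the first most frequent one
   return(as.numeric(names(which.max(table(x)))))
}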

In this case, we can check that the mode is 1 by passing the vector to the function:

mode(x) # 1
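
The function above converts the table names back with as.numeric, so it only works for numeric data. As a minimal sketch of an alternative that keeps the original type of the data, the following helper counts occurrences with match and tabulate. The name mode_value is hypothetical and is not part of base R or of any package.

```r
# Hypothetical helper: returns the first most frequent value, keeping its type
mode_value <- function(x) {
  ux <- unique(x)                      # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))     # number of times each distinct value occurs
  ux[which.max(counts)]                # first value with the highest count
}

x <- c(1, 5, 1, 6, 2, 1, 6, 7, 1)
mode_value(x)                          # 1
mode_value(c("a", "b", "b", "c"))      # "b"
```

Like the original function, this sketch returns a single value even when several values are tied for the highest count.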

If you want to visualize the number of times that each data point is repeated you can also create a bar chart.

barplot(table(x), col = c(4, rep("gray", 4)))
legend("topright", "Mode", fill = 4)

Most frequent value of the variable

Continuous unimodal estimation

If the variable of interest is continuous instead of discrete, the previous procedure cannot be used, since values are rarely repeated, and we must resort to another method. The most common approach in the literature is to estimate the density function of the data and take the location of its maximum. Several algorithms are available for this.

Consider the following normal data (unimodal) with mean 0 and standard deviation of 1. As the normal distribution is symmetric, we know that the mean, the median and the mode are equal (0).

set.seed(1234)
x2 <- rnorm(1000)

To visualize the mode you can draw the histogram and the estimated density function. Note that the choice of bandwidth determines the shape of the estimated density.

# Histogram
hist(x2, freq = FALSE)
# Density
dx <- density(x2)
lines(dx$x, dx$y, col = 2, lwd = 2)
# Theoretical mode
abline(v = 0, col = 4, lty = 2, lwd = 3)

Histogram, density and mode of continuous data
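
The idea of taking the maximum of the estimated density can be sketched in base R alone. This is a rough illustration, not the method used by any particular package: it returns the grid point where the output of density() is highest, and repeats the calculation for several values of the adjust argument to show how the bandwidth affects the result. The names kde_mode and adjust_values are hypothetical.

```r
set.seed(1234)
x2 <- rnorm(1000)

# Hypothetical helper: location of the highest point of the kernel density estimate
kde_mode <- function(x, ...) {
  d <- density(x, ...)      # extra arguments (e.g. adjust, bw) are passed to density()
  d$x[which.max(d$y)]       # grid point with the largest estimated density
}

# Estimate with the default bandwidth
kde_mode(x2)

# Same estimate with a narrower, the default and a wider bandwidth
adjust_values <- c(0.5, 1, 2)
sapply(adjust_values, function(a) kde_mode(x2, adjust = a))
```

The precision of this estimate is limited by the grid used by density() (512 points by default), which can be refined with its n argument.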

To perform the calculation you can use the mlv function of the modeest package, which allows you to choose among different algorithms. We recommend the mean-shift algorithm, as shown in the following block of code.

# install.packages("modeest")
library(modeest)

# Mode
mlv(x2, method = "meanshift") # -0.03912067

We can observe that the estimated mode (-0.039) is very close to the theoretical mode (0). Other available methods are "lientz", "naive", "venter", "grenander", "hsm", "parzen", "tsybakov" and "asselin".
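
To give an intuition of what a mean-shift estimator does, here is a minimal sketch with a Gaussian kernel. It is not the implementation used by modeest: the function name mean_shift_mode, the starting point (the median) and the default bandwidth (bw.nrd0) are assumptions chosen for illustration. Starting from an initial guess, the estimate is repeatedly replaced by a kernel-weighted mean of the data until it stops moving.

```r
# Hypothetical helper: one-dimensional mean shift with a Gaussian kernel
mean_shift_mode <- function(x, h = bw.nrd0(x), start = median(x),
                            tol = 1e-8, max_iter = 1000) {
  m <- start
  for (i in seq_len(max_iter)) {
    w <- dnorm((x - m) / h)          # Gaussian weights centered at the current estimate
    m_new <- sum(w * x) / sum(w)     # weighted mean of the data: the "shift" step
    converged <- abs(m_new - m) < tol
    m <- m_new
    if (converged) break
  }
  m
}

set.seed(1234)
x2 <- rnorm(1000)
mean_shift_mode(x2)
```

Because the iteration climbs towards a nearby local maximum of the kernel density estimate, the result depends on the starting point when the data have several modes.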

Discrete multimodal estimation

Unlike the median or mean, the mode can take multiple values at the same time. For instance, consider the vector y, which has two modes.

y <- c(3, 5, 3, 3, 5, 6, 5)

# Histogram
hist(y)

Bimodal discrete data

In this case the most repeated values are 3 and 5. In order to calculate several modes you can make use of the mlv function of the modeest package and apply the mfv method.

# install.packages("modeest")
library(modeest)

# Modes
mlv(y, method = "mfv") # 3 5 
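
If you prefer not to depend on an additional package, the same result can be sketched in base R by keeping every value whose count equals the maximum count. The function name all_modes is hypothetical, and the conversion with as.numeric assumes numeric data.

```r
# Hypothetical helper: returns all the most frequent values of a numeric vector
all_modes <- function(x) {
  counts <- table(x)                                  # frequency of each value
  as.numeric(names(counts)[counts == max(counts)])    # values tied for the maximum
}

y <- c(3, 5, 3, 3, 5, 6, 5)
all_modes(y)  # 3 5
```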

Continuous multimodal estimation

If you want to calculate several modes when the variable is continuous, you can use the locmodes function of the multimode package.

Consider the following multimodal data, whose theoretical modes are 40 and 120, represented with vertical red lines.

n <- 1000
bin <- rbinom(n, 1, 0.6)
y2 <- rnorm(n, mean = 120, sd = 11) * bin +
      rnorm(n, mean = 40, sd = 5) * (1 - bin)

# Histogram
hist(y2)
# Theoretical mode 1
abline(v = 40, col = 2, lwd = 2) 
# Theoretical mode 2
abline(v = 120, col = 2, lwd = 2)

Multimodal estimation in R

You can apply the locmodes function to the previous data, indicating the number of modes that you expect to find in the mod0 argument.

# install.packages("multimode")
library(multimode)

modes <- locmodes(y2, mod0 = 2)
modes
Estimated location
Modes: 40.56825  120.8625 
Antimode: 69.94661 

Estimated value of the density
Modes: 0.02535653  0.02033563 
Antimode: 8.184294e-08 

Critical bandwidth: 3.746696

Warning message:
In locmodes(y2, mod0 = 2) :
  If the density function has an unbounded support, artificial modes may have been created in the tails

In the output above you can observe that the estimated modes are 40.57 and 120.86, very close to the theoretical values.
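
As a rough cross-check that does not require any extra package, the local maxima of a kernel density estimate can be located in base R. This is only a sketch and not the procedure used by locmodes: it uses the default bandwidth of density(), and the seed is an assumption added for reproducibility (the original simulation does not set one), so the numbers will differ from the output above. The names dy, peaks and est_modes are hypothetical.

```r
set.seed(1234)  # assumption: the original simulation does not set a seed
n <- 1000
bin <- rbinom(n, 1, 0.6)
y2 <- rnorm(n, mean = 120, sd = 11) * bin +
      rnorm(n, mean = 40, sd = 5) * (1 - bin)

# Kernel density estimate with the default bandwidth
dy <- density(y2)

# Grid indices where the density changes from increasing to decreasing
peaks <- which(diff(sign(diff(dy$y))) == -2) + 1

# Keep the two highest local maxima and report their locations
top <- peaks[order(dy$y[peaks], decreasing = TRUE)][1:2]
est_modes <- sort(dy$x[top])
est_modes
```

Keeping only the two highest peaks plays the same role as the mod0 argument: it discards small spurious bumps that may appear in the tails.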

The package also provides an S3 method to plot the estimates returned by the locmodes function, displaying the location of the modes and antimodes and the bandwidth used.

plot(modes)

Estimating several modes in R for continuous data

The package also provides the modetest function to test for multimodality and functions for exploring the number of modes, such as modetree, modeforest and sizer.