Quantiles in R

Quantiles and percentiles in R

Given a value \(p\) with \(0 < p < 1\), the quantile of order \(p\) is the value that leaves a proportion \(p\) of the data below it and the remaining proportion \(1 - p\) above it. Quantiles generalize the median, which is the quantile of order \(p = 0.5\). In R, you can use the quantile function to calculate any quantile of a numeric vector.
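
As a quick check of this definition, the following minimal sketch uses a small illustrative vector (the integers 1 to 10, chosen only for this example) and verifies that a proportion \(p\) of the data lies below the quantile of order \(p\).

```r
# Illustrative data: the integers 1 to 10
x <- 1:10
p <- 0.3

# Quantile of order p (default algorithm), without the name attribute
q <- quantile(x, probs = p, names = FALSE)

# Proportion of observations below q and above q
below <- mean(x < q)
above <- mean(x > q)

# The quantile of order 0.5 coincides with the median
q_half <- quantile(x, probs = 0.5, names = FALSE)
same_as_median <- isTRUE(all.equal(q_half, median(x)))
```

Here q is 3.7, so three of the ten values fall below it and seven above it.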

Syntax

The quantile function calculates the sample quantiles of a numeric vector (x). By default, probs is seq(0, 1, 0.25), so the function returns the minimum, the three quartiles and the maximum, but you can pass any other probabilities to compute any percentile.

quantile(x,                       # Numeric vector
         probs = seq(0, 1, 0.25), # Probabilities (by default 0, 0.25, 0.5, 0.75 and 1)
         na.rm = FALSE,           # If TRUE, removes missing values
         names = TRUE,            # If TRUE, the result is named with the percentages
         type = 7,                # Integer between 1 and 9 to select a quantile algorithm
         digits = 7,              # If names = TRUE, number of digits used in the percentage names
         ...)                     # Additional arguments if needed
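
As a small sketch of the probs argument, the example below asks for the 5th and 95th percentiles of an illustrative vector (the integers 0 to 100, an assumption made only so the result is easy to verify by eye).

```r
# Illustrative data: the integers 0 to 100
x <- 0:100

# 5th and 95th percentiles
q <- quantile(x, probs = c(0.05, 0.95))
q

# The names are built from probs expressed as percentages
names(q)
```

With this vector the result should be 5 and 95, named "5%" and "95%".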

Quartiles

Quartiles are quantiles of order 0.25, 0.5 and 0.75 and they divide the sample into four parts with the same frequency. Usually, quartiles are denoted by \(Q_1\), \(Q_2\) and \(Q_3\).

# Sample data
set.seed(1)
x <- rnorm(100)

# Calculate the quartiles of x
quantile(x)
        0%        25%        50%        75%       100% 
-2.2146999 -0.4942425  0.1139092  0.6915454  2.4016178 
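
To illustrate the "four parts with the same frequency" statement, the sketch below uses the quartiles as cut points and counts the observations in each group. The cut and table calls are one possible way to do the check, not the only one.

```r
# Sample data
set.seed(1)
x <- rnorm(100)

# Use the quartiles as cut points; include.lowest keeps the minimum in the first group
groups <- cut(x, breaks = quantile(x), include.lowest = TRUE)

# Number of observations in each of the four groups
counts <- table(groups)
counts
```

With 100 distinct observations, each of the four groups contains 25 values.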

Recall that the quantile of order 0.5 (the second quartile) is equal to the median:

# Sample data
set.seed(1)
x <- rnorm(100)

# Calculate the median of x
median(x) # 0.1139092

Note that you can remove the name attributes from the output by setting names = FALSE.

# Sample data
set.seed(1)
x <- rnorm(100)

# Calculate the quartiles of x
quantile(x, names = FALSE)
-2.2146999 -0.4942425  0.1139092  0.6915454  2.4016178

Remove missing values

If your numeric vector contains missing values, quantile stops with an error, so you will need to set na.rm = TRUE to remove the missing values before the calculation.

# Sample data
set.seed(1)
x <- rnorm(100)

# Missing value
x[1] <- NA

# Calculate the quartiles of x removing missing values
quantile(x, na.rm = TRUE)
        0%        25%        50%        75%       100% 
-2.2146999 -0.4757753  0.1532533  0.6933514  2.4016178 
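
The following sketch shows both behaviours side by side on a short illustrative vector (the values are made up for this example). The error is captured with tryCatch so that the script keeps running.

```r
# Illustrative vector with one missing value
x <- c(3, 1, NA, 7, 5)

# With the default na.rm = FALSE the call fails; capture the error message
msg <- tryCatch(
  quantile(x),
  error = function(e) conditionMessage(e)
)
msg

# Dropping the missing value gives the quartiles of 1, 3, 5 and 7
q <- quantile(x, na.rm = TRUE)
q
```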

Quantile algorithms

The calculation of the quantiles is based on one of the nine algorithms discussed in Hyndman and Fan (1996). By default, the seventh algorithm is used, but you can select another one by passing an integer between 1 and 9 to type. See that reference for further information about each algorithm.

# Sample data
set.seed(1)
x <- rnorm(100)

# Calculate the quartiles of x using Type 8 algorithm
quantile(x, type = 8)
        0%        25%        50%        75%       100% 
-2.2146999 -0.5156992  0.1139092  0.6939534  2.4016178 
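
To see how much the choice of type matters, the sketch below computes the first quartile of a small illustrative vector under all nine algorithms. It also reproduces the default type 7 by hand, assuming the usual description of that algorithm as linear interpolation at position \((n - 1)p + 1\) of the sorted data. The variable names are hypothetical.

```r
# Illustrative data: the integers 1 to 10
x <- 1:10
p <- 0.25

# First quartile under each of the nine algorithms
by_type <- sapply(1:9, function(t) quantile(x, probs = p, type = t, names = FALSE))
names(by_type) <- paste0("type", 1:9)
by_type

# Hand computation of type 7: linear interpolation at position (n - 1) * p + 1
n  <- length(x)
xs <- sort(x)
h  <- (n - 1) * p + 1
lo <- floor(h)
hi <- min(lo + 1, n)
manual7 <- xs[lo] + (h - lo) * (xs[hi] - xs[lo])
manual7
```

For this vector, type 7 gives 3.25, while type 1, which always returns an observed value, gives 3.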

Visual representation

Note that a box plot can be used to visualize the quartiles, but boxplot does not compute them with the same method as quantile (it draws the hinges returned by fivenum), so the values may differ slightly.

# Sample data
set.seed(1)
x <- rnorm(100)

quartile <- quantile(x)

# Box plot
boxplot(x, col = 4, horizontal = TRUE)
text(quartile[2], 1.25, expression(Q[1]))
text(quartile[3], 1.25, expression(Q[2]))
text(quartile[4], 1.25, expression(Q[3]))

Box plot quartiles in R
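
The difference between both methods can be checked numerically. The sketch below compares quantile, fivenum and the statistics returned by boxplot.stats on a small illustrative vector (the integers 1 to 10, chosen so that the discrepancy is visible).

```r
# Illustrative data: the integers 1 to 10
x <- 1:10

# Quartiles from quantile() with the default algorithm
q <- quantile(x, names = FALSE)

# Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum)
f <- fivenum(x)

# Statistics drawn by boxplot(): whisker ends, hinges and median
b <- boxplot.stats(x)$stats

rbind(quantile = q, fivenum = f, boxplot = b)
```

For this vector the first and third quartiles are 3.25 and 7.75 according to quantile, while the box is drawn from 3 to 8.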

Deciles

Deciles are quantiles of order 0.1, 0.2, …, 0.9 and divide the sample into 10 equal-frequency parts. To calculate them, you can pass a sequence from 0 to 1 by 0.1 to probs, as shown in the example below.

# Sample data
set.seed(1)
x <- rnorm(100)

# Calculate the deciles of x
quantile(x, probs = seq(0, 1, by = 0.1))
         0%         10%         20%         30%         40%         50%         60%         70%         80%         90%        100% 
-2.21469989 -1.05265747 -0.61386923 -0.37534202 -0.07670313  0.11390916  0.37707993  0.58121734  0.77125360  1.18106508  2.40161776

Percentiles

Percentiles are quantiles of order 0.01, 0.02, …, 0.99 and divide the sample into 100 equal-frequency parts. To calculate the percentiles of a numeric vector, specify a sequence from 0 to 1 by 0.01 inside probs.

# Sample data
set.seed(1)
x <- rnorm(100)

# Calculate the percentiles of x
quantile(x, probs = seq(0, 1, by = 0.01))
          0%           1%           2%           3%           4%           5%           6%           7%           8% 
-2.214699887 -1.991605178 -1.808646490 -1.532008555 -1.472864961 -1.381744198 -1.282620249 -1.255240517 -1.226934278
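
Since the full output contains 101 values, it is often more convenient to pick out only the percentiles of interest. The sketch below shows two possible ways of doing so: selecting by name from the full result, or requesting just the probabilities you need.

```r
# Sample data
set.seed(1)
x <- rnorm(100)

# All percentiles, from 0% to 100% (101 values)
p <- quantile(x, probs = seq(0, 1, by = 0.01))

# Select a few of them by name
p[c("5%", "50%", "95%")]

# Or request only the percentiles you need
quantile(x, probs = c(0.05, 0.5, 0.95))
```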