Variance and standard deviation in R

Standard deviation and variance in R with the sd and var functions

The variance and the standard deviation are dispersion measures that quantify the degree of variability, spread or scatter of a variable. Along with measures of central tendency, statistical dispersion measures are used to describe the properties of a distribution. In this tutorial you will learn how to calculate the variance and the standard deviation in R with the var and sd functions.

Variance in R with the var function

The sample variance, denoted by \(S^2_n\) (or \(\sigma^2_n\)), is the sum of the squared deviations of the values of the variable from their mean, divided by \(n - 1\). That is,

\(S^2_n = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_i - \bar{x})^2\),

where \(n\) is the number of observations and \(\bar{x}\) is the mean of the variable.

The denominator \(n - 1\), rather than \(n\), is used to give an unbiased estimator of the variance for i.i.d. observations.

The variance is never negative (it is zero only when all the values are equal), and greater values indicate higher dispersion.

In R, you can use the var function to calculate the variance of a variable. For example, the variance of the following sample vector can be calculated as follows:

# Sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)

# Variance
var(x) # 38.57143
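
As a check on the formula, the same result can be reproduced step by step. The following is a minimal sketch using only base R; the names n, x_bar, manual_var and var_n are illustrative and not part of any API. The last lines also compute the version with denominator \(n\), purely for comparison with the \(n - 1\) version that var returns.

```r
# Sample vector (same values as in the example above)
x <- c(10, 25, 12, 18, 5, 16, 14, 20)

n <- length(x)      # number of observations: 8
x_bar <- mean(x)    # mean of the variable: 15

# Sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x - x_bar)^2) / (n - 1)
manual_var # 38.57143

# Matches the built-in function (up to floating-point tolerance)
all.equal(manual_var, var(x)) # TRUE

# For comparison only: the same sum divided by n instead of n - 1
var_n <- sum((x - x_bar)^2) / n
var_n # 33.75
```

Dividing by \(n\) gives a smaller value than var(x), and the difference shrinks as the number of observations grows.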

Note that the function provides an argument named na.rm that can be set to TRUE to remove missing values.
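
A short sketch of the effect of na.rm, using a small hypothetical vector y that contains one missing value:

```r
# Hypothetical vector with a missing value
y <- c(2, 4, NA, 6)

# By default the missing value propagates to the result
var(y) # NA

# Removing missing values before computing the variance
var(y, na.rm = TRUE) # 4

# Equivalent to dropping the NA values manually
var(y[!is.na(y)]) # 4
```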

Standard deviation in R with the sd function

The standard deviation is the positive square root of the variance, that is, \(S_n = \sqrt{S^2_n}\). The standard deviation is more commonly used in statistics than the variance because it is expressed in the same units as the variable, while the variance is expressed in squared units.

In R, the standard deviation can be calculated with the sd function, as shown below:

# Sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)

# Standard deviation
sd(x) # 6.21059

# Equivalent to:
sqrt(var(x)) # 6.21059

Similarly, we can calculate the variance as the square of the standard deviation:

# Sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)

# Variance
sd(x) ^ 2 # 38.57143

The sd function also provides the na.rm argument, which can be set to TRUE if the input vector contains any NA values. Otherwise, the function will return NA.
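
The behavior described above can be sketched with a small hypothetical vector z. The anyNA check is an optional addition, not something sd requires:

```r
# Hypothetical vector with a missing value
z <- c(1, NA, 3, 5)

# Check whether the vector contains any missing values
anyNA(z) # TRUE

# Without na.rm the result is NA
sd(z) # NA

# With na.rm = TRUE the NA is dropped before the calculation
sd(z, na.rm = TRUE) # 2
```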