# Variance and standard deviation in R

## Variance in R with the var function

The variance, denoted by S^2_n or \sigma^2_n, measures the dispersion of a variable as the average of the squared deviations of its values from the mean, computed here with n - 1 in the denominator. That is,

S^2_n = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_i - \bar{x})^2,

where n is the number of observations and \bar{x} is the mean of the variable.

The denominator n-1 is used to give an unbiased estimator of the variance for i.i.d. observations.

The variance is never negative (it is zero only when all the values are equal), and greater values indicate higher dispersion.
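The formula above can be checked step by step with basic arithmetic in R. The following is a minimal sketch that uses only `length`, `mean` and `sum`; the vector `x` and the names `squared_dev`, `s2` and `s2_n` are chosen for illustration.

```r
# Illustrative sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)

n <- length(x)                  # number of observations
squared_dev <- (x - mean(x))^2  # squared deviations from the mean

# Sample variance: sum of squared deviations divided by n - 1
s2 <- sum(squared_dev) / (n - 1)
s2 # 38.57143

# Dividing by n instead of n - 1 gives a smaller value
s2_n <- sum(squared_dev) / n
s2_n # 33.75
```

The gap between the two results shrinks as n grows, since (n - 1) / n approaches 1.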

In R, the var function calculates the variance of a variable. For instance, the variance of the following sample vector can be computed as follows:

# Sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)

# Variance
var(x) # 38.57143

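Note that var provides an argument named na.rm that can be set to TRUE to remove missing values before the computation. With the default, na.rm = FALSE, var returns NA if the input vector contains any missing value.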
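As a short illustration of the na.rm argument, consider a vector with one missing value. The vector `y` below is hypothetical and only serves to show the two behaviours.

```r
# Hypothetical vector with one missing value
y <- c(10, 25, NA, 18, 5)

# With the default na.rm = FALSE the result is NA
var(y) # NA

# With na.rm = TRUE the NA is dropped before computing the variance
var(y, na.rm = TRUE) # 77.66667

# Same result as computing the variance of the non-missing values
var(y[!is.na(y)]) # 77.66667
```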

## Standard deviation in R with the sd function

The standard deviation is the positive square root of the variance, that is, S_n = \sqrt{S^2_n}. In practice, the standard deviation is reported more often than the variance because it is expressed in the same units as the variable, whereas the variance is expressed in squared units.

In R, the standard deviation can be calculated with the sd function, as shown below:

# Sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)

# Standard deviation
sd(x) # 6.21059

# Equivalent to:
sqrt(var(x)) # 6.21059

Similarly, we can calculate the variance as the square of the standard deviation:

# Sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)

# Variance
sd(x) ^ 2 # 38.57143
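The point about units can be illustrated by rescaling a variable. In the sketch below, the vector `metres` is a hypothetical set of measurements; converting it to centimetres multiplies the standard deviation by 100 but the variance by 100^2.

```r
# Hypothetical measurements in metres
metres <- c(1.2, 1.5, 1.1, 1.8)

# The same measurements expressed in centimetres
centimetres <- metres * 100

# The standard deviation scales with the unit conversion factor
sd_ratio <- sd(centimetres) / sd(metres)
sd_ratio # 100

# The variance scales with the square of that factor
var_ratio <- var(centimetres) / var(metres)
var_ratio # 10000
```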

The sd function also provides the na.rm argument, which can be set to TRUE if the input vector contains any NA values. Otherwise, the function returns NA.
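A minimal sketch of this behaviour follows. The vector `z` is hypothetical, and counting the missing values first with `is.na` is just one possible way to see how many observations na.rm = TRUE would discard.

```r
# Hypothetical vector with two missing values
z <- c(4, NA, 9, 7, NA, 12)

# Number of missing values that na.rm = TRUE would discard
n_missing <- sum(is.na(z))
n_missing # 2

# With the default na.rm = FALSE the result is NA
sd(z) # NA

# With na.rm = TRUE only the non-missing values are used
sd_clean <- sd(z, na.rm = TRUE)

# Still equivalent to the square root of the variance
all.equal(sd_clean, sqrt(var(z, na.rm = TRUE))) # TRUE
```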