Variance and standard deviation in R
The variance and the standard deviation are dispersion measures that quantify the degree of variability, spread or scatter of a variable. Along with measures of central tendency, statistical dispersion measures are used to describe the properties of a distribution. In this tutorial you will learn how to calculate the variance and the standard deviation in R with the var and sd functions.
Variance in R with the var function
The sample variance, denoted by \(S^2_n\) or \(\sigma^2_n\), is the sum of the squared deviations of the values of the variable from its mean, divided by \(n - 1\). That is,
\(S^2_n = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_i - \bar{x})^2\),
where \(n\) is the number of observations and \(\bar{x}\) is the mean of the variable.
The denominator n-1 is used to give an unbiased estimator of the variance for i.i.d. observations.
The variance is never negative (it is zero only when all the values are equal), and greater values indicate higher dispersion.
In R, you can use the var function to calculate the variance of a variable. For the following sample vector, the variance is calculated as follows:
# Sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)
# Variance
var(x) # 38.57143
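As a check on the formula above, the same value can be reproduced by hand. This is only a minimal sketch: the names n, x_bar and manual_var are illustrative and do not belong to any R function.

```r
# Sample vector (same data as in the example above)
x <- c(10, 25, 12, 18, 5, 16, 14, 20)

n <- length(x)      # number of observations
x_bar <- mean(x)    # sample mean

# Sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x - x_bar)^2) / (n - 1)

manual_var                      # 38.57143
all.equal(manual_var, var(x))   # TRUE
```

Here the mean is 15 and the squared deviations add up to 270, so the result is 270 / 7.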
Note that the function provides an argument named na.rm that can be set to TRUE to remove missing values.
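The effect of na.rm can be illustrated with a short sketch. The vector y below is a hypothetical variation of the sample data in which one value has been replaced by NA.

```r
# Hypothetical vector: the sample data with one missing value
y <- c(10, 25, NA, 18, 5, 16, 14, 20)

# With the default na.rm = FALSE the result is NA
var(y)

# With na.rm = TRUE the NA is dropped before the calculation
var(y, na.rm = TRUE)

# Same as computing the variance of the observed values only
all.equal(var(y, na.rm = TRUE), var(y[!is.na(y)]))   # TRUE
```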
Standard deviation in R with the sd function
The standard deviation is the positive square root of the variance, that is, \(S_n = \sqrt{S^2_n}\). The standard deviation is more widely used in statistics than the variance, as it is expressed in the same units as the variable, while the variance is expressed in squared units.
In R, the standard deviation can be calculated with the sd function, as shown below:
# Sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)
# Standard deviation
sd(x) # 6.21059
# Equivalent to:
sqrt(var(x)) # 6.21059
Similarly, we can calculate the variance as the square of the standard deviation:
# Sample vector
x <- c(10, 25, 12, 18, 5, 16, 14, 20)
# Variance
sd(x) ^ 2 # 38.57143
The sd function also provides the na.rm argument, which can be set to TRUE if the input vector contains any NA values. Otherwise, the function will return NA.
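A brief sketch of this behavior follows, assuming a hypothetical vector z that contains one missing value. It also shows that the relationship between the standard deviation and the variance still holds once the missing value is removed.

```r
# Hypothetical vector with one missing value
z <- c(10, 25, NA, 18, 5, 16, 14, 20)

# Default behavior: the missing value propagates to the result
sd(z)   # NA

# Dropping the missing value before the calculation
sd(z, na.rm = TRUE)

# Still the square root of the variance of the observed values
all.equal(sd(z, na.rm = TRUE), sqrt(var(z, na.rm = TRUE)))   # TRUE
```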