
# Calculate the median in R

## Median of a discrete variable

To calculate the median of a set of observations we can use the median function. Consider the following vector:

data <- c(126, 52, 133, 104, 115, 67, 57, 83, 53, 105, 100)

In this case we can see that the median of the data is 100:

median(data) # 100

We can check this by sorting the data and confirming that the same number of observations lies on each side of the median. In this case there are 5 observations to the left and 5 to the right.

plot(1, 1, type = "n", axes = FALSE, ann = FALSE,
xlim = c(0, 11), ylim = c(0, 1))
text(1:11, rep(0.5, 11), as.character(sort(data)))
rect(xleft = 5.6, ybottom = 0.45, xright = 6.4, ytop = 0.55, border = 2)
arrows(x0 = 0.7, y0 = 0.4, x1 = 5, code = 3, length = 0.15)
arrows(x0 = 7, y0 = 0.4, x1 = 11, code = 3, length = 0.15)
text(c(3, 9), 0.35, "5")
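
The same check can also be done numerically. The following is a minimal sketch that assumes an odd number of observations with no ties at the central value; the names `sorted`, `middle`, `below` and `above` are introduced here only for illustration.

```r
data <- c(126, 52, 133, 104, 115, 67, 57, 83, 53, 105, 100)

sorted <- sort(data)            # observations in increasing order
n <- length(sorted)             # 11 observations (odd)
middle <- sorted[(n + 1) / 2]   # central value of the sorted data

below <- sum(data < middle)     # observations to the left of the median
above <- sum(data > middle)     # observations to the right of the median

middle == median(data)          # TRUE
c(below = below, above = above) # 5 and 5
```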

Note that if the number of observations is even, the median is calculated as the average of the two central values. Consider the same data as before, without the last observation:

data2 <- c(126, 52, 133, 104, 115, 67, 57, 83, 53, 105)

In this case the median is 93.5:

median(data2) # 93.5
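
To see where this value comes from, the two central values can be averaged by hand. This is only a sketch for the even-length case; `sorted2` and `central` are illustrative names, not part of any R API.

```r
data2 <- c(126, 52, 133, 104, 115, 67, 57, 83, 53, 105)

sorted2 <- sort(data2)                  # observations in increasing order
n <- length(sorted2)                    # 10 observations (even)
central <- sorted2[c(n / 2, n / 2 + 1)] # the two central values: 83 and 104

mean(central)                           # 93.5
mean(central) == median(data2)          # TRUE
```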

The median corresponds to the average of the values 83 and 104, leaving 4 observations on each side, as illustrated in the following figure:

plot(1, 1, type = "n", axes = FALSE, ann = FALSE,
xlim = c(0, 11), ylim = c(0, 1))
text(c(1:10), rep(0.5, 10), as.character(sort(data2)))
rect(xleft = 4.5, ybottom = 0.45, xright = 6.5, ytop = 0.55, border = 2)
arrows(x0 = 0.7, y0 = 0.4, x1 = 4.25, code = 3, length = 0.15)
arrows(x0 = 6.75, y0 = 0.4, x1 = 10.5, code = 3, length = 0.15)
text(c(2.5, 8.5), 0.35, "4")
text(5.5, 0.6, "mean(c(83, 104)) = 93.5")

If the variable contains NA values, you can set the argument na.rm to TRUE to exclude them from the calculation.
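
As a short illustration of the effect of na.rm, consider a small vector with one missing value. The vector `data_na` is a made-up example and does not come from the data used above.

```r
# Hypothetical vector with one missing value
data_na <- c(126, 52, NA, 104)

median(data_na)               # NA, because the missing value propagates
median(data_na, na.rm = TRUE) # 104, the median of 52, 104 and 126
```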

## Median of a continuous variable

If instead of a discrete variable we have a continuous one, we can also use the median function. In this case the median is the value that leaves a probability of 50% on each side. Consider a sample of 1000 observations drawn from the normal distribution with mean 0 and standard deviation 1:

set.seed(1)
data3 <- rnorm(1000)

In this case we see that the median is very close to its theoretical value (as the distribution is symmetric, the mean and median are equal, so the theoretical median is 0).

median(data3) # -0.03532423
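
The comparison with the theoretical value can be made explicit with qnorm, which returns quantiles of the normal distribution. This is a sketch under the same seed as above; the names `theoretical`, `sample_median` and `q50` are chosen here for illustration.

```r
set.seed(1)
data3 <- rnorm(1000)

theoretical <- qnorm(0.5)         # theoretical median of N(0, 1), which is 0
sample_median <- median(data3)    # median of the simulated sample

# The 50% sample quantile (default method) coincides with median()
q50 <- quantile(data3, probs = 0.5, names = FALSE)
all.equal(q50, sample_median)     # TRUE

# Half of the simulated observations lie below the sample median
mean(data3 < sample_median)       # 0.5
```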

## Median by groups in R

Finally, if we have a data set classified by groups we can use the tapply function to calculate the median per group. Take the following data as an example:

set.seed(1)
x <- sample(1:1000, 100)
group <- sample(c("A", "B", "C"), 100, replace = TRUE)
data4 <- data.frame(x, group)

head(data4)
    x group
1 836     B
2 679     A
3 129     A
4 930     C
5 509     C
6 471     C

We can apply the tapply function to the data frame in the following way:

tapply(data4$x, data4$group, median)
    A     B     C
543.0 524.0 525.5

The output is a named vector with the median of x for each group.
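
If a data frame is preferred over a named vector, the base R function aggregate is one possible alternative to tapply. The sketch below uses a small made-up data frame, `scores`, so that the result can be checked by hand; it is not the data4 object from above.

```r
# Hypothetical data: three observations in group A, four in group B
scores <- data.frame(
  x = c(10, 20, 30, 1, 2, 3, 4),
  group = c("A", "A", "A", "B", "B", "B", "B")
)

# Named vector: one median per group
by_tapply <- tapply(scores$x, scores$group, median)

# Data frame with one row per group, using the formula interface
by_aggregate <- aggregate(x ~ group, data = scores, FUN = median)

by_tapply    # A = 20, B = 2.5
by_aggregate # same medians, stored in columns group and x
```

Both calls compute the same medians; the choice mainly depends on whether the result will be used as a vector or merged back into other data frames.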