Row and column sums and means in R
The colSums and colMeans functions compute the sum and mean of each column of a data frame or matrix, while rowSums and rowMeans compute the sum and mean of each row. These functions are equivalent to using apply with FUN = sum or FUN = mean over rows or columns, but they are considerably faster.
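As a rough check of that claim, the following sketch compares colSums with the equivalent apply call on a larger matrix. The matrix m and its dimensions are made up for illustration, and the timings depend on your machine, so treat them only as an indication.

```r
# Hypothetical test matrix: 2000 rows and 500 columns of random values
set.seed(1)
m <- matrix(rnorm(2000 * 500), ncol = 500)

# Both approaches return the same column sums (up to floating point tolerance)
same_result <- isTRUE(all.equal(colSums(m), apply(m, 2, sum)))
same_result

# Elapsed time of each approach, repeated 50 times to get measurable values
time_colSums <- system.time(for (i in 1:50) colSums(m))
time_apply <- system.time(for (i in 1:50) apply(m, 2, sum))
time_colSums["elapsed"]
time_apply["elapsed"]
```

On most systems the colSums timing should be noticeably smaller, since apply calls sum once per column from R code.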
Column sums with colSums
The colSums function calculates the sum of each column of a data frame or matrix. The examples below use the following sample matrix with four columns and nine rows:
# Sample matrix
x <- matrix(50:85, ncol = 4)
x
     [,1] [,2] [,3] [,4]
[1,]   50   59   68   77
[2,]   51   60   69   78
[3,]   52   61   70   79
[4,]   53   62   71   80
[5,]   54   63   72   81
[6,]   55   64   73   82
[7,]   56   65   74   83
[8,]   57   66   75   84
[9,]   58   67   76   85
To calculate the sum of each of the four columns, pass the matrix to the colSums function.
# Sum of each column
colSums(x)
# Equivalent to:
# apply(x, 2, sum)
486 567 648 729
So the sum of the first column is 486, the sum of the second is 567, the sum of the third is 648 and the sum of the fourth column is 729.
If your object contains any NA values you can set the na.rm argument to TRUE. Consider the same matrix but with two NA values:
# Sample matrix
x <- matrix(50:85, ncol = 4)
x[3, 2:3] <- NA
x
     [,1] [,2] [,3] [,4]
[1,]   50   59   68   77
[2,]   51   60   69   78
[3,]   52   NA   NA   79
[4,]   53   62   71   80
[5,]   54   63   72   81
[6,]   55   64   73   82
[7,]   56   65   74   83
[8,]   57   66   75   84
[9,]   58   67   76   85
If you calculate the sums now, the two columns that contain missing values return NA.
colSums(x)
486 NA NA 729
To avoid this and calculate the sum of the available values, set na.rm = TRUE as in the example below.
# Sum of each column removing NA values
colSums(x, na.rm = TRUE)
# Equivalent to:
# apply(x, 2, sum, na.rm = TRUE)
486 506 578 729
Column means with colMeans
The colMeans function calculates the mean of each column of a data frame or matrix.
# Sample matrix
x <- matrix(50:85, ncol = 4)
# Mean of each column
colMeans(x)
# Equivalent to:
# apply(x, 2, mean)
54 63 72 81
The output shows that the mean of the values of the first column is 54, the mean of the second column is 63 and so on. If your data contains any NA values, remember to set na.rm = TRUE.
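As an illustration, this is how colMeans behaves on the sample matrix with two NA values used in the colSums section. The object names means_with_na and means_no_na are only chosen for this sketch.

```r
# Sample matrix with two NA values in the third row
x <- matrix(50:85, ncol = 4)
x[3, 2:3] <- NA

# Without na.rm, columns 2 and 3 return NA
means_with_na <- colMeans(x)

# With na.rm = TRUE each mean is computed over the available values
means_no_na <- colMeans(x, na.rm = TRUE)
means_no_na
# 54.00 63.25 72.25 81.00
```

The means of the second and third columns are computed over eight values instead of nine, because the missing value is dropped from both the sum and the count.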
Row sums with rowSums
The rowSums function calculates the sum of each row of a data frame or matrix. You just need to pass your object to the function.
# Sample matrix
x <- matrix(50:85, ncol = 4)
# Sum of each row
rowSums(x)
# Equivalent to:
# apply(x, 1, sum)
254 258 262 266 270 274 278 282 286
In the previous example the sum of the values of the first row is 254, the sum of the second row is 258 and so on. If the data contains missing values you can set the na.rm argument of the function to TRUE in order to ignore them.
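A short sketch of this, using the sample matrix with two NA values in the third row. It also shows a common idiom, not covered above, that combines rowSums with is.na to count the missing values in each row.

```r
# Sample matrix with two NA values in the third row
x <- matrix(50:85, ncol = 4)
x[3, 2:3] <- NA

# Row 3 is summed over its two available values (52 + 79)
sums_no_na <- rowSums(x, na.rm = TRUE)
sums_no_na
# 254 258 131 266 270 274 278 282 286

# Number of missing values in each row
na_per_row <- rowSums(is.na(x))
na_per_row
# 0 0 2 0 0 0 0 0 0
```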
Row means with rowMeans
The rowMeans function computes the mean of each row of a data frame or matrix. Pass your matrix or data frame to the function to obtain the mean of each row.
# Sample matrix
x <- matrix(50:85, ncol = 4)
# Mean of each row
rowMeans(x)
# Equivalent to:
# apply(x, 1, mean)
63.5 64.5 65.5 66.5 67.5 68.5 69.5 70.5 71.5
The mean of the values of the first row is 63.5, the mean of the values of the second row is 64.5 and so on.
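All the examples above use a matrix, but the same functions accept data frames as long as every column is numeric. The sketch below uses a made-up data frame df (the column names id, v1 and v2 are only for illustration) and keeps the numeric columns before computing the statistics.

```r
# Hypothetical data frame with one character column and two numeric columns
df <- data.frame(id = c("a", "b", "c"),
                 v1 = c(1, 2, 3),
                 v2 = c(4, 5, 6))

# Calling rowMeans(df) directly fails because of the character column
result <- tryCatch(rowMeans(df), error = function(e) "error")

# Keep only the numeric columns
num <- df[sapply(df, is.numeric)]

# Mean of each row: 2.5, 3.5 and 4.5
row_means <- rowMeans(num)

# Sum of each column: 6 for v1 and 15 for v2
col_sums <- colSums(num)
```

Selecting the columns with sapply and is.numeric is just one option. You can also index the numeric columns by name or position if you know them in advance.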