Sort data in R
Sorting data in R can be achieved in several ways, depending on how you want to sort or order your data. In this tutorial you will learn how to sort in R in ascending, descending or alphabetical order and how to order several data structures based on another vector.
order() function in R
The R order function returns the permutation of indices that sorts the elements of a vector. The syntax with summarized descriptions of the arguments is as follows:
order(..., # Sequence of vectors of the same length
decreasing = FALSE, # Whether to sort in increasing or decreasing order
na.last = TRUE, # Whether to put NA values at the beginning or at the end
method = c("auto", "shell", "radix")) # Method to be used. Defaults to auto
sort.list(x, # Atomic vector
decreasing = FALSE,
partial = NULL, # Vector indices for partial sorting
na.last = TRUE,
method = c("auto", "shell", "quick", "radix"))
Note that the main difference between order and sort.list is that the former is designed to accept more than one vector of the same length. However, it is common to use the order function with just one vector.
v <- c(34, 47, 25, 14)
order(v)
# sort.list(v) # Equivalent
4 3 1 2
The output is an index vector. In this example it means that to sort the vector in ascending order you have to put the fourth element first (14), then the third (25), then the first (34) and finally the second (47), which is the greatest value. If you set the decreasing argument to TRUE, you will get the index vector for descending order.
order(v, decreasing = TRUE)
2 1 3 4
If the vector contains any NA values, their indices will be placed at the end of the index vector by default.
vec <- c(24, 26, 2, 5, NA, 40, 12, NA)
order(vec)
3 4 7 1 2 6 5 8
If you want the NA values to appear at the beginning, set the na.last argument to FALSE. If you would rather remove the NA values, call the na.omit function on the vector first or use a similar approach.
order(vec, na.last = FALSE)
5 8 3 4 7 1 2 6
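As a minimal sketch of the removal option mentioned above, the snippet below reuses the vec vector from this section. It relies on the documented behavior that na.last = NA makes order drop the NA positions; the names idx and sorted_clean are only illustrative.

```r
vec <- c(24, 26, 2, 5, NA, 40, 12, NA)

# na.last = NA removes the positions of the NA values from the index vector
idx <- order(vec, na.last = NA)
idx        # 3 4 7 1 2 6
vec[idx]   # 2 5 12 24 26 40

# Alternative: drop the NA values with na.omit first, then sort what is left
sorted_clean <- sort(as.numeric(na.omit(vec)))
sorted_clean   # 2 5 12 24 26 40
```

Both approaches give the same sorted values; the first one keeps the indices relative to the original vector, which can be handy if you need to reorder other objects in the same way.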
You can also use the order function with a character vector. Note that ordering a categorical variable means ordering it in alphabetical order.
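For instance, the following short sketch orders a made-up character vector (fruits is not part of the original examples) and uses the resulting indices to rearrange it alphabetically.

```r
# Hypothetical sample vector, lowercase only to avoid locale-dependent case rules
fruits <- c("pear", "apple", "cherry", "banana")

idx <- order(fruits)
idx           # 2 4 3 1
fruits[idx]   # "apple" "banana" "cherry" "pear"
```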
sort() function in R
The sort function returns the vector you pass as input sorted, in ascending order by default.
sort(x, # Atomic vector
decreasing = FALSE, # Whether to sort in increasing or decreasing order
na.last = NA, # NA (default) removes NA values; TRUE puts them at the end, FALSE at the beginning
...) # Additional arguments
sort.int(x, # Atomic vector or factor
partial = NULL, # Partial sorting indices vector
na.last = NA, # Same as above
decreasing = FALSE, # Same as above
method = c("auto", "shell", "quick", "radix"), # Method to be used. Defaults to auto
index.return = FALSE) # Whether to return the ordering index vector or not
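The following is a small sketch of the most common calls, reusing the v vector from the previous section plus an illustrative vector with a missing value (with_na is a made-up name). Note that, unlike order, sort removes NA values unless you set na.last explicitly.

```r
v <- c(34, 47, 25, 14)

sort(v)                      # 14 25 34 47
sort(v, decreasing = TRUE)   # 47 34 25 14

# Illustrative vector containing a missing value
with_na <- c(24, NA, 2, 5)

sort(with_na)                    # 2 5 24      (NA dropped by default)
sort(with_na, na.last = TRUE)    # 2 5 24 NA
sort(with_na, na.last = FALSE)   # NA 2 5 24
```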
Difference between sort and order in R
It is common to confuse the sort and order functions in R. On the one hand, consider the following vector and apply the order function to it:
my_vec <- c(1, 5.2, 22, 9, -5, 2)
ii <- order(my_vec)
ii
5 1 6 2 4 3
If you index the vector with the output of the order function you will obtain the initial vector sorted in ascending order:
my_vec[ii]
-5.0 1.0 2.0 5.2 9.0 22.0
On the other hand, the sort function returns the sorted vector itself, in ascending order by default. However, you can also obtain the same index vector that the order function returns if you set the argument index.return to TRUE.
sort(my_vec, index.return = TRUE)
$x
[1] -5.0 1.0 2.0 5.2 9.0 22.0
$ix
[1] 5 1 6 2 4 3
Order vector in R
There are three different ways of ordering a vector: in ascending order, in descending order or based on the index of another vector of the same length. In this section we are going to use the following sample vector:
x <- c(56, 14, 1, 28)
Note that when working with a large vector you can use the is.unsorted function to verify whether the vector is sorted or not, instead of visually checking the order.
is.unsorted(x) # TRUE
Ascending order
Sorting in ascending order means that the values will be ordered from lower to higher. For that purpose, you can use the order and sort functions as follows:
x[order(x)]
# Equivalent to:
ii <- order(x)
x[ii]
# Equivalent to:
sort(x)
1 14 28 56
Descending order
Sorting a vector in descending order means ordering the elements from higher to lower. To do so, you can order the negated vector (using the minus sign) or set the argument decreasing = TRUE as follows:
x[order(-x)]
# Equivalent to:
x[order(x, decreasing = TRUE)]
# Equivalent to:
sort(x, decreasing = TRUE)
56 28 14 1
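The minus sign only works for numeric vectors. As a hedged sketch, for a character vector (words is a made-up example) you can rely on decreasing = TRUE, or negate the numeric ranks returned by the base xtfrm function:

```r
# Hypothetical character vector; -words would raise an error
words <- c("delta", "alpha", "charlie", "bravo")

# Option 1: decreasing = TRUE
desc_a <- words[order(words, decreasing = TRUE)]

# Option 2: xtfrm() converts the vector to numeric ranks that can be negated
desc_b <- words[order(-xtfrm(words))]

desc_a   # "delta" "charlie" "bravo" "alpha"
desc_b   # "delta" "charlie" "bravo" "alpha"
```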
Order by other vector
You can order a vector using another vector of the same length as the index vector. In the following example the vector y indicates that the second element of the x vector (14) must go first, the third (1) second, the first (56) third and the fourth (28) last.
y <- c(2, 3, 1, 4)
x[y]
14 1 56 28
Moreover, you could also order the vector x by the index vector of the vector y.
# order(y) # 3 1 2 4
x[order(y)]
1 56 14 28
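A typical use of this pattern is sorting one vector by the values stored in another. The sketch below uses two made-up vectors, students and scores, and arranges the names from the highest score to the lowest.

```r
# Hypothetical data: one score per student, in the same positions
students <- c("Ana", "Ben", "Cy")
scores   <- c(7.5, 9.1, 6.3)

# Order the names by the values of the other vector
ranking <- students[order(scores, decreasing = TRUE)]
ranking   # "Ben" "Ana" "Cy"
```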
Order data frame or matrix in R
When working with a matrix or a data frame in R you may want to order the data by row or by column values. Note that although we are going to use a data frame as an example, the same explanations apply to matrices. In order to explain how to sort a data frame in R we are going to use the attitude dataset from base R.
my_df <- attitude[, c(2, 3, 4)]
head(my_df)
complaints privileges learning
1 51 30 39
2 64 51 54
3 70 68 69
4 63 45 47
5 78 56 66
6 55 49 44
Sort dataframe by column
Suppose you want to order the data frame by the privileges column in ascending order. In that case, you could type:
# Order by privileges column
ordered_df <- my_df[order(my_df$privileges), ]
# Show first rows
head(ordered_df)
complaints privileges learning
1 51 30 39
21 40 33 34
30 82 39 59
7 67 42 56
24 37 42 58
25 54 42 48
Note that in case of ties, the order is based on the index of the rows.
In addition, if you need to sort your data frame by multiple columns, specify more columns inside the order function. This is very useful when the main column you are ordering by has ties.
# Order by 'privileges' column and then by 'complaints' column
ordered_df <- my_df[order(my_df$privileges, my_df$complaints), ]
# First rows
head(ordered_df)
complaints privileges learning
1 51 30 39
21 40 33 34
30 82 39 59
24 37 42 58 # <- Note the difference
25 54 42 48 # with the previous output
7 67 42 56
Note that the complaints column is now sorted for those values where the privileges column has ties.
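If you need different directions for each column, one possible approach for numeric columns is combining the multi-column call with the minus sign described in the section about vectors. The sketch below (mixed_df is an illustrative name) sorts by privileges in ascending order and breaks ties by complaints in descending order.

```r
my_df <- attitude[, c(2, 3, 4)]

# Ascending by 'privileges'; ties broken by 'complaints' in descending order
mixed_df <- my_df[order(my_df$privileges, -my_df$complaints), ]

# First rows
head(mixed_df)
```

Compared with the previous output, the rows that share the same privileges value now appear with the highest complaints value first.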
Change order of rows and columns
You can change the order of columns in R by modifying the order of the index that selects the columns. Apart from this, you can also reverse the order with a sequence from the number of columns of the data frame to 1.
# Custom order of columns
my_df[, c(2, 1, 3)]
# Reverse order of columns
my_df[, ncol(my_df):1]
Equivalently, you can modify the order of rows:
# Custom order of rows (random)
my_df[sample(nrow(my_df), replace = FALSE), ]
# Reverse order of rows
my_df[nrow(my_df):1, ]
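Since the same indexing works for matrices, here is a minimal sketch with a small made-up matrix m. The drop = FALSE argument is an optional safeguard that keeps the result as a matrix even if only one row or column remains.

```r
# Hypothetical 3 x 2 matrix with named columns
m <- matrix(c(3, 1, 2, 30, 10, 20), ncol = 2,
            dimnames = list(NULL, c("b", "a")))

# Order the rows by the values of column "b"
by_rows <- m[order(m[, "b"]), , drop = FALSE]
by_rows[, "b"]   # 1 2 3
by_rows[, "a"]   # 10 20 30

# Order the columns alphabetically by name
by_cols <- m[, order(colnames(m)), drop = FALSE]
colnames(by_cols)   # "a" "b"
```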
Sort rows alphabetically
Consider the following sample data frame, where each row has been randomly named with a letter.
set.seed(4)
my_df <- data.frame(x = 1:10, y = 12:21)
rownames(my_df) <- sample(letters, nrow(my_df))
my_df
x y
p 1 12
a 2 13
h 3 14
g 4 15
r 5 16
f 6 17
o 7 18
v 8 19
s 9 20
b 10 21
You can order the rows alphabetically with the order and rownames functions as follows:
my_df[order(rownames(my_df)), ]
x y
a 2 13
b 10 21
f 6 17
g 4 15
h 3 14
o 7 18
p 1 12
r 5 16
s 9 20
v 8 19
Sort list in R
In this section you will learn how to sort a list in R. There are three ways of ordering a list in R: sorting the elements in alphabetical order by name, creating a custom order, or ordering a specific list element. Consider, for instance, the following sample list:
my_list <- list(b = 1:10, a = letters[1:5], c = matrix(1:2, ncol = 2))
my_list
$b
[1] 1 2 3 4 5 6 7 8 9 10
$a
[1] "a" "b" "c" "d" "e"
$c
[,1] [,2]
[1,] 1 2
You can order the elements of the list alphabetically using the order and names functions as follows:
# Order elements alphabetically
my_list[order(names(my_list))]
$a
[1] "a" "b" "c" "d" "e"
$b
[1] 1 2 3 4 5 6 7 8 9 10
$c
[,1] [,2]
[1,] 1 2
If preferred, you can manually create a custom order by specifying the names or the indices of the elements inside the c function.
# Custom sorting
my_list[c("b", "c", "a")]
my_list[c(1, 3, 2)] # Equivalent
$b
[1] 1 2 3 4 5 6 7 8 9 10
$c
[,1] [,2]
[1,] 1 2
$a
[1] "a" "b" "c" "d" "e"
Finally, you may want to sort a single list element. In the following case, the sorting works the same way as sorting a vector.
# Order list element
sort(my_list$b, decreasing = TRUE)
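Going one step further, the sketch below shows two related operations on a made-up list of numeric vectors (num_list is not part of the original example): sorting every element at once with lapply, and ordering the elements of the list by their length.

```r
# Hypothetical list of numeric vectors
num_list <- list(first = c(3, 1, 2), second = c(10, 5), third = 7)

# Sort every element of the list
sorted_elements <- lapply(num_list, sort)
sorted_elements$first    # 1 2 3
sorted_elements$second   # 5 10

# Order the list elements from shortest to longest
by_length <- num_list[order(lengths(num_list))]
names(by_length)   # "third" "second" "first"
```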
How to sort categorical data in R?
You can order character or categorical data in R in different ways. Consider the following categorical variable:
set.seed(1)
categorical_data <- rownames(mtcars)[sample(10)]
categorical_data
"Datsun 710" "Hornet 4 Drive" "Hornet Sportabout" "Duster 360" "Mazda RX4 Wag"
"Merc 240D" "Merc 230" "Valiant" "Merc 280" "Mazda RX4"
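In this scenario you can make use of the sort function to sort the variable in alphabetical order, as we reviewed in the section about ordering vectors. Digits inside the strings are compared character by character, so names such as "Merc 230", "Merc 240D" and "Merc 280" come out in the expected order, but strings holding numbers with a different number of digits may not end up in numeric order.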
sort(categorical_data)
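To illustrate how numbers stored as text behave, the following sketch uses a made-up vector num_chr. Sorting the strings compares them character by character, while converting them with as.numeric gives the numeric order.

```r
# Hypothetical vector of numbers stored as character strings
num_chr <- c("10", "9", "2", "100")

# Alphabetical (character by character) order
sort(num_chr)               # "10" "100" "2" "9"

# Numeric order, keeping the values as character strings
num_chr[order(as.numeric(num_chr))]   # "2" "9" "10" "100"
```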
An alternative way to order a categorical variable alphabetically in R is to convert it to a factor and sort it.
sort(factor(categorical_data))
Datsun 710 Duster 360 Hornet 4 Drive Hornet Sportabout
Mazda RX4 Mazda RX4 Wag Merc 230 Merc 240D
Merc 280 Valiant
Levels: Datsun 710 Duster 360 Hornet 4 Drive Hornet Sportabout ... Valiant
However, if you also want the ordering index when sorting factors in R, you will need to use the sort.int function with the index.return argument set to TRUE.
sort.int(factor(categorical_data), index.return = TRUE)
$x
[1] Datsun 710 Duster 360 Hornet 4 Drive Hornet Sportabout
[5] Mazda RX4 Mazda RX4 Wag Merc 230 Merc 240D
[9] Merc 280 Valiant
10 Levels: Datsun 710 Duster 360 Hornet 4 Drive Hornet Sportabout ... Valiant
$ix
[1] 1 4 2 3 10 5 7 6 9 8
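When the categories have a natural order that is not alphabetical, a possible approach is declaring the levels explicitly, because factors are sorted by level order rather than by label. The sketch below uses a made-up sizes vector to show the idea.

```r
# Hypothetical categorical variable with a natural (non-alphabetical) order
sizes <- c("medium", "small", "large", "small")

# The order of the levels defines the sorting order
sizes_f <- factor(sizes, levels = c("small", "medium", "large"))

as.character(sort(sizes_f))   # "small" "small" "medium" "large"
order(sizes_f)                # 2 4 1 3
```

Without the levels argument the levels would be created in alphabetical order, so "large" would be placed first.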