Random samples and permutations in R

The sample function in R draws random samples, with or without replacement, and random permutations of a vector. It can also select elements according to probabilities assigned to each element (weighted sampling).
Because the examples in this tutorial are random, running them will give you different output each time. Set a seed with set.seed to make the results reproducible.
Syntax of sample
The sample function has the following syntax:
sample(x, size, replace = FALSE, prob = NULL)
Where:
x: a vector or list containing the elements from which to select a sample.
size: the number of items to select. If replace = TRUE, it is the number of items to draw with replacement.
replace: a logical value indicating whether sampling should be done with replacement (TRUE) or without replacement (FALSE). Default is FALSE.
prob: an optional vector of probability weights for obtaining the elements of x.
Random sample without replacement
By default, the sample function returns a random permutation of the input vector, that is, the elements of the vector in a different order, without repeating any. For a vector named x, sample(x) returns the elements of x in one of the possible orders.
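As a minimal sketch of this default behavior, the snippet below shuffles a vector. The vector x and the seed value are assumptions chosen only for illustration.

```r
# Hypothetical input vector for illustration
x <- 1:10

set.seed(1)         # arbitrary seed so the shuffle can be reproduced
perm <- sample(x)   # size defaults to length(x), replace defaults to FALSE

perm                # the ten values of x in a shuffled order
sort(perm)          # sorting recovers 1 to 10: nothing lost, nothing repeated
```

Sorting the result is a quick way to confirm that a permutation contains exactly the original elements.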
Note that you can also use the sample function with character or logical vectors, and even with lists.
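The following sketch applies sample to other input types. The object names and values are made up for this example.

```r
set.seed(2)  # arbitrary seed

letters_vec <- c("a", "b", "c", "d")    # character vector
flags       <- c(TRUE, FALSE, TRUE)     # logical vector
items       <- list(1, "two", c(3, 4))  # list with mixed contents

shuffled_chr  <- sample(letters_vec)    # character vector, reordered
shuffled_lgl  <- sample(flags)          # logical vector, reordered
shuffled_list <- sample(items)          # still a list, with its elements reordered

shuffled_chr
shuffled_lgl
str(shuffled_list)
```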
With size you can control the length of the output. For example, for an input vector of 10 elements, size = 1 returns a single sample, that is, 1 out of the 10 elements.
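A short sketch of the size argument, assuming a ten-element vector x and an arbitrary seed:

```r
x <- 1:10    # hypothetical input vector
set.seed(3)  # arbitrary seed

one  <- sample(x, size = 1)  # a single element of x
four <- sample(x, size = 4)  # four distinct elements of x

one
four
```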
Note that, when sampling without replacement, size can’t be greater than the length of the input vector. In that case an error is raised unless you set replace = TRUE.
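The sketch below shows one way to observe this error without stopping a script, using tryCatch from base R. The vector x is an assumption for illustration.

```r
x <- 1:10  # hypothetical input vector

# Asking for 15 items from 10 without replacement fails;
# tryCatch captures the error and returns its message as text
result <- tryCatch(
  sample(x, size = 15),
  error = function(e) conditionMessage(e)
)
result

# The same request succeeds once replacement is allowed
ok <- sample(x, size = 15, replace = TRUE)
length(ok)
```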
Random sample with replacement
When replace = TRUE the sampling is performed with replacement, so an element that has been chosen can be chosen again in later draws and the same element can appear several times.
In this scenario, the sample size can be greater than the length of the input vector and the elements can be repeated.
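As an illustration, the following sketch draws 12 items from a three-element vector. The colors and the seed are arbitrary choices for this example.

```r
x <- c("red", "green", "blue")  # hypothetical input vector
set.seed(4)                     # arbitrary seed

# 12 draws from 3 elements is only possible with replacement
draws <- sample(x, size = 12, replace = TRUE)

draws
table(draws)  # how many times each element was drawn
```

Since there are more draws than distinct elements, repeats are guaranteed here.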
Weighted sampling
By default, all the elements have the same probability of being selected. This can be illustrated with a bar plot of the proportions in a random sample drawn from a vector with two different elements.
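One possible version of such a bar plot is sketched below. The two-element vector, the sample size of 1000 and the seed are assumptions made for this example.

```r
set.seed(5)                  # arbitrary seed
coin <- c("heads", "tails")  # hypothetical two-element vector

# Unweighted sampling: both elements are equally likely
flips <- sample(coin, size = 1000, replace = TRUE)

props <- prop.table(table(flips))  # proportion of each element
props

barplot(props, ylim = c(0, 1), ylab = "Proportion",
        main = "Unweighted sample")
```

Both bars should come out close to 0.5.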
However, the prob argument lets you specify a different probability for each element to reflect real-world scenarios. For example, with prob = c(0.8, 0.2) the first element of the vector has a probability of 0.8 and the second a probability of 0.2.
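A minimal sketch of weighted sampling with these probabilities, again using a made-up two-element vector, sample size and seed:

```r
set.seed(6)                  # arbitrary seed
coin <- c("heads", "tails")  # hypothetical two-element vector

# prob assigns one weight per element of coin, in the same order
weighted <- sample(coin, size = 1000, replace = TRUE, prob = c(0.8, 0.2))

props_w <- prop.table(table(weighted))
props_w  # roughly 0.8 for "heads" and 0.2 for "tails"
```

The weights in prob must be given in the same order as the elements of the input vector.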
The proportions of the elements in the output will be close to, and occasionally exactly equal to, the specified probabilities.
The proportions of elements in the output converge to the specified probabilities as the sample size increases. A line chart of the running proportion against the sample size illustrates this convergence.
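One way to draw such a chart is sketched here. It tracks the running proportion of the first element over a single long sample; the labels "A" and "B", the sample size and the seed are illustrative assumptions.

```r
set.seed(7)  # arbitrary seed
n <- 5000    # total number of draws

draws <- sample(c("A", "B"), size = n, replace = TRUE, prob = c(0.8, 0.2))

# Proportion of "A" among the first 1, 2, ..., n draws
running_prop <- cumsum(draws == "A") / seq_len(n)

plot(seq_len(n), running_prop, type = "l", ylim = c(0, 1),
     xlab = "Sample size", ylab = "Proportion of A")
abline(h = 0.8, lty = 2)  # dashed line at the specified probability
```

The line fluctuates for small sample sizes and then settles near the dashed line at 0.8.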
Additional examples
Reproducible samples
As stated before, if you don’t set a seed your results won’t be reproducible. The set.seed function lets you specify a seed for pseudo-random number generation and hence makes the output reproducible.
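As a small sketch, setting the same seed before each call gives the same sample. The seed value 123 is arbitrary.

```r
set.seed(123)                     # arbitrary seed
first <- sample(1:10, size = 5)

set.seed(123)                     # same seed again
second <- sample(1:10, size = 5)

identical(first, second)          # TRUE: both calls return the same sample
```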
Running the same code again with the same seed produces the same output.
Sample of the rows of a data frame
A common use case of the sample function is to randomly select rows of a data frame. Since rows in R can be selected by index, you can sample the desired number of indices from the vector going from 1 to the number of rows and use them to subset the data frame.
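The approach described above can be sketched as follows. The data frame df, its columns and the sample size of 3 are hypothetical.

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:8, value = letters[1:8])

set.seed(8)  # arbitrary seed

# Sample 3 row indices from 1 to the number of rows, without replacement
idx <- sample(seq_len(nrow(df)), size = 3)

sampled <- df[idx, ]  # keep only the sampled rows
sampled
```

Setting replace = TRUE in the call to sample would allow the same row to be selected more than once.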