What does it mean to set a seed in R?
Setting a seed in R means initializing a pseudorandom number generator. Most simulation methods in Statistics rely on the ability to generate pseudorandom numbers that mimic the properties of independent draws from a uniform distribution on the interval (0, 1).
To obtain these sequences of pseudorandom numbers, we need a recursive algorithm called a random number generator (RNG):
x_i = f(x_{i-1}, x_{i-2},x_{i-3}, \dots, x_{i-k}),
where k is the order of the generator and (x_0, x_1, x_2, \dots, x_{k-1}) is the seed (or initial state of the generator).
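To make the recursion concrete, here is a minimal sketch of a generator of order k = 1: a multiplicative congruential generator with the classic "minimal standard" constants (a = 16807, m = 2^31 - 1). The function name lcg is hypothetical, and this is not the algorithm R uses by default. It only illustrates how the seed determines the whole sequence.

```r
# Toy generator of order k = 1: x_i = (a * x_{i-1}) mod m.
# Illustration only; R's default generator is a different algorithm.
lcg <- function(n, seed) {
  a <- 16807        # multiplier
  m <- 2^31 - 1     # modulus
  x <- numeric(n)
  state <- seed     # x_0, the seed (an integer between 1 and m - 1)
  for (i in seq_len(n)) {
    state <- (a * state) %% m   # the recursion f(x_{i-1})
    x[i] <- state
  }
  x / m             # rescale the states to the interval (0, 1)
}

u1 <- lcg(5, seed = 1)
u2 <- lcg(5, seed = 1)
identical(u1, u2)   # TRUE: the same seed gives the same sequence
```

Starting from a different seed gives a different sequence, which is exactly the behavior that set.seed controls for R's built-in generators.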
Several generators are available. You can select one with the RNGkind function or with the kind argument of the set.seed function. By default, R uses the Mersenne-Twister generator.
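As a small sketch of how the generator can be chosen, the following code uses the same seed with two of the generators that R ships with. It assumes R 3.6.0 or later, where RNGkind() returns three values; the variable names are only illustrative.

```r
# Remember the current settings so they can be restored afterwards
old_kinds <- RNGkind()

# Same seed, two different generators
set.seed(1, kind = "Wichmann-Hill")
u_wh <- runif(3)
set.seed(1, kind = "Mersenne-Twister")
u_mt <- runif(3)

# The generators are different algorithms, so the draws are expected to differ
identical(u_wh, u_mt)

# Restore the settings that were active before this example
RNGkind(kind = old_kinds[1], normal.kind = old_kinds[2],
        sample.kind = old_kinds[3])
```

Because the generator is part of what determines the output, reproducing a result requires both the same seed and the same kind.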
Why set a seed in R?
When using functions that sample pseudorandom numbers, each time you execute them you will obtain a different result. Consider, for instance, that you want to sample 5 numbers from a Normal distribution. For that purpose you could type:
rnorm(5)
0.4421843 0.8404235 -1.5879426 0.8557701 -0.1546376
Nevertheless, if you execute the previous code you will obtain a different output. This implies that the code is not reproducible, because you don’t know the seed that R used to generate that sequence.
It is possible that you don’t want your code to be reproducible, but there are several cases where reproducibility is desired. Setting a seed in R is useful for:
- Reproducing the output of simulation studies.
- Debugging code that deals with pseudorandom numbers.
How to set a seed in R?
The set.seed function lets you set a seed and, optionally, a generator (with the kind argument). It is worth mentioning that:
- The state of the random number generator is stored in .Random.seed (in the global environment). It is a vector of integers whose length depends on the generator.
- If no seed is specified, R creates one from the current time and the process ID when one is first needed.
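As a quick sketch, you can inspect the stored state directly. The example reads .Random.seed explicitly from the global environment, so it also works inside a function; the name state is only illustrative.

```r
set.seed(1)

# .Random.seed exists in the global environment once a seed has been set
exists(".Random.seed", envir = globalenv())  # TRUE

state <- get(".Random.seed", envir = globalenv())
is.integer(state)  # TRUE
length(state)      # 626 when the Mersenne-Twister generator is in use
```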
Run the previous example again, where we sampled five random numbers from a Normal distribution, but now set a seed first:
# Specify any integer
set.seed(1)
rnorm(5) # -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
If you execute the previous code, you will obtain the same output every time. However, note that if you run rnorm(5) twice after setting the seed only once, the two calls give different results:
set.seed(1)
rnorm(5) # -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
rnorm(5) # -0.8204684 0.4874291 0.7383247 0.5757814 -0.3053884
Note that the previous block of code returns the same pseudorandom numbers as the following:
set.seed(1)
rnorm(10)
-0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
-0.8204684 0.4874291 0.7383247 0.5757814 -0.3053884
This is because the output of a random number generation function depends on the value of .Random.seed, which changes every time such a function is executed. Storing the value of .Random.seed lets you capture the current state of the generator:
set.seed(1)
x <- .Random.seed
rnorm(5)
y <- .Random.seed
rnorm(5)
# .Random.seed is not equal in both cases
identical(x, y) # FALSE
Consequently, if you want to output the same numbers twice, you have to set the same seed twice:
set.seed(1)
rnorm(5) # -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
set.seed(1)
rnorm(5) # -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
As we pointed out before, setting a seed in R is useful when working with simulation studies. Suppose that you want to calculate the median of some values from a uniform distribution:
# Set seed
set.seed(1234)
n_rep <- 10 # Number of repetitions
n <- 2 # Number of points
Median <- numeric(n_rep)
for (i in 1:n_rep) {
  Median[i] <- median(runif(n))
}
Median
Because we called the set.seed function first, executing the previous code always gives the following result:
0.3680014 0.6163271 0.7506130 0.1210231 0.5901674
0.6192831 0.6030835 0.5648057 0.2765220 0.2094744
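One way to check this kind of reproducibility is to wrap the simulation in a function that takes the seed as an argument and compare two runs. This is a minimal sketch; simulate_medians is a hypothetical helper, not part of R.

```r
# Hypothetical helper: run the median simulation starting from a given seed
simulate_medians <- function(seed, n_rep = 10, n = 2) {
  set.seed(seed)
  vapply(seq_len(n_rep), function(i) median(runif(n)), numeric(1))
}

run_1 <- simulate_medians(1234)
run_2 <- simulate_medians(1234)
run_3 <- simulate_medians(4321)

identical(run_1, run_2)  # TRUE: same seed, same medians
identical(run_1, run_3)  # FALSE: a different seed gives different medians
```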
Nonetheless, if an error appears at some iteration, you cannot reproduce that iteration on its own without rerunning everything before it. To solve this issue you have two options: saving the value of .Random.seed at each iteration, or setting a different seed at each iteration:
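# Option 1: save the generator state at the start of each iteration
set.seed(5)
for (i in 1:n_rep) {
  seed <- .Random.seed
  # If an error arises you can debug with: .Random.seed <- seed
  # (run this at top level, so that the global .Random.seed is replaced)
  # Code
}

# Option 2: set a different seed at each iteration
for (i in 1:n_rep) {
  set.seed(i)
  # If an error arises you can debug with set.seed(i)
  # Code
}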
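The first option can be sketched end to end as follows. Here runif(2) stands in for the real code of each iteration, and the names states, draws and replay are only illustrative. The state is read and written with get and assign on the global environment, so the same approach also works inside a function.

```r
set.seed(5)
n_rep <- 10
states <- vector("list", n_rep)  # generator state before each iteration
draws <- vector("list", n_rep)   # what each iteration generated

for (i in seq_len(n_rep)) {
  states[[i]] <- get(".Random.seed", envir = globalenv())
  draws[[i]] <- runif(2)         # placeholder for the real iteration code
}

# Replay iteration 7 on its own by restoring the state saved before it
assign(".Random.seed", states[[7]], envir = globalenv())
replay <- runif(2)
identical(replay, draws[[7]])    # TRUE
```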
Unset seed
Finally, you may want to unset or reset a seed in R. To do so you have two options:
On the one hand, since R derives a seed from the current time when none is specified, you can use the Sys.time function as follows to approximate the default behavior:
set.seed(Sys.time())
On the other hand, following the documentation of set.seed, you can pass NULL to the function to re-initialize the generator “as if no seed had yet been set”:
set.seed(NULL)
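A short sketch of the effect: after re-initializing with NULL, the draws no longer follow the sequence that the earlier seed would have produced. The variable names are only illustrative.

```r
set.seed(1)
seeded <- rnorm(5)

# Re-initialize the generator as if no seed had been set
set.seed(NULL)
unseeded <- rnorm(5)

# Expected to be FALSE: the new state does not come from seed 1
identical(seeded, unseeded)
```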