Set seed in R

Learn what a seed is and how to set one in R

When you generate “random numbers” in R, you are actually generating pseudorandom numbers. These numbers are produced by an algorithm that requires a seed to initialize it. Being pseudorandom rather than truly random means that, if you know the seed and the generator, you can predict (and reproduce) the output. In this tutorial you will learn what setting a seed means, what set.seed does in R and how it works, how to set or unset a seed and, hence, how to make your outputs reproducible.

What does setting a seed in R mean?

Setting a seed in R means initializing a pseudorandom number generator. Most simulation methods in Statistics rely on generating pseudorandom numbers that mimic the properties of independent draws from a uniform distribution on the interval \((0, 1)\).

To obtain these sequences of pseudorandom numbers, we need a recursive algorithm called a Random Number Generator (RNG):

\[x_i = f(x_{i-1}, x_{i-2},x_{i-3}, \dots, x_{i-k}),\]

where \(k\) is the order of the generator and \((x_0, x_1, x_2, \dots, x_{k-1})\) is the seed (or initial state of the generator).

Note that if you know the generator and the seed, the pseudorandom numbers are predictable.
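
To make the recursion concrete, here is a minimal sketch of a generator of order \(k = 1\), a multiplicative congruential generator \(x_i = a\,x_{i-1} \bmod m\). The function name toy_rng and the constants are assumptions chosen for illustration; this is not the generator R uses.

```r
# Toy generator of order k = 1: x_i = (a * x_{i-1}) mod m, scaled to (0, 1).
# toy_rng, a and m are illustrative assumptions, not R's own generator.
toy_rng <- function(n, seed, a = 16807, m = 2^31 - 1) {
  x <- numeric(n)
  state <- seed                 # the seed is the initial state x_0 (1 <= seed < m)
  for (i in seq_len(n)) {
    state <- (a * state) %% m   # next state depends only on the previous one
    x[i] <- state / m           # rescale to the unit interval
  }
  x
}

u1 <- toy_rng(5, seed = 42)
u2 <- toy_rng(5, seed = 42)
identical(u1, u2)   # TRUE: same seed and same generator give the same sequence
```

Knowing the function and the seed is enough to reproduce every value, which is exactly the predictability described above.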

Several generators are available. You can select one with the RNGkind function or with the kind argument of the set.seed function; by default, R uses the Mersenne-Twister generator.
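
As a short sketch of how the generator can be inspected and switched, the following uses RNGkind and the kind argument of set.seed. The variable names old_kind and kind_after are only illustrative, and the example assumes the session starts with R's default settings.

```r
old_kind <- RNGkind()     # current settings; the first element is the uniform generator
old_kind[1]               # "Mersenne-Twister" unless it has been changed

# Set the seed and switch the generator in a single call
set.seed(1, kind = "Wichmann-Hill")
kind_after <- RNGkind()[1]
kind_after                # "Wichmann-Hill"

# Restore the previous generator so later code is not affected
RNGkind(kind = old_kind[1])
```

Note that the choice of generator persists until it is changed again, which is why the sketch restores the original one at the end.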

Why set a seed in R?

When using functions that sample pseudorandom numbers, each time you execute them you will obtain a different result. Consider, for instance, that you want to sample 5 numbers from a Normal distribution. For that purpose you could type:

rnorm(5)
0.4421843 0.8404235 -1.5879426  0.8557701 -0.1546376

Nevertheless, if you execute the previous code you will obtain a different output. This implies that the code is not reproducible, because you don’t know the seed that R used to generate that sequence.

You may not always need your code to be reproducible, but there are several cases where reproducibility is desired. Setting a seed in R is useful for:

  1. Reproducing the same output of simulation studies.
  2. Helping to debug code that deals with pseudorandom numbers.

How to set a seed in R?

The R set.seed function lets you set a seed and, through the kind argument, a generator. It is worth mentioning that:

  1. The state of the random number generator is stored in .Random.seed (in the global environment). It is a vector of integers whose length depends on the generator.
  2. If no seed is specified, R creates one from the current time and the process ID when one is first needed.
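
The two points above can be checked with a quick inspection of .Random.seed. This sketch assumes the default Mersenne-Twister generator is active; the variable name state is illustrative.

```r
set.seed(1)              # guarantees that .Random.seed exists
state <- .Random.seed    # copy of the current generator state

is.integer(state)        # TRUE
length(state)            # 626 with the default Mersenne-Twister generator
```

With a different generator the length would differ, since each generator keeps a state of its own size.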

Run the previous example again, where we sampled five random numbers from a Normal distribution, but now specify a seed beforehand:

# Specify any integer
set.seed(1) 

rnorm(5) # -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

If you execute the previous code, you will obtain the same output. However, note that if you run rnorm(5) twice after a single call to set.seed, the two calls give different results:

set.seed(1)

rnorm(5) # -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078
rnorm(5) # -0.8204684  0.4874291  0.7383247  0.5757814 -0.3053884

Note that the previous block of code returns the same pseudorandom numbers as the following:

set.seed(1)
rnorm(10)
-0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078
-0.8204684  0.4874291  0.7383247  0.5757814 -0.3053884

This happens because the output of a random number generation function depends on the value of .Random.seed, which changes every time such a function is executed. You can capture the current state of the generator by storing the value of .Random.seed:

set.seed(1)
x <- .Random.seed
rnorm(5)

y <- .Random.seed
rnorm(5)

# .Random.seed is not equal in both cases
identical(x, y) # FALSE
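
A stored state can also be put back to replay the draws that followed it. The sketch below assumes the default generator settings and uses illustrative variable names (saved, second, replay); assign is used so that the state is written to the global environment, which is where R looks for it.

```r
set.seed(1)
first  <- rnorm(5)
saved  <- .Random.seed    # state after the first five draws
second <- rnorm(5)

# Put the saved state back into the global environment
assign(".Random.seed", saved, envir = globalenv())
replay <- rnorm(5)

identical(second, replay) # TRUE: the same state produces the same draws
```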

Consequently, if you want to output the same numbers twice, you have to set the same seed twice:

set.seed(1)
rnorm(5)   # -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

set.seed(1)
rnorm(5)   # -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

As we pointed out before, setting a seed in R is useful when working with simulation studies. Suppose that you want to calculate the median of some values from a uniform distribution:

# Set seed
set.seed(1234)  

n_rep <- 10    # Number of repetitions
n <- 2         # Number of points

Median <- numeric(n_rep) 

for (i in 1:n_rep) {
    Median[i] <- median(runif(n))
}

Median

As we used the set.seed function, if you execute the previous code you will obtain the following result:

0.3680014 0.6163271 0.7506130 0.1210231 0.5901674
0.6192831 0.6030835 0.5648057 0.2765220 0.2094744
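
As a side note, the same simulation can be written more compactly with replicate. This is only an alternative sketch: because the random draws are requested in the same order, it should reproduce the loop's values when the same seed is set first. The names medians_loop and medians_rep are illustrative.

```r
# Loop version, as in the example above
set.seed(1234)
medians_loop <- numeric(10)
for (i in seq_len(10)) {
  medians_loop[i] <- median(runif(2))
}

# Compact version: replicate() evaluates the expression 10 times in sequence
set.seed(1234)
medians_rep <- replicate(10, median(runif(2)))

identical(medians_loop, medians_rep)  # TRUE: same seed, same order of draws
```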

Nonetheless, if an error appears at some iteration, you won’t be able to reproduce just that iteration without rerunning everything before it. To solve this issue you have two options: saving the value of .Random.seed at each iteration or setting a different seed at each iteration:

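# Option 1: save the generator state at the start of each iteration
set.seed(5)

for (i in 1:n_rep) {
  seed <- .Random.seed
  # If an error arises you can restore the state before debugging with:
  # assign(".Random.seed", seed, envir = globalenv())

  # Code

}

# Option 2: set a different seed at each iteration
for (i in 1:n_rep) {
  set.seed(i)
  # If an error arises you can debug with set.seed(i)

  # Code

}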
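
The first option can be sketched end to end as follows. The function risky_step is a hypothetical stand-in for a simulation step that sometimes fails, and the variable names are illustrative; the point is only to show how a saved state lets you replay the failing iteration in isolation.

```r
# Hypothetical simulation step that fails for some draws (illustration only)
risky_step <- function() {
  u <- runif(1)
  if (u > 0.9) stop("simulated failure")
  u
}

set.seed(5)
failed_state <- NULL

for (i in 1:50) {
  state <- .Random.seed    # generator state before this iteration
  ok <- tryCatch({ risky_step(); TRUE }, error = function(e) FALSE)
  if (!ok) {
    failed_state <- state  # keep the state that led to the error
    break
  }
}

# Replay only the failing iteration from the saved state
if (!is.null(failed_state)) {
  assign(".Random.seed", failed_state, envir = globalenv())
  replayed <- tryCatch(risky_step(), error = function(e) conditionMessage(e))
  replayed                 # the same error message as in the loop
}
```

Saving the state is cheap, so it can be done at every iteration and only used when something goes wrong.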

Unset seed

Finally, you may want to unset or reset a seed in R. To achieve it you have two options:

On the one hand, since R seeds the generator from the system clock when no seed is specified, you can pass the current time from Sys.time to get a seed that changes between runs, which approximates the default behavior:

set.seed(Sys.time())

On the other hand, following the documentation of set.seed, you can pass NULL to re-initialize the generator “as if no seed had yet been set”:

set.seed(NULL)
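
The effect of unsetting the seed can be illustrated with a small comparison. The variable names are illustrative, and the last result is stated as what happens in practice rather than a guarantee, since the new seed is chosen by R.

```r
set.seed(1)
fixed_1 <- rnorm(3)

set.seed(NULL)            # re-initialize as if no seed had been set
unseeded <- rnorm(3)

set.seed(1)
fixed_2 <- rnorm(3)

identical(fixed_1, fixed_2)   # TRUE: same seed, same draws
identical(fixed_1, unseeded)  # FALSE in practice: draws come from a fresh seed
```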