Normal distribution in R

Statistics with R Distributions
Create and plot Normal distributions in R

The Normal or Gaussian distribution is the most known and important distribution in Statistics. In this tutorial you will learn what are and what does dnorm, pnorm, qnorm and rnorm functions in R and the differences between them. In consequence, you will learn how to create and plot the Normal distribution in R, calculate probabilities under the curves, the quantiles, Normal random sampling and even how to shade a specific area under a Normal curve.

The normal or gaussian distribution

Among continuous random variables, the most important is the Normal or Gaussian distribution. This variable was introduced by Carl Friedrich in the XIX century for studying error measures.

Let \(X \sim N \left( \mu, \sigma \right)\), namely a random variable following a normal distribution with mean \(\mu\) and standard deviation \(\sigma\):

  • The probability density function (PDF), also known as Bell curve, of \(x\) is \(f(x) = \frac{1}{\sqrt{2\pi \sigma^{2}}} e^{\frac{1}{2} (\frac{x - \mu}{\sigma})^2}\).
  • The cumulative distribution function (CDF) is \(F(x) = P(X \leq x)\).
  • The quantile function is \(Q(p) = F^{-1}(p)\).
  • The expected mean and variance are \(E(X) = \mu\) and \(Var(X) = \sigma^2\), respectively.

In R there exist the dnorm, pnorm and qnorm functions, which allows calculating the normal density, distribution and quantile function for a set of values. In addition, the rnorm function allows obtaining random observations that follow a normal distibution. The following table summarizes the functions related to the normal distribution:

Function Description
dnorm Normal density
(Probability Density Function)
pnorm Normal distribution
(Cumulative Distribution Function)
qnorm Quantile function of the Normal distribution
rnorm Normal random number generation

By default, these functions consider the standard Normal distribution, which has a mean of zero and a standard deviation of one.

Although we will review each function in detail on the corresponding section, in the following illustration you can appreciate the relationship between the dnorm, pnorm and qnorm functions.

dnorm, pnorm and qnorm functions in R

The dnorm function

In R, you can make use of the dnorm function to calculate the density function with mean \(\mu\) and standard deviation \(\sigma\) for any value of \(x, \mu\) and \(\sigma\).

dnorm(x,           # X-axis values (grid)
      mean = 0,    # Integer or vector representing the mean/s
      sd = 1,      # Integer or vector representing the standard deviation/s
      log = FALSE) # If TRUE, probabilities are given as log

Consider, for instance, that you want to obtain the PDF for \(x \in (-4, 4)\), with mean 1 and standard deviation of 3. In order to calculate it, you could type:

x <- -4:4
# x <- seq(-4, 4, length = 100) # More data points
dnorm(x, mean = 1, sd = 3) 
0.03315905 0.05467002 0.08065691 0.10648267 0.12579441
0.13298076 0.12579441 0.10648267 0.08065691

You can also specify vectors to the mean and sd arguments of the function. In the following example, the value of even elements are from \(\mu = 1, \sigma = 3\) and odds are from \(\mu = 2, \sigma = 4\).

dnorm(x, mean = c(1, 2), sd = c(3, 4))
0.03315905 0.04566227 0.08065691 0.07528436 0.12579441
0.09666703 0.12579441 0.09666703 0.08065691

Plot Normal distribution in R

Creating a normal distribution plot in R is easy. You just need to create a grid for the X-axis for the first argument of the plot function and pass as input of the second the dnorm function for the corresponding grid. In the following example we show how to plot normal distributions for different means and variances.

par(mfrow = c(1, 2))

# Grid of X-axis values
x <- seq(-4, 8, 0.1)

#-----------------------------------------
# Same standard deviation, different mean
#-----------------------------------------
# Mean 0, sd 1
plot(x, dnorm(x, mean = 0, sd = 1), type = "l",
     ylim = c(0, 0.6), ylab = "", lwd = 2, col = "red")
# Mean 3, sd 1
lines(x, dnorm(x, mean = 3, sd = 1), col = "blue", lty = 1, lwd = 2)

# Adding a legend
legend("topright", legend = c("0 1", "3 1"), col = c("red", "blue"),
       title = expression(paste(mu, " ", sigma)),
       title.adj = 0.9, lty = 1, lwd = 2, box.lty = 0)

#-----------------------------------------
# Same mean, different standard deviation
#-----------------------------------------
# Mean 1, sd 1
plot(x, dnorm(x, mean = 1, sd = 1), type = "l",
     ylim = c(0, 1), ylab = "", lwd = 2, col = "red")
# Mean 1, sd 0.5
lines(x, dnorm(x, mean = 1, sd = 0.5), col = "blue", lty = 1, lwd = 2)

# Adding a legend
legend("topright", legend = c("1 1", "1 0.5"), col = c("red", "blue"),
       title = expression(paste(mu, " ", sigma)),
       title.adj = 0.75, lty = 1, lwd = 2, box.lty = 0)

par(mfrow = c(1, 1))

Create normal distribution in R

The pnorm function

The pnorm function gives the Cumulative Distribution Function (CDF) of the Normal distribution in R, which is the probability that the variable \(X\) takes a value lower or equal to \(x\).

The syntax of the function is the following:

pnorm(q,
      mean = 0,
      sd = 1,
      lower.tail = TRUE, # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise
      log.p = FALSE)     # If TRUE, probabilities are given as log

As an example, taking into account that the Normal distribution is symmetric, the probability that the variable will take a value lower than the mean is 0.5:

pnorm(0, mean = 0, sd = 1) # 0.5

pnorm function example

Now, suppose that you have a machine that packages rice inside boxes. The process follows a Normal distribution and it is known that the mean of the weight of each box is 1000 grams and the standard deviation is 10 grams. You can plot the density function typing:

Mean <- 1000
Sd <- 10

# X grid for non-standard normal distribution
x <- seq(-3, 3, length = 100) * Sd + Mean 

# Density function
f <- dnorm(x, Mean, Sd)

plot(x, f, type = "l", lwd = 2, col = "blue", ylab = "", xlab = "Weight")
abline(v = Mean) # Vertical line on the mean

Normal density example with vertical line on the mean

First, if you want to calculate the probability of a box weighing less than 1010 grams (\(P(X < 1010) = P(X \leq 1010)\)), you can type the following:

pnorm(1010, Mean, Sd) # 0.8413447 or 84.13%
1 - pnorm(1010, Mean, Sd, lower.tail = FALSE) # Equivalent

So the probability of a box weighting less than 1010 grams is 0.8413 or 84.13%, which corresponds to the following area:

lb <- min(x) # Lower bound
ub <- 1010   # Upper bound

x2 <- seq(min(x), ub, length = 100) # New Grid
y <- dnorm(x2, Mean, Sd) # Density

plot(x, f, type = "l", lwd = 2, col = "blue", ylab = "", xlab = "Weight")
abline(v = ub) 

polygon(c(lb, x2, ub), c(0, y, 0), col = rgb(0, 0, 1, alpha = 0.5))
text(995, 0.01, "84.13%")

Area under Normal curve in R

As shading the area under the Normal curve can be tricky and requires several lines of code, we have created a simple function to achieve it in a single line:

# mean: mean of the Normal variable
# sd: standard deviation of the Normal variable
# lb: lower bound of the area
# ub: upper bound of the area
# acolor: color of the area
# ...: additional arguments to be passed to lines function

normal_area <- function(mean = 0, sd = 1, lb, ub, acolor = "lightgray", ...) {
    x <- seq(mean - 3 * sd, mean + 3 * sd, length = 100) 
    
    if (missing(lb)) {
       lb <- min(x)
    }
    if (missing(ub)) {
        ub <- max(x)
    }

    x2 <- seq(lb, ub, length = 100)    
    plot(x, dnorm(x, mean, sd), type = "n", ylab = "")
   
    y <- dnorm(x2, mean, sd)
    polygon(c(lb, x2, ub), c(0, y, 0), col = acolor)
    lines(x, dnorm(x, mean, sd), type = "l", ...)
}

As an example, if you want to shade the area between -1 and 2 of a standard Normal distribution you can type:

normal_area(mean = 0, sd = 1, lb = -1, ub = 2, lwd = 2)

Function to shade the area under the Normal curve

Second, in case that you want to calculate the probability of a box weighing more than 980 grams (\(P(X > 980) = P(X \geq 980)\)) you can use the lower.tail argument.

pnorm(980, Mean, Sd, lower.tail = FALSE) # 0.9772499 or 97.72%
1 - pnorm(980, Mean, Sd) # Equivalent
pnorm(1020, Mean, Sd)    # Equivalent by symmetry

The calculated probability corresponds to the following area:

normal_area(mean = Mean, sd = Sd, lb = 980, acolor = rgb(0, 0, 1, alpha = 0.5))
text(1000, 0.01, "97.72%")

Example of the pnorm function in R

Finally, if you want to calculate the probability of a box weighing more than 990 grams and less than 1000 you have to calculate \(P(X \leq 1000) - P(x \leq 990) = P(X < 1000) - P(x <990)\) and hence you can type:

pnorm(1000, Mean, Sd) - pnorm(990, Mean, Sd) # 0.3413447 or 34.13%

You can plot the area with the following code:

normal_area(mean = Mean, sd = Sd, lb = 990, ub = 1000,
            acolor = rgb(0, 0, 1, alpha = 0.5))
text(995, 0.01, "34.13%", srt = 90)

Calculating the probability under the Normal curve

For continuous variables, as \(P(X = x) = 0\), \(P(X \geq x) = P(X > x)\) and \(P(X \leq x) = P(X < x)\).

Plot normal cumulative distribution function in R

With the pnorm function you can also plot the cumulative density function of the Gaussian or Normal distribution in R:

par(mfrow = c(1, 2))

# Grid of X-axis values
x <- seq(-4, 8, 0.1)

#-----------------------------------------
# Same standard deviation, different mean
#-----------------------------------------
# Mean 0, sd 1
plot(x, pnorm(x, mean = 0, sd = 1), type = "l",
     ylim = c(0, 1), ylab = "", lwd = 2, col = "red")
# Mean 3, sd 1
lines(x, pnorm(x, mean = 3, sd = 1), col = "blue", lty = 1, lwd = 2)

# Legend
legend("topleft", legend = c("0 1", "3 1"), col = c("red", "blue"),
       title = expression(paste(mu, " ", sigma)),
       title.adj = 0.9, lty = 1, lwd = 2, box.lty = 0)

#-----------------------------------------
# Same mean, different standard deviation
#-----------------------------------------
# Mean 1, sd 1
plot(x, pnorm(x, mean = 1, sd = 1), type = "l",
     ylim = c(0, 1), ylab = "", lwd = 2, col = "red")
# Mean 1, sd 0.5
lines(x, pnorm(x, mean = 1, sd = 0.5), col = "blue", lty = 1, lwd = 2)

# Legend
legend("topleft", legend = c("1 1", "1 0.5"), col = c("red", "blue"),
       title = expression(paste(mu, " ", sigma)),
       title.adj = 0.75, lty = 1, lwd = 2, box.lty = 0)

par(mfrow = c(1, 1))

Cumulative Normal distribution plot

Recall that \(P(X < 0) = 0.5\) for a standard Normal distribution:

x <- seq(-4, 4, 0.1)
plot(x, pnorm(x, mean = 0, sd = 1), type = "l",
     ylim = c(0, 1), ylab = "P(X < x)", lwd = 2, col = "red")
segments(0, 0, 0, 0.5, lwd = 2, lty = 2)
segments(-4, 0.5, 0, 0.5, lwd = 2, lty = 2)

Creating a Cumulative Normal distribution CDF plot in R

The qnorm function

The qnorm function allows you to find the quantile (percentile) \(Q\) for any probability \(p\). Hence, the qnorm function is the inverse of the pnorm function. The syntax of qnorm is as follows:

qnorm(p,                  # Integer or vector of probabilities
      mean = 0,           # Integer or vector of means
      sd = 1,             # Integer or vector of standard deviations
      lower.tail = TRUE,  # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise
      log.p = FALSE)      # If TRUE, probabilities are given as log

As a first example, the quantile for probability 0.5 (\(Q(0.5)\)) on a symmetric distribution is equal to the mean:

qnorm(0.5, mean = 0, sd = 1) # 0

In addition, you can obtain the quantile for any given probability. Note the relation between pnorm and qnorm functions:

x <- pnorm(-1.5, mean = 0, sd = 1) # 0.0668072
qnorm(x, mean = 0, sd = 1) # -1.5

normal_area(mean = 0, sd = 1, ub = -1.5,
            lwd = 2, acolor = rgb(0, 0, 1, alpha = 0.5))
arrows(-0.5, 0.1, -1.45, 0, lwd = 2, length = 0.2)
text(-0.25, 0.13, "-1.5", cex = 1.5)

Calculating the quantiles of a Normal distribution

If you want to calculate, for instance, the quantile \(Q(P(X > 1.5)) = Q(1 - P(X \leq 1.5)) = Q(0.067)\) you can set the lower.tail argument to TRUE.

x <- pnorm(1.5, mean = 0, sd = 1, lower.tail = FALSE) # 0.0668072
qnorm(x, mean = 0, sd = 1, lower.tail = FALSE) # 1.5

# Equivalent to:
# x <- 1- pnorm(1.5, mean = 0, sd = 1) 
# qnorm(1 - x, mean = 0, sd = 1)

normal_area(mean = 0, sd = 1, lb = 1.5, lwd = 2,
            acolor = rgb(0, 0, 1, alpha = 0.5))
arrows(0, 0.1, 1.45, 0, lwd = 2, length = 0.2)
text(0, 0.13, "1.5", cex = 1.5)

Calculating the quantile with the qnorm function in R in the upper tail

Plotting the Normal quantile function

You can plot the quantile function of a standard Normal distribution typing the following:

plot(qnorm, pnorm(-4), pnorm(4), lwd = 2, xlab = "p", ylab = "Q(p)")

# Equivalent to:
# x <- seq(pnorm(-4), pnorm(4), length = 100)
# plot(x, qnorm(x, mean = 0, sd = 1), type = "l",
#      lwd = 2, xlab = "p", ylab = "Q(p)")

Normal quantile function plot

The previous plot represents the possible outcomes of the qnorm function for the standard Normal distribution. Recall that \(Q(0.5) = 0\):

plot(qnorm, pnorm(-4), pnorm(4), lwd = 2,
     xlab = "p", ylab = "Q(p)")
segments(0.5, -4, 0.5, 0, lty = 2, lwd = 2)
segments(0, 0, 0.5, 0, lty = 2, lwd = 2)

Inverse of the quantile function in R

The rnorm function

The rnorm function generates \(n\) observations from the Normal distribution with mean \(\mu\) and standard deviation \(\sigma\). The syntax of the rnorm function in R is the following:

rnorm(n,        # Number of observations to be generated
      mean = 0, # Integer or vector of means
      sd = 1)   # Integer or vector of standard deviations

Hence, you can generate 10 observations of a standard Normal distribution in R with the following code:

rnorm(10)
0.3320065 -0.6454991 -0.3692699 -0.4160260  0.5756340
-0.8793241 -0.8095590  0.4926698 -0.8038873 -0.3446890

However, it should be noted that if you don’t specify a seed the output won’t be reproducible. You can make use of the set.seed function to make your code reproducible:

# Seed for reproducibility
set.seed(1)

rnorm(1) # -0.6264538

In addition, in the following plot you can observe how increasing the number of observations, the histogram of the data approaches to the true Normal density function:

par(mfrow = c(1, 3))

x <- seq(-10, 10, length = 200)

set.seed(3)
# n = 10
hist(rnorm(10, mean = 0, sd = 1), main = "n = 10",
     xlab = "", prob = TRUE)
lines(x, dnorm(x), col = "red", lwd = 2)

# n = 100
hist(rnorm(100, mean = 0, sd = 1), main = "n = 100",
     xlab = "", prob = TRUE)
lines(x, dnorm(x), col = "red", lwd = 2)

# n = 1000
hist(rnorm(1000, mean = 0, sd = 1), main = "n = 1000",
     xlab = "", prob = TRUE)
lines(x, dnorm(x), col = "red", lwd = 2)

par(mfrow = c(1, 1))

Example histograms of an approximation with the rnorm function in R