## The normal or gaussian distribution

Among continuous random variables, the most important is the Normal or Gaussian distribution. This variable was introduced by Carl Friedrich in the XIX century for studying error measures.

Let X \sim N(\mu, \sigma), namely a random variable following a normal distribution with mean \mu and standard deviation \sigma:

- The
**probability density function**(PDF), also known as Bell curve, of x is f(x) = \frac{1}{\sqrt{2\pi \sigma^{2}}} e^{\frac{1}{2} (\frac{x - \mu}{\sigma})^2}. - The
**cumulative distribution function**(CDF) is F(x) = P(X \leq x). - The
**quantile function**is Q(p) = F^{-1}(p). - The
**expencted mean and variance**are E(X) = \mu and Var(X) = \sigma^2, respectively.

In R there exist the `dnorm`

, `pnorm`

and `qnorm`

functions, which allows calculating the normal density, distribution and quantile function for a set of values. In addition, the `rnorm`

function allows obtaining random observations that follow a normal distibution. The following table summarizes the functions related to the normal distribution:

Function | Description |
---|---|

dnorm | Normal density (Probability Density Function) |

pnorm | Normal distribution (Cumulative Distribution Function) |

qnorm | Quantile function of the Normal distribution |

rnorm | Normal random number generation |

Although we will review each function in detail on the corresponding section, in the following illustration you can appreciate the relationship between the `dnorm`

, `pnorm`

and `qnorm`

functions.

## The dnorm function

In R, you can make use of the `dnorm`

function to calculate the density function with mean \mu and standard deviation \sigma for any value of x, \mu and \sigma.

```
dnorm(x, # X-axis values (grid)
mean = 0, # Integer or vector representing the mean/s
sd = 1, # Integer or vector representing the standard deviation/s
log = FALSE) # If TRUE, probabilities are given as log
```

Consider, for instance, that you want to obtain the PDF for x \in (-4, 4), with mean 1 and standard deviation of 3. In order to calculate it, you could type:

```
x <- -4:4
# x <- seq(-4, 4, length = 100) # More data points
dnorm(x, mean = 1, sd = 3)
```

```
0.03315905 0.05467002 0.08065691 0.10648267 0.12579441
0.13298076 0.12579441 0.10648267 0.08065691
```

You can also specify vectors to the `mean`

and `sd`

arguments of the function. In the following example, the value of even elements are from \mu = 1, \sigma = 3 and odds are from \mu = 2, \sigma = 4.

`dnorm(x, mean = c(1, 2), sd = c(3, 4))`

```
0.03315905 0.04566227 0.08065691 0.07528436 0.12579441
0.09666703 0.12579441 0.09666703 0.08065691
```

### Plot Normal distribution in R

**Creating a normal distribution plot in R** is easy. You just need to create a grid for the X-axis for the first argument of the `plot`

function and pass as input of the second the `dnorm`

function for the corresponding grid. In the following example we show how to plot normal distributions for different means and variances.

```
par(mfrow = c(1, 2))
# Grid of X-axis values
x <- seq(-4, 8, 0.1)
#-----------------------------------------
# Same standard deviation, different mean
#-----------------------------------------
# Mean 0, sd 1
plot(x, dnorm(x, mean = 0, sd = 1), type = "l",
ylim = c(0, 0.6), ylab = "", lwd = 2, col = "red")
# Mean 3, sd 1
lines(x, dnorm(x, mean = 3, sd = 1), col = "blue", lty = 1, lwd = 2)
# Adding a legend
legend("topright", legend = c("0 1", "3 1"), col = c("red", "blue"),
title = expression(paste(mu, " ", sigma)),
title.adj = 0.9, lty = 1, lwd = 2, box.lty = 0)
#-----------------------------------------
# Same mean, different standard deviation
#-----------------------------------------
# Mean 1, sd 1
plot(x, dnorm(x, mean = 1, sd = 1), type = "l",
ylim = c(0, 1), ylab = "", lwd = 2, col = "red")
# Mean 1, sd 0.5
lines(x, dnorm(x, mean = 1, sd = 0.5), col = "blue", lty = 1, lwd = 2)
# Adding a legend
legend("topright", legend = c("1 1", "1 0.5"), col = c("red", "blue"),
title = expression(paste(mu, " ", sigma)),
title.adj = 0.75, lty = 1, lwd = 2, box.lty = 0)
par(mfrow = c(1, 1))
```

## The pnorm function

The `pnorm`

function gives the **Cumulative Distribution Function** (CDF) of the Normal distribution in R, which is the probability that the variable X takes a value lower or equal to x.

The syntax of the function is the following:

```
pnorm(q,
mean = 0,
sd = 1,
lower.tail = TRUE, # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise
log.p = FALSE) # If TRUE, probabilities are given as log
```

As an example, taking into account that the Normal distribution is symmetric, the probability that the variable will take a value lower than the mean is 0.5:

`pnorm(0, mean = 0, sd = 1) # 0.5`

### pnorm function example

Now, suppose that you have a machine that packages rice inside boxes. The process follows a Normal distribution and it is known that the mean of the weight of each box is 1000 grams and the standard deviation is 10 grams. You can plot the density function typing:

```
Mean <- 1000
Sd <- 10
# X grid for non-standard normal distribution
x <- seq(-3, 3, length = 100) * Sd + Mean
# Density function
f <- dnorm(x, Mean, Sd)
plot(x, f, type = "l", lwd = 2, col = "blue", ylab = "", xlab = "Weight")
abline(v = Mean) # Vertical line on the mean
```

First, if you want to calculate the **probability of a box weighing less than 1010 grams** (P(X < 1010) = P(X \leq 1010)), you can type the following:

```
pnorm(1010, Mean, Sd) # 0.8413447 or 84.13%
1 - pnorm(1010, Mean, Sd, lower.tail = FALSE) # Equivalent
```

So the probability of a box wheighing less than 1010 grams is 0.8413 or 84.13%, which corresponds to the following area:

```
lb <- min(x) # Lower bound
ub <- 1010 # Upper bound
x2 <- seq(min(x), ub, length = 100) # New Grid
y <- dnorm(x2, Mean, Sd) # Density
plot(x, f, type = "l", lwd = 2, col = "blue", ylab = "", xlab = "Weight")
abline(v = ub)
polygon(c(lb, x2, ub), c(0, y, 0), col = rgb(0, 0, 1, alpha = 0.5))
text(995, 0.01, "84.13%")
```

As shading the area under the Normal curve can be tricky and requires several lines of code, we have created a simple function to achieve it in a single line:

```
# mean: mean of the Normal variable
# sd: standard deviation of the Normal variable
# lb: lower bound of the area
# ub: upper bound of the area
# acolor: color of the area
# ...: additional arguments to be passed to lines function
normal_area <- function(mean = 0, sd = 1, lb, ub, acolor = "lightgray", ...) {
x <- seq(mean - 3 * sd, mean + 3 * sd, length = 100)
if (missing(lb)) {
lb <- min(x)
}
if (missing(ub)) {
ub <- max(x)
}
x2 <- seq(lb, ub, length = 100)
plot(x, dnorm(x, mean, sd), type = "n", ylab = "")
y <- dnorm(x2, mean, sd)
polygon(c(lb, x2, ub), c(0, y, 0), col = acolor)
lines(x, dnorm(x, mean, sd), type = "l", ...)
}
```

As an example, if you want to shade the area between -1 and 2 of a standard Normal distribution you can type:

`normal_area(mean = 0, sd = 1, lb = -1, ub = 2, lwd = 2)`

Second, in case that you want to calculate the **probability of a box weighing more than 980 grams** (P(X > 980) = P(X \geq 980)) you can use the `lower.tail`

argument.

```
pnorm(980, Mean, Sd, lower.tail = FALSE) # 0.9772499 or 97.72%
1 - pnorm(980, Mean, Sd) # Equivalent
pnorm(1020, Mean, Sd) # Equivalent by symmetry
```

The calculated probability corresponds to the following area:

```
normal_area(mean = Mean, sd = Sd, lb = 980, acolor = rgb(0, 0, 1, alpha = 0.5))
text(1000, 0.01, "97.72%")
```

Finally, if you want to calculate the **probability of a box weighing more than 990 grams and less than 1000** you have to calculate P(X \leq 1000) - P(x \leq 990) = P(X < 1000) - P(x <990) and hence you can type:

`pnorm(1000, Mean, Sd) - pnorm(990, Mean, Sd) # 0.3413447 or 34.13%`

You can plot the area with the following code:

```
normal_area(mean = Mean, sd = Sd, lb = 990, ub = 1000,
acolor = rgb(0, 0, 1, alpha = 0.5))
text(995, 0.01, "34.13%", srt = 90)
```

### Plot normal cumulative distribution function in R

With the `pnorm`

function you can also plot the cumulative density function of the Gaussian or Normal distribution in R:

```
par(mfrow = c(1, 2))
# Grid of X-axis values
x <- seq(-4, 8, 0.1)
#-----------------------------------------
# Same standard deviation, different mean
#-----------------------------------------
# Mean 0, sd 1
plot(x, pnorm(x, mean = 0, sd = 1), type = "l",
ylim = c(0, 1), ylab = "", lwd = 2, col = "red")
# Mean 3, sd 1
lines(x, pnorm(x, mean = 3, sd = 1), col = "blue", lty = 1, lwd = 2)
# Legend
legend("topleft", legend = c("0 1", "3 1"), col = c("red", "blue"),
title = expression(paste(mu, " ", sigma)),
title.adj = 0.9, lty = 1, lwd = 2, box.lty = 0)
#-----------------------------------------
# Same mean, different standard deviation
#-----------------------------------------
# Mean 1, sd 1
plot(x, pnorm(x, mean = 1, sd = 1), type = "l",
ylim = c(0, 1), ylab = "", lwd = 2, col = "red")
# Mean 1, sd 0.5
lines(x, pnorm(x, mean = 1, sd = 0.5), col = "blue", lty = 1, lwd = 2)
# Legend
legend("topleft", legend = c("1 1", "1 0.5"), col = c("red", "blue"),
title = expression(paste(mu, " ", sigma)),
title.adj = 0.75, lty = 1, lwd = 2, box.lty = 0)
par(mfrow = c(1, 1))
```

Recall that P(X < 0) = 0.5 for a standard Normal distribution:

```
x <- seq(-4, 4, 0.1)
plot(x, pnorm(x, mean = 0, sd = 1), type = "l",
ylim = c(0, 1), ylab = "P(X < x)", lwd = 2, col = "red")
segments(0, 0, 0, 0.5, lwd = 2, lty = 2)
segments(-4, 0.5, 0, 0.5, lwd = 2, lty = 2)
```

## The qnorm function

The `qnorm`

function allows you to find the quantile (percentile) Q for any probability p. Hence, the `qnorm`

function is the inverse of the `pnorm`

function. The syntax of `qnorm`

is as follows:

```
qnorm(p, # Integer or vector of probabilities
mean = 0, # Integer or vector of means
sd = 1, # Integer or vector of standard deviations
lower.tail = TRUE, # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise
log.p = FALSE) # If TRUE, probabilities are given as log
```

As a first example, the quantile for probability 0.5 (Q(0.5)) on a symmetric distribution is equal to the mean:

`qnorm(0.5, mean = 0, sd = 1) # 0`

In addition, you can obtain the quantile for any given probability. Note the relation between `pnorm`

and `qnorm`

functions:

```
x <- pnorm(-1.5, mean = 0, sd = 1) # 0.8413447
qnorm(x, mean = 0, sd = 1) # -1.5
normal_area(mean = 0, sd = 1, lb = -10, ub = -1.5,
lwd = 2, acolor = rgb(0, 0, 1, alpha = 0.5))
arrows(-0.5, 0.1, -1.45, 0, lwd = 2, length = 0.2)
text(-0.25, 0.13, "-1.5", cex = 1.5)
```

If you want to calculate, for instance, the quantile Q(P(X > 1.5)) = Q(1 - P(X \leq 1.5)) = Q(0.067) you can set the `lower.tail`

argument to `TRUE`

.

```
x <- pnorm(1.5, mean = 0, sd = 1, lower.tail = FALSE) # 0.0668072
qnorm(x, mean = 0, sd = 1, lower.tail = FALSE) # 1.5
# Equivalent to:
# x <- 1- pnorm(1.5, mean = 0, sd = 1)
# qnorm(1 - x, mean = 0, sd = 1)
normal_area(mean = 0, sd = 1, lb = 1.5, lwd = 2,
acolor = rgb(0, 0, 1, alpha = 0.5))
arrows(0, 0.1, 1.45, 0, lwd = 2, length = 0.2)
text(0, 0.13, "1.5", cex = 1.5)
```

### Plotting the Normal quantile function

You can plot the quantile function of a standard Normal distribution typing the following:

```
plot(qnorm, pnorm(-4), pnorm(4), lwd = 2, xlab = "p", ylab = "Q(p)")
# Equivalent to:
# x <- seq(pnorm(-4), pnorm(4), length = 100)
# plot(x, qnorm(x, mean = 0, sd = 1), type = "l",
# lwd = 2, xlab = "p", ylab = "Q(p)")
```

The previous plot represents the possible outcomes of the `qnorm`

function for the standard Normal distribution. Recall that Q(0.5) = 0:

```
plot(qnorm, pnorm(-4), pnorm(4), lwd = 2,
xlab = "p", ylab = "Q(p)")
segments(0.5, -4, 0.5, 0, lty = 2, lwd = 2)
segments(0, 0, 0.5, 0, lty = 2, lwd = 2)
```

## The rnorm function

The `rnorm`

function generates n observations from the Normal distribution with mean \mu and standard deviation \sigma. The syntax of the `rnorm`

function in R is the following:

```
rnorm(n, # Number of observations to be generated
mean = 0, # Integer or vector of means
sd = 1) # Integer or vector of standard deviations
```

Hence, you can generate 10 observations of a standard Normal distribution in R with the following code:

`rnorm(10)`

```
0.3320065 -0.6454991 -0.3692699 -0.4160260 0.5756340
-0.8793241 -0.8095590 0.4926698 -0.8038873 -0.3446890
```

However, it should be noted that if you don’t specify a seed the output won’t be reproducible. You can make use of the `set.seed`

function to make your code reproducible:

```
# Seed for reproducibility
set.seed(1)
rnorm(1) # -0.6264538
```

In addition, in the following plot you can observe how increasing the number of observations, the histogram of the data approaches to the true Normal density function:

```
par(mfrow = c(1, 3))
x <- seq(-10, 10, length = 200)
set.seed(3)
# n = 10
hist(rnorm(10, mean = 0, sd = 1), main = "n = 10",
xlab = "", prob = TRUE)
lines(x, dnorm(x), col = "red", lwd = 2)
# n = 100
hist(rnorm(100, mean = 0, sd = 1), main = "n = 100",
xlab = "", prob = TRUE)
lines(x, dnorm(x), col = "red", lwd = 2)
# n = 1000
hist(rnorm(1000, mean = 0, sd = 1), main = "n = 1000",
xlab = "", prob = TRUE)
lines(x, dnorm(x), col = "red", lwd = 2)
par(mfrow = c(1, 1))
```