## The binomial distribution

Denote a **Bernoulli process** as the repetition of a random experiment (a Bernoulli trial) where each independent observation is classified as success if the event occurs or failure otherwise and the proportion of successes in the population is constant and it doesn’t depend on its size.

Let X \sim B(n, p), this is, a random variable that follows a **binomial distribution**, being n the number of Bernoulli trials, p the probability of success and q = 1 - p the probability of failure:

- The
**probability mass function**(PMF) is P(X = x) = \binom{n}{x}p^x q^{n-x} if x = 0, 1, 2, \dots, n. - The cumulative
**distribution function**(CDF) is F(x) = I_q(1 - x, n-x). - The
**quantile function**is Q(p) = F^{-1}(p). - The
**expected mean and variance**of X are E(X) = np and Var(X) = npq, respectively.

The functions of the previous lists can be computed in R for a set of values with the `dbinom`

(probability), `pbinom`

(distribution) and `qbinom`

(quantile) functions. In addition, the `rbinom`

function allows drawing n random samples from a binomial distribution in R. The following table describes briefly these R functions.

Function | Description |
---|---|

dbinom | Binomial probability mass function (Probability function) |

pbinom | Binomial distribution (Cumulative distribution function) |

qbinom | Binomial quantile function |

rbinom | Binomial pseudorandom number generation |

In the following sections we will review each of these functions in detail.

## The dbinom function

In order to calculate the binomial probability function for a set of values x, a number of trials n and a probability of success p you can make use of the `dbinom`

function, which has the following syntax:

```
dbinom(x, # X-axis values (x = 0, 1, 2, ..., n)
size, # Number of trials (n > = 0)
prob, # The probability of success on each trial
log = FALSE) # If TRUE, probabilities are given as log
```

For instance, if you want to calculate the binomial probability mass function for x = 1, 2, \dots, 10 and a probability of succces in each trial of 0.2, you can type:

`dbinom(x = 1:10, size = 10, prob = 0.2)`

```
0.2684354560 0.3019898880 0.2013265920 0.0880803840 0.0264241152
0.0055050240 0.0007864320 0.0000737280 0.0000040960 0.0000001024
```

### Plot of the binomial probability function in R

The binomial probability mass function can be plotted in R making use of the `plot`

function, passing the output of the `dbinom`

function of a set of values to the first argument of the function and setting `type = "h"`

as follows:

```
# Grid of X-axis values
x <- 1:80
# size = 80, prob = 0.2
plot(dbinom(x, size = 80, prob = 0.2), type = "h", lwd = 2,
main = "Binomial probability function",
ylab = "P(X = x)", xlab = "Number of successes")
# size = 80, prob = 0.3
lines(dbinom(x, size = 80, prob = 0.3), type = "h",
lwd = 2, col = rgb(1,0,0, 0.7))
# size = 80, prob = 0.4
lines(dbinom(x, size = 80, prob = 0.4), type = "h",
lwd = 2, col = rgb(0, 1, 0, 0.7))
# Add a legend
legend("topright", legend = c("80 0.2", "80 0.3", "80 0.4"),
title = "size prob", title.adj = 0.95,
lty = 1, col = 1:3, lwd = 2, box.lty = 0)
```

## The pbinom function

In order to calculate the probability of a variable X following a binomial distribution taking values lower than or equal to x you can use the `pbinom`

function, which arguments are described below:

```
pbinom(q, # Quantile or vector of quantiles
size, # Number of trials (n > = 0)
prob, # The probability of success on each trial
lower.tail = TRUE, # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise
log.p = FALSE) # If TRUE, probabilities are given as log
```

By ways of illustration, the probability of the success occurring less than 3 times if the number of trials is 10 and the probability of success is 0.3 is:

`pbinom(3, size = 10, prob = 0.3) # 0.6496107 or 64.96%`

As the binomial distribution is discrete, the previous probability could also be calculated adding each value of the probability function up to three:

`sum(dbinom(0:3, size = 10, prob = 0.3)) # 0.6496107 or 64.96%`

As the binomial distribution is discrete, the cumulative probability can be calculated adding the corresponding probabilities of the probability function. The following R function allows visualizing the probabilities that are added based on a lower bound and an upper bound.

```
# size: number of trials (n > = 0)
# prob: probability of success on each trial
# lb: lower bound of the sum
# ub: upper bound of the sum
# col: color
# lwd: line width
binom_sum <- function(size, prob, lb, ub, col = 4, lwd = 1, ...) {
x <- 0:size
if (missing(lb)) {
lb <- min(x)
}
if (missing(ub)) {
ub <- max(x)
}
plot(dbinom(x, size = size, prob = prob), type = "h", lwd = lwd, ...)
if(lb == min(x) & ub == max(x)) {
color <- col
} else {
color <- rep(1, length(x))
color[(lb + 1):ub ] <- col
}
lines(dbinom(x, size = size, prob = prob), type = "h",
col = color, lwd = lwd, ...)
}
```

As an example, you can represent the probabilities that are added to calculate the probability of a binomial variable taking values equal or lower than 5 if the number of trials is 20 and the probability of success is 0.2 with the following code:

```
binom_sum(size = 20, prob = 0.2, lwd = 2, col = 2, ub = 5,
ylab = "P(X = x)", xlab = "Number of successes")
```

The sum of the probabilities displayed in red is equal to `pbinom(5, size = 20, prob = 0.2)`

.
### pbinom example

In this section we will review a more complete example to understand how to calculate binomial probabilities in several scenarios. Consider that a basketball player scores 4 out of 10 baskets (p = 0.4). If the player thows 20 baskets (20 trials):

- The probability of
**scoring 6 or less baskets**, P(X \leq 6), is:

```
pbinom(6, size = 20, prob = 0.4) # 0.2500107 or 25%
1 - pbinom(6, size = 20, prob = 0.4, lower.tail = FALSE) # Equivalent
```

This probability can also be calculated adding the corresponding elements of the binomial probability function, as we pointed out in the previous section:

`sum(dbinom(0:6, size = 20, prob = 0.4)) # 0.2500107 or 25%`

Using the funtion that we defined before we can represent the calculated probability:

```
binom_sum(size = 20, prob = 0.4, ub = 6, lwd = 2,
ylab = "P(X = x)", xlab = "Number of successes")
```

- The probability of
**scoring less than 6 baskets**, P(X < 6), is:

```
pbinom(5, size = 20, prob = 0.4) # 0.125599 or 12.56%
1 - pbinom(5, size = 20, prob = 0.4, lower.tail = FALSE) # Equivalent
sum(dbinom(0:5, size = 20, prob = 0.4)) # Equivalent
```

Note that we set 5 on the first argument of the function instead of 6 because the binomial distribution is discrete, so P(X < 6) = P(X \leq 5). The calculated probability can be represented with the sum of the following probabilities of the probability mass function:

```
binom_sum(size = 20, prob = 0.4, ub = 5, lwd = 2,
ylab = "P(X = x)", xlab = "Number of successes")
```

- The probability of
**scoring more than 12 baskets**, P(X > 12), is:

```
pbinom(12, size = 20, prob = 0.4, lower.tail = FALSE) # 0.02102893 or 2.1%
1 - pbinom(12, size = 20, prob = 0.4) # Equivalent
sum(dbinom(13:20, size = 20, prob = 0.4)) # Equivalent
```

This probaility corresponds to:

```
binom_sum(size = 20, prob = 0.4, lb = 12, lwd = 2,
ylab = "P(X = x)", xlab = "Number of successes")
```

- The probability of
**scoring between 7 and 11 baskets**, P(7 \leq X \leq 11), is:

```
pbinom(11, size = 20, prob = 0.4) - pbinom(7, size = 20, prob = 0.4) # 0.5275807 or 52.8%
sum(dbinom(8:11, size = 20, prob = 0.4)) # Equivalent
```

The corresponding plot can be created with the following code:

```
binom_sum(size = 20, prob = 0.4, lb = 7, ub = 11, lwd = 2,
ylab = "P(X = x)", xlab = "Number of successes")
```

### Plot of the binomial cumulative distribution in R

The binomial distribution function can be plotted in R with the `plot`

function, setting `type = "s"`

and passing the output of the `pbinom `

function for a specific number of experiments and a probability of success.

The following block of code can be used to plot the binomial cumulative distribution functions for 80 trials and different probabilities.

```
# Grid of X-axis values
x <- 1:80
# size = 80, prob = 0.2
plot(pbinom(x, size = 80, prob = 0.2), type = "s", lwd = 2,
main = "Binomial distribution function",
xlab = "Number of successes", ylab = "F(x)")
# size = 80, prob = 0.3
lines(pbinom(x, size = 80, prob = 0.3), type = "s",
lwd = 2, col = 2)
# size = 80, prob = 0.4
lines(pbinom(x, size = 80, prob = 0.4), type = "s",
lwd = 2, col = 3)
# Add a legend
legend("bottomright", legend = c("80 0.2", "80 0.3", "80 0.4"),
title = "size prob", title.adj = 0.95,
lty = 1, col = 1:3, lwd = 2, box.lty = 0)
```

## The qbinom function

Given a probability or a set of probabilities, the `qbinom`

function allows you to obtain the corresponding binomial quantile. The following block of code describes briefly the arguments of the function:

```
qbinom(p, # Probability or vector of probabilities
size, # Number of trials (n > = 0)
prob, # The probability of success on each trial
lower.tail = TRUE, # If TRUE, probabilities are P(X <= x), or P(X > x) otherwise
log.p = FALSE) # If TRUE, probabilities are given as log
```

As an example, the binomial quantile for the probability 0.4 if n = 5 and p = 0.7 is:

`qbinom(p = 0.4, size = 5, prob = 0.7) # 3`

### Plot of the binomial quantile function in R

The binomial quantile function can be plotted in R for a set of probabilities, a number of trials and a probability of success with the following code:

```
# Grid of X-axis values
x <- 1:80
# size = 80, prob = 0.2
plot(qbinom(seq(0, 1, 0.001), size = 80, prob = 0.2),
main = "Binomial quantile function",
ylab = "Q(p)", xlab = "p",
type = "s", col = 3, xaxt = "n")
axis(1, labels = seq(0, 1, 0.1), at = 0:10 * 100)
# size = 80, prob = 0.3
lines(qbinom(seq(0, 1, 0.001), size = 80, prob = 0.3), type = "s", col = 2)
# size = 80, prob = 0.4
lines(qbinom(seq(0, 1, 0.001), size = 80, prob = 0.4), type = "s", col = 1)
# Add a legend
legend("topleft", legend = c("80 0.2", "80 0.3", "80 0.4"),
title = "size prob", title.adj = 0.95,
lty = 1, col = 1:3, lwd = 2, box.lty = 0)
```

## The rbinom function

The `rbinom`

function allows you to draw n random observations from a binomial distribution in R. The arguments of the function are described below:

```
rbinom(n, # Number of random observations to be generated
size, # Number of trials (> = 0)
prob) # The probability of success on each trial
```

If you want to obtain, for instance, 15 random observations from a binomial distribution if the number of trials is 30 and the probability of success on each trial is 0.1 you can type:

`rbinom(n = 15, size = 30, prob = 0.1)`

`7 3 2 1 5 1 1 4 4 3 1 5 2 4 1`

Nonetheless, if you don’t specify a seed before executing the function you will obtain a different set of random observations. If you want to make the output reproducible you can set a seed as follows:

```
set.seed(2)
rbinom(n = 15, size = 30, prob = 0.1)
```

`2 4 3 1 6 6 1 5 3 3 3 2 4 1 2`