Test for proportions with prop.test()

Test for proportions in R with prop.test() function

The prop.test function in R tests the null hypothesis that the proportions of two independent random variables \(X\) and \(Y\) are equal (two-sample proportion test), or that a single proportion equals a hypothesized value (one-sample proportion test).

Syntax

The syntax of the prop.test function is the following:

prop.test(x, n, p = NULL,
          alternative = c("two.sided", "less", "greater"),
          conf.level = 0.95, correct = TRUE)

Where:

  • x: A numeric vector or a two-column matrix. In the vector case, it holds the number of successes; in the matrix case, the first column holds the number of successes and the second column the number of failures.
  • n: The number of trials for each proportion. Ignored if x is a matrix.
  • p: A vector of probabilities of success under the null hypothesis, with one value per group. If not given, a one-sample test uses p = 0.5, while a test with two or more groups checks whether all proportions are equal.
  • alternative: Specifies the alternative hypothesis. Possible values are: "two.sided", "less", or "greater".
  • conf.level: Confidence level for the returned confidence interval. Defaults to 0.95.
  • correct: A logical value indicating whether to apply Yates' continuity correction where possible. Defaults to TRUE.

The function returns the X-squared statistic, the degrees of freedom, the p-value, the alternative hypothesis, the confidence interval and the sample estimate of the proportion.
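The printed summary is backed by a list of class "htest", so each of the values listed above can be extracted by name. A minimal sketch, reusing the one-sample data from this tutorial (42 successes in 107 trials); the object name res is arbitrary:

```r
# Store the result instead of printing it
res <- prop.test(x = 42, n = 107, p = 0.6)

class(res)       # "htest"
res$statistic    # X-squared statistic
res$parameter    # degrees of freedom
res$p.value      # p-value
res$alternative  # alternative hypothesis
res$conf.int     # confidence interval (with a "conf.level" attribute)
res$estimate     # sample estimate of the proportion
```

Extracting components this way is handy when the p-value or the interval must be reused in later computations or reports.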

One proportion Z test

The one-sample proportion test compares a sample proportion to a known population proportion or a hypothesized proportion.

If the sample size is small (as a rule of thumb, n < 30), use binom.test instead of prop.test to perform an exact binomial test.
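As a brief sketch of the exact alternative, binom.test takes the same x, n and p arguments. The data below (7 successes in 20 trials) are hypothetical and only serve as an illustration:

```r
# Hypothetical small sample: 7 successes in 20 trials, null proportion 0.6
exact <- binom.test(x = 7, n = 20, p = 0.6)

exact$p.value    # exact two-sided p-value
exact$conf.int   # Clopper-Pearson confidence interval
exact$estimate   # sample proportion (7 / 20)
```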

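By default, the prop.test function applies Yates' continuity correction where possible, that is, in one-sample and two-sample tests; the correction is only used if it does not exceed the difference between the sample and null proportions in absolute value. The correction makes the test more conservative, which matters most when counts are small. If you don’t want to apply this correction set correct = FALSE.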
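The following sketch shows why this is called a Z test even though prop.test reports an X-squared statistic: without the correction, X-squared is the square of the classical z statistic, and with the correction 0.5 is subtracted from the absolute difference between observed and expected successes. The data are the ones used in the examples of this tutorial; the variable names are arbitrary.

```r
x <- 42; n <- 107; p0 <- 0.6
p_hat <- x / n

# Classical z statistic for one proportion (no continuity correction)
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Without the correction, prop.test reports z^2 as X-squared
uncorrected <- prop.test(x, n, p = p0, correct = FALSE)
all.equal(unname(uncorrected$statistic), z^2)

# With the correction, 0.5 is subtracted from |x - n * p0| before squaring
corrected <- prop.test(x, n, p = p0, correct = TRUE)
x2_yates <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))
all.equal(unname(corrected$statistic), x2_yates)
```

Both comparisons should return TRUE, and the corrected statistic is the smaller of the two.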

Equal to a proportion

Consider the following null and alternative hypotheses:

  • \(H_0\): the proportion of X IS \(p\).
  • \(H_1\): the proportion of X IS NOT \(p\).

To perform this test, pass the number of successes as x, the number of trials as n and the hypothesized proportion as p. The default confidence level for the confidence interval is 95%.

# Hypothesis test for a single proportion
# 107 trials, 42 successes. Is the proportion equal to 0.6? (95% confidence interval)
prop.test(x = 42, n = 107, p = 0.6, conf.level = 0.95)

# Equivalent: pass a two-column matrix with the number of successes (42) and failures (65)
# prop.test(x = matrix(c(42, 65), ncol = 2), p = 0.6, conf.level = 0.95)
	1-sample proportions test with continuity correction

data:  42 out of 107, null probability 0.6
X-squared = 18.337, df = 1, p-value = 1.851e-05
alternative hypothesis: true p is not equal to 0.6
95 percent confidence interval:
 0.3009435 0.4919223
sample estimates:
        p 
0.3925234 

The obtained p-value (1.851e-05) is far below the usual significance levels (0.1, 0.05 and 0.01), so there is strong evidence against the null hypothesis that the true proportion is equal to 0.6, and we conclude that the true proportion of successes differs from 0.6. Consistently, the 95% confidence interval does not contain the hypothesized proportion: its upper limit (0.4919223) is lower than 0.6.

Lower than a proportion

You can also perform a test where the alternative hypothesis is that the proportion is lower than a specific value:

  • \(H_0\): the proportion of X IS \(p\).
  • \(H_1\): the proportion of X IS LOWER than \(p\).

# Hypothesis test for a single proportion
# 107 trials, 42 successes. Is the proportion lower than 0.6?
prop.test(x = 42, n = 107, p = 0.6, alternative = "less")
	1-sample proportions test with continuity correction

data:  42 out of 107, null probability 0.6
X-squared = 18.337, df = 1, p-value = 9.255e-06
alternative hypothesis: true p is less than 0.6
95 percent confidence interval:
 0.0000000 0.4766163
sample estimates:
        p 
0.3925234

The p-value (9.255e-06) is almost zero, which is strong evidence against the null hypothesis. Consequently, we reject the null hypothesis in favor of the alternative and conclude that the true proportion of successes is significantly less than 0.6. Consistently, the upper limit of the one-sided confidence interval (0.4766163) falls below the hypothesized proportion of 0.6.

Greater than a proportion

The final option for a one-sample proportion test is an alternative hypothesis stating that the true proportion is greater than the specified value.

  • \(H_0\): the proportion of X IS \(p\).
  • \(H_1\): the proportion of X IS GREATER than \(p\).

# Hypothesis test for a single proportion
# 107 trials, 42 successes. Is the proportion greater than 0.6?
prop.test(x = 42, n = 107, p = 0.6, alternative = "greater")
	1-sample proportions test with continuity correction

data:  42 out of 107, null probability 0.6
X-squared = 18.337, df = 1, p-value = 1
alternative hypothesis: true p is greater than 0.6
95 percent confidence interval:
 0.3140465 1.0000000
sample estimates:
        p 
0.3925234 

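The test gives a p-value of 1, so there is no evidence to reject the null hypothesis in favor of the alternative that the true proportion is greater than 0.6. This is expected, since the sample proportion (0.3925234) is below 0.6.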
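The three p-values obtained in this section are related: the two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them (9.255e-06 and 1.851e-05 in the outputs above). A short check, with arbitrary variable names:

```r
# Same data as above: 42 successes in 107 trials, null proportion 0.6
two  <- prop.test(42, 107, p = 0.6)$p.value
less <- prop.test(42, 107, p = 0.6, alternative = "less")$p.value
more <- prop.test(42, 107, p = 0.6, alternative = "greater")$p.value

all.equal(less + more, 1)            # one-sided p-values are complementary
all.equal(two, 2 * min(less, more))  # two-sided = twice the smaller one
```

This is why a one-sided test in the direction opposite to the data yields a p-value close to 1.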

Two proportions Z test (difference of proportions)

The two-sample proportion test compares proportions between two independent groups. It assesses whether the proportions in these groups significantly differ from each other.

Equal proportions

Consider the following null and alternative hypotheses:

  • \(H_0\): the proportion of X IS equal to the proportion of \(Y\). (Or the difference of proportions is 0)
  • \(H_1\): the proportion of X IS NOT equal to the proportion of \(Y\).

You can perform a two-sample proportion test as follows:

# X
p1 <- 50  # Successes
n1 <- 100 # Trials

# Y
p2 <- 80  # Successes
n2 <- 200 # Trials

# Is the proportion of X equal to the proportion of Y?
prop.test(c(p1, p2), n = c(n1, n2))
	2-sample test for equality of proportions with continuity correction

data:  c(p1, p2) out of c(n1, n2)
X-squared = 2.323, df = 1, p-value = 0.1275
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.02671995  0.22671995
sample estimates:
prop 1 prop 2 
   0.5    0.4 

The p-value (0.1275) is greater than the usual significance levels, so we don’t have enough statistical evidence to reject the null hypothesis; that is, there is no evidence of different proportions between groups. Consistently, the 95% confidence interval for the difference of proportions contains 0.
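For two groups, the same test can be written as a chi-squared test on the 2x2 table of successes and failures. The sketch below builds that table from the data above (the names successes, trials and tab are illustrative) and compares prop.test with chisq.test, which also applies Yates' correction to 2x2 tables by default:

```r
successes <- c(50, 80)
trials    <- c(100, 200)

# 2x2 table: one row per group, columns = successes and failures
tab <- cbind(success = successes, failure = trials - successes)

pt <- prop.test(tab)    # matrix input, so n is not needed
ct <- chisq.test(tab)   # Pearson's chi-squared test with Yates' correction

all.equal(unname(pt$statistic), unname(ct$statistic))
all.equal(pt$p.value, ct$p.value)
```

The advantage of prop.test is that it additionally returns the estimated proportions and a confidence interval for their difference.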

Lower

You can also perform a test where the alternative hypothesis is that the proportion of \(X\) is less than the proportion of \(Y\), that is:

  • \(H_0\): the proportion of X IS equal to the proportion of \(Y\).
  • \(H_1\): the proportion of X IS LOWER than the proportion of \(Y\).

To perform this test you will have to specify alternative = "less", as shown below:

# X
p1 <- 50  # Successes
n1 <- 100 # Trials

# Y
p2 <- 150 # Successes
n2 <- 200 # Trials

# Is the proportion of X lower than the proportion of Y?
prop.test(c(p1, p2), n = c(n1, n2), alternative = "less")
	2-sample test for equality of proportions with continuity correction

data:  c(p1, p2) out of c(n1, n2)
X-squared = 17.642, df = 1, p-value = 1.333e-05
alternative hypothesis: less
95 percent confidence interval:
 -1.0000000 -0.1460619
sample estimates:
prop 1 prop 2 
  0.50   0.75 

The p-value is 1.333e-05, near zero, so there is statistical evidence to reject the null hypothesis in favor of the alternative hypothesis that the proportion of \(X\) is lower than the proportion of \(Y\). In addition, the one-sided 95 percent confidence interval for the difference of proportions ranges from -1 to -0.1460619; as it does not contain 0, the true difference between the proportions is significantly lower than zero.

Greater

The last option is the alternative hypothesis that the proportion of \(X\) is greater than the proportion of \(Y\):

  • \(H_0\): the proportion of X IS equal to the proportion of \(Y\).
  • \(H_1\): the proportion of X IS GREATER than the proportion of \(Y\).

For this you will need to set alternative = "greater", as in the following example:

# X
p1 <- 50  # Successes
n1 <- 100 # Trials

# Y
p2 <- 150 # Successes
n2 <- 200 # Trials

# Is the proportion of X greater than the proportion of Y?
prop.test(c(p1, p2), n = c(n1, n2), alternative = "greater")
	2-sample test for equality of proportions with continuity correction

data:  c(p1, p2) out of c(n1, n2)
X-squared = 17.642, df = 1, p-value = 1
alternative hypothesis: greater
95 percent confidence interval:
 -0.3539381  1.0000000
sample estimates:
prop 1 prop 2 
  0.50   0.75 

In this scenario, the p-value is 1, which implies there is no evidence supporting the claim that the proportion in the first group (\(X\)) is significantly greater than that in the second group (\(Y\)).
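Note that "less" and "greater" always refer to the first group relative to the second, so the order in which the groups are passed matters. As a small sketch with the same data, swapping the groups and the direction of the alternative gives the same p-value (the object names are arbitrary):

```r
# X: 50 successes in 100 trials; Y: 150 successes in 200 trials
x_less_y    <- prop.test(c(50, 150), n = c(100, 200), alternative = "less")
y_greater_x <- prop.test(c(150, 50), n = c(200, 100), alternative = "greater")

# Both calls test the same one-sided hypothesis
all.equal(x_less_y$p.value, y_greater_x$p.value)
```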