Test for proportions with prop.test()
The prop.test function in R is used for testing the null hypothesis that the proportions of two independent random variables \(X\) and \(Y\) are equal (two-sample proportion test) or for examining a single proportion against a hypothesized value (one-sample proportion test).
Syntax
The syntax of the prop.test function is the following:
prop.test(x, n, p = NULL,
alternative = c("two.sided", "less", "greater"),
conf.level = 0.95, correct = TRUE)
where:
- x: A numeric vector or a two-column matrix. For the vector case, it represents the number of successes; for the matrix case, the first column indicates the number of successes and the second column the number of failures.
- n: The number of trials for each proportion. Ignored if x is a matrix.
- p: A vector of probabilities or a single probability value under the null hypothesis. If not given, p = 0.5 is used in the one-sample case; with two or more groups, the null hypothesis is that all proportions are equal.
- alternative: Specifies the alternative hypothesis. Possible values are "two.sided", "less", or "greater".
- conf.level: Confidence level for the returned confidence interval. Defaults to 0.95.
- correct: A logical value indicating whether to apply the Yates continuity correction. Defaults to TRUE.
The function returns the X-squared statistic, the degrees of freedom, the p-value, the alternative hypothesis, the confidence interval and the sample estimate of the proportion.
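The result is a list of class "htest", so each of these values can be extracted by name instead of being read off the printed output. A minimal sketch, reusing the 42-out-of-107 figures from the examples in this tutorial (the object name res is ours):

```r
# Store the result instead of printing it
res <- prop.test(x = 42, n = 107, p = 0.6)

res$statistic    # X-squared statistic
res$parameter    # degrees of freedom
res$p.value      # p-value
res$alternative  # alternative hypothesis as a string
res$conf.int     # confidence interval, with a "conf.level" attribute
res$estimate     # sample proportion
```

Storing the object this way is convenient when the p-value or interval feeds into further calculations or a report.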
One proportion Z test
The one-sample proportion test compares a sample proportion to a known population proportion or a hypothesized proportion.
If the sample size is small (n < 30), use binom.test instead of prop.test to calculate an exact test.
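The exact test takes the same first three arguments, so switching functions is straightforward. The counts below (7 successes in 20 trials) are made up purely for illustration:

```r
# Hypothetical small sample: 7 successes in 20 trials
# Exact binomial test of H0: p = 0.6
exact <- binom.test(x = 7, n = 20, p = 0.6)

exact$p.value   # exact two-sided p-value
exact$conf.int  # Clopper-Pearson confidence interval
exact$estimate  # sample proportion (7 / 20)
```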
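By default, the prop.test function applies the Yates continuity correction where possible, that is, when there are at most two groups and the correction does not exceed the observed difference in absolute value. The correction matters most when the expected frequencies are small (a common rule of thumb is below 5), but it is applied regardless of their size. If you don’t want to apply this correction, set correct = FALSE.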
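To see what the correction does in the one-sample case, the statistic can be reproduced by hand: the correction subtracts 0.5 from the absolute difference between observed and expected successes. The formulas below are a sketch of that calculation, checked against prop.test rather than taken as its definition, using the 42-out-of-107 figures from the examples in this tutorial:

```r
x <- 42; n <- 107; p0 <- 0.6

# Statistic with and without the continuity correction
with_cc    <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))
without_cc <- (x - n * p0)^2 / (n * p0 * (1 - p0))

# Both comparisons should return TRUE
all.equal(with_cc,    unname(prop.test(x, n, p = p0)$statistic))
all.equal(without_cc, unname(prop.test(x, n, p = p0, correct = FALSE)$statistic))
```

The corrected statistic is always the smaller of the two, which makes the test slightly more conservative.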
Equal to a proportion
Consider the following null and alternative hypotheses:
- \(H_0\): the proportion of X IS \(p\).
- \(H_1\): the proportion of X IS NOT \(p\).
To perform this test you will need to input the number of successes to x, the number of trials to n and the hypothesized proportion to p. The default confidence level for the confidence interval is 95%.
# Hypothesis test for a single proportion
# 107 trials, 42 successes. Is the proportion equal to 0.6 for a 95% confidence level?
prop.test(x = 42, n = 107, p = 0.6, conf.level = 0.95)
# Equivalent: pass a two-column matrix with the number of successes (42) and failures (65)
# prop.test(x = matrix(c(42, 65), ncol = 2), p = 0.6, conf.level = 0.95)
1-sample proportions test with continuity correction
data: 42 out of 107, null probability 0.6
X-squared = 18.337, df = 1, p-value = 1.851e-05
alternative hypothesis: true p is not equal to 0.6
95 percent confidence interval:
0.3009435 0.4919223
sample estimates:
p
0.3925234
The obtained p-value (1.851e-05) is significantly smaller than the usual significance levels, indicating strong evidence against the null hypothesis that the true proportion is equal to 0.6. Therefore, we can conclude that the true proportion of success is significantly different from 0.6. In addition, the upper limit of the confidence interval (0.4919223) is lower than the hypothesized proportion (0.6).
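The reported interval is not the simple Wald interval around the sample proportion. The sketch below reproduces it under the assumption that it is a Wilson score interval with continuity correction; the helper bound() and the variable names are ours, and the clipping of the bounds to [0, 1] for extreme proportions is left out:

```r
x <- 42; n <- 107; conf.level <- 0.95

p_hat <- x / n
z  <- qnorm((1 + conf.level) / 2)
cc <- 0.5 / n        # continuity correction on the proportion scale
a  <- z^2 / (2 * n)

# One bound of the corrected score interval: sgn = -1 lower, +1 upper
bound <- function(sgn) {
  pc <- p_hat + sgn * cc
  (pc + a + sgn * z * sqrt(pc * (1 - pc) / n + a / (2 * n))) / (1 + 2 * a)
}

c(lower = bound(-1), upper = bound(+1))

# Interval returned by prop.test, for comparison
as.numeric(prop.test(x, n, p = 0.6)$conf.int)
```

Unlike the Wald interval, this interval is not symmetric around the sample proportion.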
Lower than a proportion
You can also perform a test where the alternative hypothesis is that the proportion is lower than a specific value:
- \(H_0\): the proportion of X IS \(p\).
- \(H_1\): the proportion of X IS LOWER than \(p\).
# Hypothesis test for a single proportion
# 107 trials, 42 successes. Is the proportion lower than 0.6?
prop.test(x = 42, n = 107, p = 0.6, alternative = "less")
1-sample proportions test with continuity correction
data: 42 out of 107, null probability 0.6
X-squared = 18.337, df = 1, p-value = 9.255e-06
alternative hypothesis: true p is less than 0.6
95 percent confidence interval:
0.0000000 0.4766163
sample estimates:
p
0.3925234
The p-value is almost zero, which implies that there is strong evidence against the null hypothesis. Consequently, we would reject the null hypothesis in favor of the alternative, concluding that the true proportion of success is significantly less than 0.6. Moreover, it’s important to note that the upper limit of the confidence interval (0.4766163) falls below the null hypothesis proportion of 0.6.
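The one-sided p-value can be traced back to the normal distribution: the square root of the X-squared statistic, signed by the direction of the difference, behaves as a z statistic. A sketch under that assumption:

```r
res <- prop.test(x = 42, n = 107, p = 0.6, alternative = "less")

# Signed square root of the chi-squared statistic
z <- sign(unname(res$estimate) - 0.6) * sqrt(unname(res$statistic))

pnorm(z)      # lower-tail probability, compare with res$p.value
2 * pnorm(z)  # two-sided p-value for the same data (z is negative here)
```

This is why the one-sided p-value (9.255e-06) is half the two-sided one (1.851e-05) when the difference points in the direction of the alternative.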
Greater than a proportion
The final option for a one-sample proportion test is to examine whether the true proportion is greater than the specified value:
- \(H_0\): the proportion of X IS \(p\).
- \(H_1\): the proportion of X IS GREATER than \(p\).
# Hypothesis test for a single proportion
# 107 trials, 42 successes. Is the proportion greater than 0.6?
prop.test(x = 42, n = 107, p = 0.6, alternative = "greater")
1-sample proportions test with continuity correction
data: 42 out of 107, null probability 0.6
X-squared = 18.337, df = 1, p-value = 1
alternative hypothesis: true p is greater than 0.6
95 percent confidence interval:
0.3140465 1.0000000
sample estimates:
p
0.3925234
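The test gives a p-value that rounds to 1, so there is no evidence to reject the null hypothesis in favor of a true proportion greater than \(p\). This is expected, since the sample proportion (0.3925234) lies below the hypothesized value of 0.6.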
Two proportions Z test (difference of proportions)
The two-sample proportion test compares proportions between two independent groups. It assesses whether the proportions in these groups significantly differ from each other.
Equal proportions
Consider the following null and alternative hypotheses:
- \(H_0\): the proportion of X IS equal to the proportion of \(Y\). (Or the difference of proportions is 0)
- \(H_1\): the proportion of X IS NOT equal to the proportion of \(Y\).
You can perform a two-sample proportion test as follows:
# X
p1 <- 50 # Successes
n1 <- 100 # Trials
# Y
p2 <- 80 # Successes
n2 <- 200 # Trials
# Is the proportion of X equal to the proportion of Y?
prop.test(c(p1, p2), n = c(n1, n2))
2-sample test for equality of proportions with continuity correction
data: c(p1, p2) out of c(n1, n2)
X-squared = 2.323, df = 1, p-value = 0.1275
alternative hypothesis: two.sided
95 percent confidence interval:
-0.02671995 0.22671995
sample estimates:
prop 1 prop 2
0.5 0.4
The p-value is greater than the usual significance levels, so we don’t have enough statistical evidence to reject the null hypothesis. That is, there is no evidence to suggest different proportions between groups. Consistently, the confidence interval for the difference of proportions contains 0.
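The X-squared value above can be reproduced by hand from the pooled proportion. The following is a sketch of that calculation, assuming the continuity correction of 0.5 is capped so that it never exceeds the observed difference; the variable names are ours:

```r
x <- c(50, 80)    # successes in X and Y
n <- c(100, 200)  # trials in X and Y

p_pool <- sum(x) / sum(n)            # pooled proportion under H0
delta  <- x[1] / n[1] - x[2] / n[2]  # observed difference of proportions
s      <- sum(1 / n)
cc     <- min(0.5, abs(delta) / s)   # continuity correction, capped

stat <- (abs(delta) - cc * s)^2 / (p_pool * (1 - p_pool) * s)
stat
pchisq(stat, df = 1, lower.tail = FALSE)  # two-sided p-value
```

Both values can be compared with the X-squared and p-value lines of the prop.test output.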
Lower
You can also perform a test where the alternative hypothesis is that the proportion of \(X\) is less than the proportion of \(Y\), that is:
- \(H_0\): the proportion of X IS equal to the proportion of \(Y\).
- \(H_1\): the proportion of X IS LOWER than the proportion of \(Y\).
To perform this test you will have to specify alternative = "less", as shown below:
# X
p1 <- 50 # Successes
n1 <- 100 # Trials
# Y
p2 <- 150 # Successes
n2 <- 200 # Trials
# Is the proportion of X lower than the proportion of Y?
prop.test(c(p1, p2), n = c(n1, n2), alternative = "less")
2-sample test for equality of proportions with continuity correction
data: c(p1, p2) out of c(n1, n2)
X-squared = 17.642, df = 1, p-value = 1.333e-05
alternative hypothesis: less
95 percent confidence interval:
-1.0000000 -0.1460619
sample estimates:
prop 1 prop 2
0.50 0.75
The p-value is 1.333e-05, near zero, which implies that there is statistical evidence to reject the null hypothesis and support the alternative hypothesis that the proportion of \(X\) is lower than the proportion of \(Y\). In addition, the 95 percent confidence interval ranges from -1 to -0.1460619, and as it does not contain 0, it suggests that the true difference between the proportions is significantly lower than zero.
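The upper limit of this one-sided interval can also be rebuilt by hand. The sketch below assumes the interval is based on the unpooled standard error of the difference plus a continuity correction term; the variable names are ours:

```r
x <- c(50, 150); n <- c(100, 200)

p_hat <- x / n
delta <- p_hat[1] - p_hat[2]              # -0.25

se <- sqrt(sum(p_hat * (1 - p_hat) / n))  # unpooled standard error
cc <- min(0.5, abs(delta) / sum(1 / n))   # continuity correction, capped

# One-sided interval: the whole 5% goes to a single tail, hence qnorm(0.95)
width <- qnorm(0.95) * se + cc * sum(1 / n)
upper <- min(delta + width, 1)

c(-1, upper)
```

The lower limit is fixed at -1 because a difference of proportions cannot be smaller than that.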
Greater
The last possible alternative hypothesis is that the proportion of \(X\) is greater than the proportion of \(Y\):
- \(H_0\): the proportion of X IS equal to the proportion of \(Y\).
- \(H_1\): the proportion of X IS GREATER than the proportion of \(Y\).
For this you will need to set alternative = "greater", as in the following example:
# X
p1 <- 50 # Successes
n1 <- 100 # Trials
# Y
p2 <- 150 # Successes
n2 <- 200 # Trials
# Is the proportion of X greater than the proportion of Y?
prop.test(c(p1, p2), n = c(n1, n2), alternative = "greater")
2-sample test for equality of proportions with continuity correction
data: c(p1, p2) out of c(n1, n2)
X-squared = 17.642, df = 1, p-value = 1
alternative hypothesis: greater
95 percent confidence interval:
-0.3539381 1.0000000
sample estimates:
prop 1 prop 2
0.50 0.75
In this scenario, the p-value is 1, which implies there is no evidence supporting the claim that the proportion in the first group (\(X\)) is significantly greater than that in the second group (\(Y\)).
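Because the "less" and "greater" tests use the same statistic and opposite tails, their p-values should add up to 1. A quick check with the same figures:

```r
x <- c(50, 150); n <- c(100, 200)

p_less    <- prop.test(x, n, alternative = "less")$p.value
p_greater <- prop.test(x, n, alternative = "greater")$p.value

p_greater           # close to 1, but not exactly 1
p_less + p_greater  # the two tails add up to 1
```

The printed p-value of 1 is therefore a rounded value.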