T-test in R to compare means
The t.test function in R performs a t-test, a statistical test that compares the means of two groups to determine whether they differ significantly, or tests whether the mean of a sample equals a given value. The function supports several types of t-tests: one-sample, independent samples and paired samples, with equal or different variances.
Syntax
The syntax of the t.test function is the following:
t.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)
# Method for class 'formula'
t.test(formula, data, subset, na.action, ...)
Where:
- x: A numeric vector of data values for the first group.
- y: Optional. A numeric vector of data values for the second group. If omitted, a one-sample t-test is performed on the values in x.
- alternative: A character string specifying the alternative hypothesis. It can be one of "two.sided" (default), "less", or "greater".
- mu: The hypothesized value of the true mean in a one-sample test, or of the difference in means in a two-sample test. Default is 0.
- paired: Logical indicating whether the two samples are paired. The default is FALSE.
- var.equal: Logical indicating whether to assume equal variances for an independent two-sample t-test. Default is FALSE.
- conf.level: Confidence level for the confidence interval. Default is 0.95.
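The syntax above lists a formula method that the examples in this tutorial do not use. As a minimal sketch, assuming the built-in mtcars data set as illustrative data (it is not part of the original examples), the formula interface takes a numeric response and a grouping variable with exactly two levels:

```r
# Formula interface: response ~ grouping variable with two levels.
# 'mtcars' ships with base R; 'am' is 0 (automatic) or 1 (manual).
res_formula <- t.test(mpg ~ am, data = mtcars)

# The same test written with two numeric vectors
res_vectors <- t.test(x = mtcars$mpg[mtcars$am == 0],
                      y = mtcars$mpg[mtcars$am == 1])

# Both calls should report the same p-value
res_formula$p.value
res_vectors$p.value
```

The formula form is convenient when the data are stored in long format, with one column of values and one column identifying the group.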
One sample t-test
The one sample t-test can be used to check if the true mean of a simple random sample drawn from a normal population with unknown mean \(\mu\) is equal to \(\mu_0\), greater than \(\mu_0\) or lower than \(\mu_0\).
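The t-test assumes that the observations are independent and drawn from a normal distribution. A sample size of at least 30 is a common rule of thumb when normality is in doubt, because the test is reasonably robust to moderate departures from normality in larger samples.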
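The normality assumption can be checked before running the test. The following is a minimal sketch, assuming a Shapiro-Wilk test (shapiro.test from the stats package) is an acceptable check for your data; the 0.05 threshold is an illustrative choice, not a rule stated in this tutorial.

```r
# Sample data (same generator as the one-sample examples)
set.seed(10)
x <- rnorm(100, mean = 10)

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution
sw <- shapiro.test(x)
sw$p.value

# A large p-value means no evidence against normality was found
normality_plausible <- sw$p.value >= 0.05  # illustrative threshold
normality_plausible
```

A Q-Q plot drawn with qqnorm(x) followed by qqline(x) gives a visual complement to the formal test.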
Mean equal to \(\mu_0\)
The null and alternative hypotheses are the following:
- \(H_0\): The mean of the distribution IS \(\mu_0\).
- \(H_1\): The mean of the distribution is NOT \(\mu_0\).
Given sample data, you can test whether its mean is equal to \(\mu_0\) using the t.test function. The following example tests whether the true mean is equal to 10 at a 95% confidence level.
# Sample data
set.seed(10)
x <- rnorm(100, mean = 10)
# Is the mean of 'x' different from 10?
t.test(x = x, mu = 10, conf.level = 0.95)
One Sample t-test
data: x
t = -1.4507, df = 99, p-value = 0.15
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
9.676689 10.050213
sample estimates:
mean of x
9.863451
The p-value is greater than the usual significance levels, so we don’t have enough evidence to reject the null hypothesis that the true mean is equal to 10. Notice that \(\mu_0\) is inside the 95% confidence interval returned by the function.
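The printed summary can also be inspected programmatically, because t.test returns a list of class "htest". The sketch below stores the result and extracts its components; the names res, mu_0, inside_ci and reject_h0 are illustrative, and 0.05 is assumed as the significance level matching conf.level = 0.95.

```r
set.seed(10)
x <- rnorm(100, mean = 10)
mu_0 <- 10

# Store the result instead of printing it
res <- t.test(x = x, mu = mu_0, conf.level = 0.95)

res$statistic  # t statistic
res$parameter  # degrees of freedom
res$p.value    # p-value
res$conf.int   # confidence interval for the mean
res$estimate   # sample mean

# For a two-sided test, mu_0 lies inside the 95% interval
# exactly when the p-value is not below 0.05
inside_ci <- res$conf.int[1] <= mu_0 && mu_0 <= res$conf.int[2]
reject_h0 <- res$p.value < 0.05
c(inside_ci = inside_ci, reject_h0 = reject_h0)
```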
Mean lower than \(\mu_0\)
In this scenario the null and alternative hypotheses are the following:
- \(H_0\): The mean of the distribution IS \(\mu_0\).
- \(H_1\): The mean of the distribution is LOWER than \(\mu_0\).
The next example checks whether there is enough evidence to reject the null hypothesis. As the alternative hypothesis is that the mean of the distribution is lower than \(\mu_0\), we have to set alternative = "less".
# Sample data
set.seed(10)
x <- rnorm(100, mean = 8)
# Is the mean of 'x' less than 10?
t.test(x = x, mu = 10, alternative = "less")
One Sample t-test
data: x
t = -22.699, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is less than 10
95 percent confidence interval:
-Inf 8.019733
sample estimates:
mean of x
7.863451
The p-value indicates strong evidence against the null hypothesis. This implies that the null hypothesis (true mean is 10) can be rejected in favor of the alternative hypothesis (true mean is less than 10). The 95% confidence interval also supports this, as it ranges over (\(-\infty\), 8.019733), so the true mean is likely less than 8.019733.
Mean greater than \(\mu_0\)
The last option involves conducting a test where the null hypothesis assumes the true mean to be \(\mu_0\), while the alternative hypothesis considers the true mean greater than \(\mu_0\):
- \(H_0\): The mean of the distribution IS \(\mu_0\).
- \(H_1\): The mean of the distribution is GREATER than \(\mu_0\).
# Sample data
set.seed(10)
x <- rnorm(100, mean = 8)
# Is the mean of 'x' greater than 10?
t.test(x = x, mu = 10, alternative = "greater")
One Sample t-test
data: x
t = -22.699, df = 99, p-value = 1
alternative hypothesis: true mean is greater than 10
95 percent confidence interval:
7.707169 Inf
sample estimates:
mean of x
7.863451
In this case, a p-value of 1 implies that the data provide no evidence that the true mean is greater than 10, so the null hypothesis that the true mean is 10 is not rejected in favor of this alternative.
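To see why the same data give a tiny p-value for alternative = "less" and a p-value of 1 for alternative = "greater", the statistic and both one-sided p-values can be computed by hand. This is a sketch of the standard one-sample formulas, not the internals of t.test; the variable names are illustrative.

```r
set.seed(10)
x <- rnorm(100, mean = 8)
mu_0 <- 10

n <- length(x)
t_stat <- (mean(x) - mu_0) / (sd(x) / sqrt(n))  # t statistic
df <- n - 1                                     # degrees of freedom

# One-sided p-values come from opposite tails of the t distribution
p_less    <- pt(t_stat, df)                      # P(T <= t)
p_greater <- pt(t_stat, df, lower.tail = FALSE)  # P(T >= t)

c(t = t_stat, less = p_less, greater = p_greater)
```

Because the two one-sided p-values use complementary tails, they add up to 1: a very negative statistic is strong evidence for "less" and no evidence at all for "greater".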
Two sample t-test
The t.test function can also perform a two sample t-test to compare the means of two groups. To conduct this test, assign one group to x and the other to y inside the function. Note that by default both groups are considered independent and with different variances.
If the population variances are assumed to be different (the default), this test is also called a Welch test or Welch’s t-test.
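The non-integer degrees of freedom reported by the Welch test come from the Welch-Satterthwaite approximation. The sketch below reproduces the statistic and the degrees of freedom by hand and compares them with what t.test stores; the variable names are illustrative.

```r
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# Squared standard error of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

res <- t.test(x = x, y = y)
c(manual_t = t_stat, t_test_t = unname(res$statistic))
c(manual_df = df_welch, t_test_df = unname(res$parameter))
```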
Equal means
The null hypothesis for a test of equal means states that the means of the populations are equal, while the alternative hypothesis contends that the means differ between the populations:
- \(H_0\): The mean of the distribution of X is EQUAL to the mean of the distribution of Y. (Or the difference in means is 0.)
- \(H_1\): The mean of the distribution of X is DIFFERENT from the mean of the distribution of Y. (Or the difference in means is not 0.)
# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)
# Is mean of 'x' different from the mean of 'y'?
t.test(x = x, y = y)
Welch Two Sample t-test
data: x and y
t = -0.30777, df = 197.83, p-value = 0.7586
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3080508 0.2248780
sample estimates:
mean of x mean of y
-0.13654894 -0.09496258
The p-value is greater than the usual significance levels, which implies that there is not enough evidence to reject the null hypothesis of equal means.
Lower mean
In this scenario the alternative hypothesis is that the true mean of the first group is lower than the true mean of the second group:
- \(H_0\): The mean of the distribution of X is EQUAL to the mean of the distribution of Y.
- \(H_1\): The mean of the distribution of X is LOWER than the mean of the distribution of Y.
# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)
# Is mean of 'x' less than mean of 'y'?
t.test(x = x, y = y, alternative = "less")
Welch Two Sample t-test
data: x and y
t = -0.30777, df = 197.83, p-value = 0.3793
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf 0.1817153
sample estimates:
mean of x mean of y
-0.13654894 -0.09496258
The p-value is greater than the usual significance levels, so there is not enough evidence to reject the null hypothesis that the mean of X is equal to the mean of Y.
Greater mean
- \(H_0\): The mean of the distribution of X is EQUAL to the mean of the distribution of Y.
- \(H_1\): The mean of the distribution of X is GREATER than the mean of the distribution of Y.
# Sample data
set.seed(10)
x <- rnorm(100, mean = 3)
y <- rnorm(100)
# Is mean of 'x' greater than mean of 'y'?
t.test(x = x, y = y, alternative = "greater")
Welch Two Sample t-test
data: x and y
t = 21.894, df = 197.83, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
2.735112 Inf
sample estimates:
mean of x mean of y
2.86345106 -0.09496258
In this case, the p-value is almost 0, which implies that there is strong evidence to reject the null hypothesis of equal means.
Equal variances
By default, t.test assumes different population variances. However, if an F-test (e.g., conducted with the var.test function) does not provide sufficient evidence to reject the null hypothesis of equal variances, you can set var.equal = TRUE. This setting uses a pooled variance estimate for the calculation.
# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)
# Independent samples t-test with equal population variances
t.test(x = x, y = y, var.equal = TRUE)
Two Sample t-test
data: x and y
t = -0.30777, df = 198, p-value = 0.7586
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.3080493 0.2248766
sample estimates:
mean of x mean of y
-0.13654894 -0.09496258
The p-value is greater than the usual significance levels, which implies that there is not enough evidence to reject the null hypothesis of equal means.
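The F-test mentioned in this section and the pooled variance estimate can both be written out explicitly. The following is a sketch under the same simulated data: var.test comes from the stats package, and the pooled statistic is the standard textbook formula with illustrative variable names.

```r
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# F-test: H0 is that both populations have the same variance
f_res <- var.test(x, y)
f_res$p.value

# Pooled variance: weighted average of the two sample variances
nx <- length(x)
ny <- length(y)
sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)

# Pooled t statistic with nx + ny - 2 degrees of freedom
t_pooled <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

res <- t.test(x = x, y = y, var.equal = TRUE)
c(manual_t = t_pooled, t_test_t = unname(res$statistic))
```

With equal sample sizes, as here, the pooled and Welch statistics coincide and only the degrees of freedom differ, which is why the two outputs in this tutorial are so close.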
Paired t-test
Lastly, if the groups are dependent, you should specify paired = TRUE to execute a paired samples t-test.
# Sample data
set.seed(10)
x <- rnorm(100)
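# sqrt() returns NaN (with a warning) for negative values of 'x';
# t.test drops those pairs, which is why df = 43 in the output
x_2 <- sqrt(x)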
# Paired samples t-test
t.test(x = x, y = x_2, paired = TRUE)
Paired t-test
data: x and x_2
t = -2.152, df = 43, p-value = 0.03705
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-0.136882092 -0.004443087
sample estimates:
mean difference
-0.07066259
In this test the p-value is 0.03705, so the null hypothesis of equal means can be rejected at the 0.05 and 0.1 significance levels, but there is not enough evidence to reject it at the 0.01 level.
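A paired t-test is equivalent to a one-sample t-test on the within-pair differences, tested against a mean of 0. The sketch below illustrates this with hypothetical before and after measurements on the same 30 subjects; these data and names are made up for illustration and are not part of the example above.

```r
set.seed(10)
# Hypothetical paired measurements on the same 30 subjects
before <- rnorm(30, mean = 50, sd = 5)
after  <- before + rnorm(30, mean = 1, sd = 2)

# Paired samples t-test
res_paired <- t.test(x = after, y = before, paired = TRUE)

# One-sample t-test on the differences
res_diff <- t.test(x = after - before, mu = 0)

# Statistic, degrees of freedom and p-value should agree
c(paired = unname(res_paired$statistic), diff = unname(res_diff$statistic))
c(paired = res_paired$p.value, diff = res_diff$p.value)
```

This is also why the degrees of freedom of a paired test are the number of complete pairs minus one, rather than a value based on the combined sample size.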