T-test in R to compare means


The t.test function in R performs a t-test, a statistical test used either to compare the means of two groups and determine whether they differ significantly, or to check whether the mean of a single sample is equal to a given value. The function supports the one-sample t-test, the independent samples t-test and the paired samples t-test, and the independent samples test can assume equal or different variances.

Syntax

The syntax of the t.test function is the following:

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

# Method for class 'formula'
t.test(formula, data, subset, na.action, ...)

Where:

  • x: A numeric vector of data values for the first group.
  • y: Optional. A numeric vector of data values for the second group. If omitted, a one-sample t-test is performed on the values in x.
  • alternative: A character string specifying the alternative hypothesis. It can be one of "two.sided" (default), "less", or "greater".
  • mu: The hypothesized value of the mean (one-sample test) or of the difference in means (two-sample and paired tests). Default is 0.
  • paired: Logical indicating whether the two samples are paired. The default is FALSE.
  • var.equal: Logical indicating whether to assume equal variances for an independent two-sample t-test. Default is FALSE.
  • conf.level: Confidence level for the confidence interval. Default is 0.95.
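The formula method shown in the syntax is convenient when the data are stored in long format, with one column of values and one column of group labels. The following is a minimal sketch under that assumption; the data frame `dat` and its columns `value` and `group` are hypothetical names made up for illustration. It also shows that the object returned by t.test is a list of class "htest" whose components can be extracted by name.

```r
# Hypothetical long-format data: a numeric column and a two-level grouping column
set.seed(10)
dat <- data.frame(
  value = c(rnorm(20, mean = 5), rnorm(20, mean = 6)),
  group = rep(c("A", "B"), each = 20)
)

# Welch two-sample t-test of 'value' between the two levels of 'group'
res <- t.test(value ~ group, data = dat)

# Components of the returned "htest" object
res$statistic  # t statistic
res$parameter  # degrees of freedom
res$p.value    # p-value
res$conf.int   # confidence interval for the difference in means
res$estimate   # sample mean of each group
```

Storing the result in an object is useful when the p-value or the confidence interval is needed later in a script instead of only being printed.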

One sample t-test

The one sample t-test can be used to check if the true mean of a simple random sample drawn from a normal population with unknown mean \(\mu\) is equal to \(\mu_0\), greater than \(\mu_0\) or lower than \(\mu_0\).

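The t-test assumes that the observations are independent and drawn from a normal distribution. The normality assumption matters most for small samples; as a rule of thumb, with about 30 observations or more the test is reasonably robust to moderate departures from normality.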
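One possible way to examine the normality assumption before running the test is the Shapiro-Wilk test, available in base R as shapiro.test. This is only a sketch of one option, not a required step of the t-test workflow.

```r
# Sample data
set.seed(10)
x <- rnorm(100, mean = 10)

# Shapiro-Wilk test: the null hypothesis is that 'x' comes from a normal distribution
sw <- shapiro.test(x)
sw$p.value

# A visual alternative is a normal Q-Q plot: qqnorm(x); qqline(x)
```

A small p-value here would suggest a departure from normality, while a large p-value means only that no departure was detected.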

Mean equal to \(\mu_0\)

The null and alternative hypotheses are the following:

  • \(H_0\): The mean of the distribution IS \(\mu_0\).
  • \(H_1\): The mean of the distribution is NOT \(\mu_0\).

Given sample data, you can test whether the true mean is equal to \(\mu_0\) using the t.test function. The following example tests whether the true mean is equal to 10, reporting a confidence interval at the 95% confidence level.

# Sample data
set.seed(10)
x <- rnorm(100, mean = 10)

# Is the mean of 'x' different from 10?
t.test(x = x, mu = 10, conf.level = 0.95)
	One Sample t-test

data:  x
t = -1.4507, df = 99, p-value = 0.15
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
  9.676689 10.050213
sample estimates:
mean of x 
 9.863451 

The p-value is greater than the usual significance levels, so we don’t have enough evidence to reject the null hypothesis that the true mean is equal to 10. Notice that \(\mu_0\) is inside the 95% confidence interval returned by the function.
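To make the output of the one-sample test less of a black box, the sketch below recomputes the statistic, the two-sided p-value and the confidence interval by hand from the usual formula \(t = (\bar{x} - \mu_0) / (s / \sqrt{n})\) with \(n - 1\) degrees of freedom, and compares them with the values returned by t.test. The variable names (`mu0`, `t_stat`, `p_val`, `ci`) are chosen for illustration only.

```r
# Same sample data as in the example above
set.seed(10)
x <- rnorm(100, mean = 10)
mu0 <- 10
n <- length(x)

# Standard error of the mean and t statistic
se <- sd(x) / sqrt(n)
t_stat <- (mean(x) - mu0) / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Compare with t.test
res <- t.test(x, mu = mu0)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
all.equal(as.numeric(res$conf.int), ci)
```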

Mean lower than \(\mu_0\)

In this scenario the null and alternative hypotheses are the following:

  • \(H_0\): The mean of the distribution IS \(\mu_0\).
  • \(H_1\): The mean of the distribution is LOWER than \(\mu_0\).

The next example checks whether there is enough evidence to reject the null hypothesis. As the alternative hypothesis is that the mean of the distribution is lower than \(\mu_0\), we have to set alternative = "less".

# Sample data
set.seed(10)
x <- rnorm(100, mean = 8)

# Is the mean of 'x' less than 10?
t.test(x = x, mu = 10, alternative = "less")
	One Sample t-test

data:  x
t = -22.699, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is less than 10
95 percent confidence interval:
     -Inf 8.019733
sample estimates:
mean of x 
 7.863451 

The p-value indicates strong evidence against the null hypothesis. This implies that the null hypothesis (true mean is 10) can be rejected in favor of the alternative hypothesis (true mean is less than 10). The 95% confidence interval also supports this, as it ranges over (\(-\infty\), 8.019733) and therefore does not contain 10.

Mean greater than \(\mu_0\)

The last option involves conducting a test where the null hypothesis assumes the true mean to be \(\mu_0\), while the alternative hypothesis considers the true mean greater than \(\mu_0\):

  • \(H_0\): The mean of the distribution IS \(\mu_0\).
  • \(H_1\): The mean of the distribution is GREATER than \(\mu_0\).
# Sample data
set.seed(10)
x <- rnorm(100, mean = 8)

# Is the mean of 'x' greater than 10?
t.test(x = x, mu = 10, alternative = "greater")
	One Sample t-test

data:  x
t = -22.699, df = 99, p-value = 1
alternative hypothesis: true mean is greater than 10
95 percent confidence interval:
 7.707169      Inf
sample estimates:
mean of x 
 7.863451 

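In this case, a p-value of 1 means that the data provide no evidence in favor of the alternative hypothesis that the true mean is greater than 10, so the null hypothesis cannot be rejected. This does not show that the true mean is 10: the sample mean (7.863451) is well below 10, but a one-sided test with alternative = "greater" does not look in that direction.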
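The two one-sided tests on the same data are complementary: the "less" p-value is the lower tail of the t distribution at the observed statistic and the "greater" p-value is the upper tail, so they add up to 1. The short sketch below checks this on the sample used in the last two examples; `p_less` and `p_greater` are illustrative names.

```r
# Same sample data as in the one-sided examples
set.seed(10)
x <- rnorm(100, mean = 8)

p_less <- t.test(x, mu = 10, alternative = "less")$p.value
p_greater <- t.test(x, mu = 10, alternative = "greater")$p.value

# The two tail probabilities sum to 1 (up to floating-point error)
all.equal(p_less + p_greater, 1)
```

This is why choosing the direction of the alternative before looking at the data matters: the same sample gives a p-value near 0 in one direction and near 1 in the other.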

Two sample t-test

The t.test function can also perform a two sample t-test to compare the means of two groups. To conduct this test, assign one group to x and the other to y inside the function. Note that by default the groups are treated as independent and their variances are not assumed to be equal.

If the population variances are assumed to be different (the default), this test is also called a Welch test or Welch’s t-test.
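The non-integer degrees of freedom reported by Welch's t-test come from the Welch-Satterthwaite approximation. As a sketch, the statistic and the degrees of freedom can be recomputed by hand and compared with the values returned by t.test; the names `vx`, `vy`, `df_welch` and `t_stat` are introduced here for illustration.

```r
# Two independent samples
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# Estimated variance of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Compare with t.test (Welch test is the default)
res <- t.test(x, y)
all.equal(unname(res$statistic), t_stat)
all.equal(unname(res$parameter), df_welch)
```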

Equal means

The null hypothesis for a test of equal means states that the means of the populations are equal, while the alternative hypothesis contends that the means differ between the populations:

  • \(H_0\): The mean of the distribution of X is EQUAL to the mean of the distribution of Y. (Or the difference in means is 0.)
  • \(H_1\): The mean of the distribution of X is DIFFERENT from the mean of the distribution of Y. (Or the difference in means is not 0.)
# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# Is mean of 'x' different from the mean of 'y'?
t.test(x = x, y = y)
	Welch Two Sample t-test

data:  x and y
t = -0.30777, df = 197.83, p-value = 0.7586
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3080508  0.2248780
sample estimates:
  mean of x   mean of y 
-0.13654894 -0.09496258 

The p-value is greater than the usual significance levels, which implies that there is not enough evidence to reject the null hypothesis of equal means.

Lower mean

In this scenario the alternative hypothesis is that the true mean of the first group is lower than the true mean of the second group:

  • \(H_0\): The mean of the distribution of X is EQUAL to the mean of the distribution of Y.
  • \(H_1\): The mean of the distribution of X is LOWER than the mean of the distribution of Y.
# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# Is mean of 'x' less than mean of 'y'?
t.test(x = x, y = y, alternative = "less")
	Welch Two Sample t-test

data:  x and y
t = -0.30777, df = 197.83, p-value = 0.3793
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf 0.1817153
sample estimates:
  mean of x   mean of y 
-0.13654894 -0.09496258 

The p-value is greater than the usual significance levels, so there is not enough evidence to reject the null hypothesis that the mean of X is equal to the mean of Y.
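The one-sided p-value above (0.3793) is half of the two-sided p-value obtained for the same data in the equal means example (0.7586). This happens whenever the observed statistic lies on the side of the alternative, as it does here because the sample mean of x is below the sample mean of y. A quick check, with `p_two` and `p_less` as illustrative names:

```r
# Same sample data as in the two previous examples
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

p_two <- t.test(x, y)$p.value
p_less <- t.test(x, y, alternative = "less")$p.value

# The one-sided p-value equals half of the two-sided one
all.equal(p_less, p_two / 2)
```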

Greater mean

  • \(H_0\): The mean of the distribution of X is EQUAL to the mean of the distribution of Y.
  • \(H_1\): The mean of the distribution of X is GREATER than the mean of the distribution of Y.
# Sample data
set.seed(10)
x <- rnorm(100, mean = 3)
y <- rnorm(100)

# Is mean of 'x' greater than mean of 'y'?
t.test(x = x, y = y, alternative = "greater")
	Welch Two Sample t-test

data:  x and y
t = 21.894, df = 197.83, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 2.735112      Inf
sample estimates:
  mean of x   mean of y 
 2.86345106 -0.09496258 

In this case, the p-value is almost 0, which implies that there is strong evidence to reject the null hypothesis of equal means.

Equal variances

By default, t.test assumes different population variances. However, if an F-test (e.g., conducted with the var.test function) does not provide sufficient evidence to reject the null hypothesis of equal variances, you can set var.equal = TRUE. This setting enables the use of a pooled variance estimate for the calculation.

# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# Independent samples t-test with equal population variances
t.test(x = x, y = y, var.equal = TRUE)
	Two Sample t-test

data:  x and y
t = -0.30777, df = 198, p-value = 0.7586
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3080493  0.2248766
sample estimates:
  mean of x   mean of y 
-0.13654894 -0.09496258 

The p-value is greater than the usual significance levels, which implies that there is not enough evidence to reject the null hypothesis of equal means.
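The F-test mentioned at the beginning of this section can be run with var.test before deciding on var.equal. The following is a sketch of that workflow, assuming a significance level of 0.05 for the variance test; `vt` and `use_pooled` are illustrative names. Keep in mind that Welch's test is often considered a reasonable default even when the variances look similar.

```r
# Sample data
set.seed(10)
x <- rnorm(100)
y <- rnorm(100)

# F-test: the null hypothesis is that the ratio of the variances is 1
vt <- var.test(x, y)
vt$p.value

# Use the pooled variance only if equal variances are not rejected at the 0.05 level
use_pooled <- vt$p.value > 0.05
res <- t.test(x, y, var.equal = use_pooled)
res$method
```

The `method` component of the result states which variant of the test was actually run.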

Paired t-test

Lastly, if the groups are dependent, you should specify paired = TRUE to execute a paired samples t-test.

# Sample data
set.seed(10)
x <- rnorm(100)
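# Note: sqrt() returns NaN (with a warning) for the negative values of 'x'.
# t.test drops those incomplete pairs, which is why df = 43 below instead of 99
x_2 <- sqrt(x)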

# Paired samples t-test
t.test(x = x, y = x_2, paired = TRUE)
	Paired t-test

data:  x and x_2
t = -2.152, df = 43, p-value = 0.03705
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -0.136882092 -0.004443087
sample estimates:
mean difference 
    -0.07066259 

In this test the p-value is 0.03705, so the null hypothesis of equal means can be rejected at the 0.05 and 0.1 significance levels, but there is not enough evidence to reject it at the 0.01 level.
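A paired t-test is equivalent to a one-sample t-test on the pairwise differences with mu = 0. The sketch below illustrates this with hypothetical before/after measurements on the same 30 subjects; the data and the names `before` and `after` are invented for the example.

```r
# Hypothetical paired measurements on the same subjects
set.seed(10)
before <- rnorm(30, mean = 50, sd = 5)
after <- before + rnorm(30, mean = 1, sd = 2)

# Paired samples t-test
paired_res <- t.test(after, before, paired = TRUE)

# One-sample t-test on the differences
diff_res <- t.test(after - before, mu = 0)

# Both approaches give the same statistic and p-value
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
all.equal(paired_res$p.value, diff_res$p.value)
```

Seen this way, the assumptions of the paired test apply to the differences: they should be independent of each other and approximately normal.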