Shapiro-Wilk normality test in R

The Shapiro-Wilk test checks whether a sample is consistent with a normal distribution and is especially suited to small data sets of 50 observations or fewer. In R, the shapiro.test function performs this test on a numeric vector of values.

Hypothesis

The Shapiro-Wilk test checks for normality of the data. The null hypothesis ($$H_0$$) is that the sample comes from a normally distributed population, while the alternative hypothesis ($$H_1$$) is that it does not:

• $$H_0$$: The distribution of the population is normal.
• $$H_1$$: The distribution of the population is NOT normal.

The Shapiro-Wilk test is usually recommended for small data sets (50 observations or fewer), although shapiro.test itself accepts between 3 and 5000 non-missing values. For larger data sets, the Kolmogorov-Smirnov test is a commonly recommended alternative.
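As a rough illustration of the larger-sample case, the sketch below runs ks.test (from the stats package that ships with R) against a normal distribution. The sample size of 1000 and the names big_sample and ks_res are assumptions made only for this example. Estimating the mean and standard deviation from the same sample makes the resulting p-value approximate, so treat it as a guide rather than an exact result.

```r
# Hypothetical larger sample (size and parameters chosen only for illustration)
set.seed(1)
big_sample <- rnorm(1000, mean = 10, sd = 2)

# Kolmogorov-Smirnov test against a normal distribution whose
# parameters are estimated from the sample itself
ks_res <- ks.test(big_sample, "pnorm",
                  mean = mean(big_sample), sd = sd(big_sample))

ks_res$statistic  # Kolmogorov-Smirnov statistic D
ks_res$p.value    # approximate p-value
```

As with the Shapiro-Wilk test, a small p-value is evidence against normality.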

Check for normality and interpretation

Example with non-normal data

Consider a sample data set with 30 values drawn from an exponential distribution:

# Sample data
set.seed(3)
x <- rexp(30)

# Histogram and density
hist(x, freq = FALSE, col = "white")
lines(density(x), lwd = 2, col = "red")

To check whether the data are normally distributed, pass the vector to the shapiro.test function, as shown below.

# Sample data
set.seed(3)
x <- rexp(30)

# Shapiro-Wilk test
shapiro.test(x)
	Shapiro-Wilk normality test

data:  x
W = 0.83304, p-value = 0.0002792

The function will return a formatted list of class "htest" with the name of your data, the value of the Shapiro-Wilk statistic (W) and the p-value.
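Because the result is an ordinary list, its components can also be extracted by name instead of being read from the printed output. The sketch below uses the standard "htest" component names (statistic, p.value, method, data.name); res is an illustrative variable name that does not appear in the example above.

```r
# Same exponential sample as above
set.seed(3)
x <- rexp(30)

# Store the result instead of printing it (res is an illustrative name)
res <- shapiro.test(x)

class(res)       # class of the returned object
res$statistic    # Shapiro-Wilk statistic, a numeric value named "W"
res$p.value      # p-value as a plain number
res$method       # description of the test that was run
res$data.name    # name of the object passed to the function
```

Extracting res$p.value this way is convenient when the test is part of a script and the result has to be compared with a threshold programmatically.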

To interpret the result, compare the p-value with a significance level ($$\alpha$$), the threshold below which a result is considered statistically significant. Common significance levels are 0.1 (90% confidence), 0.05 (95% confidence), and 0.01 (99% confidence).

Because the p-value (0.0002792) is lower than the usual significance levels (0.1, 0.05, and 0.01), we reject the null hypothesis of normality.

In general, if the p-value is lower than $$\alpha$$, we reject $$H_0$$ at the selected significance level; otherwise, we do not have enough evidence to reject it.
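The decision rule can be written in a few lines of R. This is a minimal sketch: alpha and p_value are names chosen for illustration, and 0.05 is only one common choice of significance level.

```r
# Same exponential sample as in the example above
set.seed(3)
x <- rexp(30)

alpha <- 0.05                        # chosen significance level (an assumption)
p_value <- shapiro.test(x)$p.value   # extract the p-value from the test result

# Compare the p-value with the significance level
if (p_value < alpha) {
  message("Reject H0: the data do not look normal at alpha = ", alpha)
} else {
  message("Fail to reject H0: no evidence against normality at alpha = ", alpha)
}
```

Changing alpha to 0.1 or 0.01 applies the same rule at a different significance level.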

Example with normal data

Now, consider some sample data drawn from a normal distribution:

# Sample data
set.seed(5)
x <- rnorm(30)

# Histogram and density
hist(x, freq = FALSE, col = "white")
lines(density(x), lwd = 2, col = "red")

If you apply the Shapiro-Wilk test to these data, you will get the following output:

# Sample data
set.seed(5)
x <- rnorm(30)

# Shapiro-Wilk test
shapiro.test(x)
	Shapiro-Wilk normality test

data:  x
W = 0.95084, p-value = 0.178

The p-value is 0.178, greater than the usual significance levels (0.1, 0.05, and 0.01), so we don't have enough evidence to reject the null hypothesis of normality.