Shapiro–Wilk Test in R

Renesh Bedre

Normality, meaning that the data follow a normal (Gaussian) distribution, is one of the key assumptions of parametric tests such as ANOVA, the t-test, regression, and many others.

The Shapiro-Wilk test assesses whether a dataset is consistent with a normal distribution, which makes it a convenient way to check the normality assumption before running a parametric test.

The Shapiro-Wilk test evaluates the null hypothesis (H0: the data come from a normally distributed population) against the alternative hypothesis (H1: the data do not come from a normally distributed population).

To perform the Shapiro-Wilk test of normality in R, you can use the shapiro.test() function. The syntax for the shapiro.test() function looks like this:

shapiro.test(data)

where data is a numeric vector.
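As a minimal sketch of what the call gives back: shapiro.test() returns a list of class "htest", so the W statistic and the p-value can be extracted by name rather than read off the printed output. The vector x below is made-up illustration data and is not from this article.

```r
# hypothetical illustration data (not from the article)
x <- c(4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7, 4.6, 5.2)

# run the test and keep the result object
res <- shapiro.test(x)

class(res)       # "htest"
res$statistic    # the W statistic (a named numeric value)
res$p.value      # the p-value as a plain number
res$method       # name of the test that was run
```

Storing the result this way is handy when the p-value feeds into later code, for example when checking several variables in a loop.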

Example of Shapiro-Wilk test in R

The following example demonstrates how to use the shapiro.test() function to test the normality assumption in R.

Suppose we have the following student weight data and would like to check whether it follows a normal distribution.

Sample size: shapiro.test() can be applied to datasets with between 3 and 5000 observations.

# generate random data for student weights
data = rnorm(50, mean = 70, sd = 10)

# perform Shapiro–Wilk Test
shapiro.test(data)

# output
	Shapiro-Wilk normality test

data:  data
W = 0.93457, p-value = 0.4943

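The p-value is greater than the significance level of 0.05 (W = 0.93457, p = 0.4943) for the student weight data. Hence, we fail to reject the null hypothesis: there is no evidence that the student weight data deviate from a normal distribution. Note that failing to reject H0 does not prove that the data are normal. Also, because rnorm() draws random values and no seed is set, the W statistic and p-value will differ from run to run.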
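The decision rule and the sample size limits can be wrapped into a small function. This is only a sketch: check_normality() is a hypothetical helper name (not part of base R), the 0.05 default mirrors the significance level used above, and the seed is arbitrary and only there to make the run repeatable.

```r
# check_normality() is a hypothetical helper, not a base R function
check_normality <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]    # shapiro.test() only uses non-missing values
  n <- length(x)
  if (n < 3 || n > 5000) {
    stop("shapiro.test() needs between 3 and 5000 non-missing values")
  }
  res <- shapiro.test(x)
  list(
    W = unname(res$statistic),
    p.value = res$p.value,
    reject_h0 = res$p.value < alpha   # TRUE means evidence against normality
  )
}

set.seed(123)   # arbitrary seed, for reproducibility only
weights <- rnorm(50, mean = 70, sd = 10)
check_normality(weights)

# a right-skewed sample, for contrast
skewed <- rexp(50, rate = 1)
check_normality(skewed)
```

Returning a list rather than printing keeps the result easy to reuse, for instance to choose between a parametric and a non-parametric test.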

In addition to the statistical test, you can also use a histogram to visually assess whether the data appear to follow a normal distribution.

Create a histogram:

hist(data, main = "Student Weight Distribution", xlab = "Weight (kg)", 
     ylab = "Frequency", col = "lightblue", border = "black")

Histogram to check normality

The shape of the histogram is roughly bell-shaped, which suggests that the data are approximately normally distributed.

In addition to the histogram, a quantile-quantile (Q-Q) plot can be used for a more rigorous visual assessment of normality.
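To make the comparison with the bell curve more explicit, a normal density can be drawn on top of the histogram. The sketch below assumes the same kind of simulated data as above (the seed is arbitrary) and uses the sample mean and standard deviation for the curve.

```r
set.seed(123)   # arbitrary seed, for reproducibility only
data <- rnorm(50, mean = 70, sd = 10)

# freq = FALSE puts the y-axis on the density scale,
# so the bars and the curve are directly comparable
h <- hist(data, freq = FALSE, main = "Student Weight Distribution",
          xlab = "Weight (kg)", col = "lightblue", border = "black")

# normal density based on the sample mean and standard deviation
curve(dnorm(x, mean = mean(data), sd = sd(data)),
      add = TRUE, lwd = 2, col = "darkblue")
```

If the bars track the curve reasonably well, the histogram supports the result of the formal test.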

# load package (EnvStats provides a qqPlot() function with the add.line argument)
library(EnvStats)

# Q-Q plot
qqPlot(data, add.line = TRUE)

Q-Q plot to check normality

The points closely follow the reference line on the Q-Q plot, which suggests that the data are approximately normally distributed.
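Base R can also draw a normal Q-Q plot without any add-on package, using qqnorm() and qqline(). The sketch below assumes simulated data like the example above, with an arbitrary seed.

```r
set.seed(123)   # arbitrary seed, for reproducibility only
data <- rnorm(50, mean = 70, sd = 10)

# sample quantiles against theoretical normal quantiles
qq <- qqnorm(data, main = "Normal Q-Q Plot of Student Weights")

# reference line through the first and third quartiles
qqline(data, col = "red", lwd = 2)
```

Points that bend away from the line at the ends indicate heavy or light tails, while a consistent curve suggests skewness.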


This work is licensed under a Creative Commons Attribution 4.0 International License
