The Shapiro-Wilk test is used to assess whether a dataset follows a normal distribution. It is commonly used to check the normality assumption that many parametric methods, such as t-tests and ANOVA, rely on.
The Shapiro-Wilk test evaluates the null hypothesis (H0: the data come from a normally distributed population) against the alternative hypothesis (H1: the data do not come from a normally distributed population).
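For reference, the test statistic W reported by shapiro.test() is usually written in the following form. Here x_(i) is the i-th smallest observation, x̄ is the sample mean, and the a_i are constants derived from the expected values and covariances of order statistics of a standard normal sample.

```latex
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^2}{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}
```

W lies between 0 and 1. Values close to 1 are consistent with H0, while small values of W (and therefore small p-values) count as evidence against normality.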
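To perform the Shapiro-Wilk test of normality in R, you can use the shapiro.test() function from the built-in stats package. The syntax for the shapiro.test() function looks like this:

shapiro.test(data)

where data is a numeric vector of sample values (passed to the function's x argument). Missing values are dropped before the test is computed.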
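shapiro.test() returns an object of class "htest", so the W statistic and p-value can also be extracted directly. This is handy when testing many variables in a script. The following is a minimal sketch on simulated data; the names x, result, w_stat, and p_value are illustrative only.

```r
# simulate a numeric vector (illustrative data; seed fixed for reproducibility)
set.seed(42)
x <- rnorm(30, mean = 0, sd = 1)

# run the test and keep the returned "htest" object
result <- shapiro.test(x)

# extract the components shown in the printed output
w_stat  <- unname(result$statistic)  # the W statistic
p_value <- result$p.value            # the p-value
method  <- result$method             # name of the test

cat("W =", round(w_stat, 4), " p-value =", round(p_value, 4), "\n")
```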
Example of Shapiro-Wilk test in R
The following example demonstrates how to use the shapiro.test() function to test the normality assumption in R.
Suppose we have the following student weight data (simulated here with rnorm()) and would like to check whether it follows a normal distribution.
Sample size: In R, the Shapiro-Wilk test can only be applied to datasets with between 3 and 5000 non-missing observations.
# generate random data for student weights
data <- rnorm(50, mean = 70, sd = 10)

# perform Shapiro-Wilk test
shapiro.test(data)

# output
#  Shapiro-Wilk normality test
#
# data:  data
# W = 0.93457, p-value = 0.4943
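For the student weight data, the p-value is greater than the 0.05 significance level (W = 0.9346, p = 0.4943). Hence, we fail to reject the null hypothesis and conclude that there is no significant evidence that the student weights depart from a normal distribution. This does not prove normality. Because rnorm() draws random values, your W and p-value will differ each time the code runs; call set.seed() first if you need reproducible results.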
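The decision rule above (reject H0 when the p-value is below the significance level) can be wrapped in a small helper. This is an illustrative sketch only. check_normality() is a hypothetical function, not part of R, and it returns NA rather than raising an error when the sample size falls outside the 3 to 5000 range that shapiro.test() accepts. For contrast, it is run on simulated weights, on clearly right-skewed data drawn with rexp(), and on a sample that is too large.

```r
# hypothetical helper: apply the Shapiro-Wilk decision rule at level alpha
check_normality <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]                 # drop missing values, as shapiro.test() does
  n <- length(x)
  if (n < 3 || n > 5000) {
    # outside the range shapiro.test() supports; report NA instead of erroring
    return(list(n = n, W = NA_real_, p.value = NA_real_, reject_H0 = NA))
  }
  res <- shapiro.test(x)
  list(n = n,
       W = unname(res$statistic),
       p.value = res$p.value,
       reject_H0 = res$p.value < alpha)  # TRUE = evidence against normality
}

set.seed(123)
student_wt <- rnorm(50, mean = 70, sd = 10)  # simulated, roughly normal
skewed     <- rexp(200, rate = 1)            # simulated, strongly right-skewed
too_big    <- rnorm(6000)                    # exceeds the 5000-observation limit

normal_result <- check_normality(student_wt)
skewed_result <- check_normality(skewed)
big_result    <- check_normality(too_big)

str(skewed_result)
```

A value of reject_H0 = FALSE only means the test found no significant departure from normality at the chosen alpha. It does not show that the data are exactly normal.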
In addition to the statistical test, you can also use a histogram to visually assess whether the data appear to follow a normal distribution.
Create a histogram:
hist(data, main = "Student Weight Distribution", xlab = "Weight (kg)", ylab = "Frequency", col = "lightblue", border = "black")
The histogram is roughly bell-shaped, which suggests that the data are approximately normally distributed.
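A common refinement, offered here as an optional addition rather than part of the original example, is to plot the histogram on the density scale and overlay a normal curve. The curve uses the sample mean and standard deviation, so the bell shape can be judged against a reference. The data are re-simulated so the block runs on its own.

```r
# simulate student weights (illustrative; seed fixed for reproducibility)
set.seed(123)
data <- rnorm(50, mean = 70, sd = 10)

# compute the bins first so the y-axis can be sized to fit the normal curve
h <- hist(data, plot = FALSE)
peak <- max(h$density, dnorm(mean(data), mean = mean(data), sd = sd(data)))

# draw the histogram on the density scale (freq = FALSE)
plot(h, freq = FALSE, ylim = c(0, peak),
     main = "Student Weight Distribution",
     xlab = "Weight (kg)", ylab = "Density",
     col = "lightblue", border = "black")

# overlay a normal density with the sample mean and standard deviation
curve(dnorm(x, mean = mean(data), sd = sd(data)),
      add = TRUE, col = "red", lwd = 2)
```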
In addition to a histogram, a quantile-quantile (Q-Q) plot can be used for a more detailed visual assessment of normality.
# load package (install once with install.packages("EnvStats"))
library(EnvStats)

# Q-Q plot with a fitted reference line
qqPlot(data, add.line = TRUE)
The points lie close to the reference line on the Q-Q plot, which suggests that the data follow an approximately normal distribution.
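If installing EnvStats is not an option, base R's qqnorm() and qqline() produce a comparable normal Q-Q plot. As an illustrative extra, not part of the original workflow, the sketch below also computes the correlation between the theoretical and sample quantiles. This gives a rough numeric companion to the plot: values close to 1 mean the points lie near a straight line.

```r
# simulate student weights (illustrative; seed fixed for reproducibility)
set.seed(123)
data <- rnorm(50, mean = 70, sd = 10)

# normal Q-Q plot; qqnorm() also invisibly returns the plotted coordinates
qq <- qqnorm(data, main = "Normal Q-Q Plot of Student Weights")

# reference line through the first and third quartiles
qqline(data, col = "red", lwd = 2)

# correlation between theoretical (qq$x) and sample (qq$y) quantiles
qq_cor <- cor(qq$x, qq$y)
cat("Q-Q correlation:", round(qq_cor, 4), "\n")
```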
This work is licensed under a Creative Commons Attribution 4.0 International License