# Shapiro–Wilk Test in R

Normality, meaning that the data follows a normal (Gaussian) distribution, is one of the crucial assumptions for performing
parametric tests such as ANOVA, the *t*-test, regression, and many others.

The Shapiro-Wilk test is used for assessing whether a dataset follows a normal distribution. This test helps to check the assumption of normality.

The Shapiro-Wilk test evaluates the **null hypothesis** (H0: data comes from a normally distributed population) against the
**alternative hypothesis** (H1: data does not come from a normally distributed population).

To perform the Shapiro-Wilk test of normality in R, you can use the `shapiro.test()` function.

The syntax for the `shapiro.test()` function looks like this:

```
shapiro.test(data)
```

where `data` is a numeric vector.
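The result of `shapiro.test()` is an object of class `htest`, so the test statistic and *p* value can be pulled out by name rather than read off the printed output. The sketch below illustrates this; the vector `x` and its values are hypothetical and chosen only for illustration.

```r
# Hypothetical example vector (illustrative values only)
x <- c(61.2, 68.5, 72.1, 75.4, 66.9, 70.3, 79.8, 64.7, 73.6, 69.0)

# Run the test and keep the returned object
res <- shapiro.test(x)

class(res)       # "htest"
res$statistic    # the W statistic (a named numeric)
res$p.value      # the p value as a plain number
```

Storing the result this way is convenient when the *p* value has to be compared against a significance level in later code.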

## Example of Shapiro-Wilk test in R

The following example demonstrates how to use the `shapiro.test()` function for testing the normality assumption in R.

Suppose we have student weight data and would like to check whether this dataset follows a normal distribution.

Sample size: the Shapiro-Wilk test in R can be applied to a dataset with a sample size between 3 and 5000.
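The sample size limits are enforced by `shapiro.test()` itself, which stops with an error outside the 3 to 5000 range. A minimal sketch of that behavior follows; `can_run_shapiro()` is a hypothetical helper written for this illustration, not a base R function.

```r
# Hypothetical helper: TRUE if shapiro.test() accepts the input, FALSE if it errors
can_run_shapiro <- function(x) {
  tryCatch({
    shapiro.test(x)
    TRUE
  }, error = function(e) FALSE)
}

can_run_shapiro(c(1.2, 3.4))    # 2 values: below the lower limit
can_run_shapiro(rnorm(3))       # 3 values: smallest allowed sample
can_run_shapiro(rnorm(5000))    # 5000 values: largest allowed sample
can_run_shapiro(rnorm(5001))    # 5001 values: above the upper limit
```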

```
# generate random data for student weights
data <- rnorm(50, mean = 70, sd = 10)
# perform Shapiro-Wilk test
shapiro.test(data)
# output (values vary between runs because the data is random and no seed is set)
#
#         Shapiro-Wilk normality test
#
# data:  data
# W = 0.93457, p-value = 0.4943
```

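The *p* value is greater than the significance level of 0.05 (W = 0.93457, *p* = 0.4943) for the student weight data. Hence, we
**fail to reject the null hypothesis**: there is no evidence that the student weight data departs from a normal distribution. Note that this is not proof of normality, only an absence of evidence against it.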
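For contrast, the sketch below applies the same test to deliberately skewed data, where the null hypothesis is expected to be rejected. The exponential sample, the variable name `skewed`, and the seed are assumptions chosen for illustration and are not part of the student weight example.

```r
set.seed(42)  # arbitrary seed so the illustration is reproducible

# Hypothetical right-skewed data drawn from an exponential distribution
skewed <- rexp(200, rate = 1 / 70)

res <- shapiro.test(skewed)

# TRUE means the p value is below 0.05 and the null hypothesis of normality is rejected
res$p.value < 0.05
```

A small *p* value here indicates that the data is unlikely to have come from a normally distributed population.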

In addition to the statistical test, you can also use a **histogram** to visually assess whether the data appears to follow a
normal distribution.

Create a histogram:

```
hist(data, main = "Student Weight Distribution", xlab = "Weight (kg)",
ylab = "Frequency", col = "lightblue", border = "black")
```

The shape of the histogram roughly matches a **bell-shaped curve**, which suggests that the data is approximately normally distributed.
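To make the comparison with a bell curve more direct, one option is to draw the histogram on a density scale and overlay a normal curve that uses the sample mean and standard deviation. This is a sketch under assumed names: `weights` and the seed are hypothetical stand-ins for the student weight data.

```r
set.seed(1)  # arbitrary seed so the illustration is reproducible

# Hypothetical stand-in for the student weight data
weights <- rnorm(50, mean = 70, sd = 10)

# Histogram on a density scale (freq = FALSE) so a density curve can be overlaid
hist(weights, freq = FALSE, main = "Student Weight Distribution",
     xlab = "Weight (kg)", col = "lightblue", border = "black")

# Normal density with the sample mean and standard deviation
curve(dnorm(x, mean = mean(weights), sd = sd(weights)), add = TRUE, lwd = 2)
```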

In addition to the histogram, a **quantile-quantile (Q-Q) plot** can be used for a more detailed visual assessment of normality:

```
# load package
library(EnvStats)
# Q-Q plot
qqPlot(data, add.line = TRUE)
```

The data closely follows the reference line on the Q-Q plot, which suggests that the data is approximately normally distributed.
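If installing an extra package is not desirable, a similar plot can be drawn with the base R functions `qqnorm()` and `qqline()`. The sketch below is one such alternative; `weights` and the seed are hypothetical stand-ins for the student weight data.

```r
set.seed(1)  # arbitrary seed so the illustration is reproducible

# Hypothetical stand-in for the student weight data
weights <- rnorm(50, mean = 70, sd = 10)

# Sample quantiles against theoretical normal quantiles
qq <- qqnorm(weights, main = "Normal Q-Q Plot of Student Weights")

# Reference line through the first and third quartiles
qqline(weights, col = "red")
```

Points that stay close to the reference line are consistent with approximate normality, while systematic curvature at the ends points to skewness or heavy tails.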

This work is licensed under a Creative Commons Attribution 4.0 International License
