# How to Perform Mann-Whitney U test in R

The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is a non-parametric statistical test that compares two independent groups to determine whether they differ significantly from each other.

The Mann-Whitney U test is recommended when the data for the two independent groups do not follow a normal distribution.

The Mann-Whitney U test assumes that the observations in each group are independent, that the variances of the two groups are roughly equal, and that the data are ordinal or continuous. Normality is not required.

In R, the Mann-Whitney U test is performed using the `wilcox.test()` function. The general syntax depends on how the input data are organized:

```
# when data is two separate vectors
wilcox.test(group1, group2)
# when data is single stacked table
wilcox.test(response ~ groups, data = df)
```

Here, `response` is the variable containing the outcome values and `groups` is the variable identifying the two independent groups.
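As a minimal sketch of the two calling styles, the example below runs both on the same small set of values and checks that they agree. The vectors `group1` and `group2` and the data frame `df_toy` are hypothetical illustration data, not the dataset used later in this article.

```r
# Hypothetical toy data (made up for illustration; no tied values)
group1 <- c(12.1, 14.3, 15.8, 16.2, 18.9, 20.4)
group2 <- c(9.5, 10.2, 11.7, 13.0, 13.6, 14.9)

# Style 1: data as two separate vectors
res_vectors <- wilcox.test(group1, group2)

# Style 2: the same values stacked in a single long-format data frame
df_toy <- data.frame(
  response = c(group1, group2),
  groups   = factor(rep(c("g1", "g2"),
                        times = c(length(group1), length(group2))))
)
res_formula <- wilcox.test(response ~ groups, data = df_toy)

# Both calls should report the same W statistic and p-value
all.equal(unname(res_vectors$statistic), unname(res_formula$statistic))
all.equal(res_vectors$p.value, res_formula$p.value)
```

With the formula style, the first factor level (`g1` here) plays the role of the first vector.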

Note: The Mann-Whitney U test is the non-parametric equivalent of the independent two-sample t-test, but it is less powerful (higher Type II error rate) than the t-test when the data are normally distributed.
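The power comparison in the note can be explored with a small simulation. The sketch below assumes normally distributed data, 15 observations per group, and a true mean shift of 1 (all arbitrary choices for illustration), and estimates how often each test rejects the null hypothesis at the 0.05 level. The helper `sim_once()` is a hypothetical name introduced here.

```r
set.seed(1)

n_sim <- 1000  # number of simulated experiments (assumed)
n     <- 15    # observations per group (assumed)
shift <- 1     # true difference in means (assumed)

# Simulate one experiment and return the p-value from each test
sim_once <- function(n, shift) {
  x <- rnorm(n)
  y <- rnorm(n, mean = shift)
  c(t_test       = t.test(x, y)$p.value,
    mann_whitney = wilcox.test(x, y)$p.value)
}

# 2 x n_sim matrix of p-values (one row per test)
p_values <- replicate(n_sim, sim_once(n, shift))

# Estimated power: proportion of simulations with p < 0.05
power <- rowMeans(p_values < 0.05)
power
```

Under these normal-data assumptions the t-test is expected to reject slightly more often. With skewed or heavy-tailed data the ordering can reverse, so the result should not be generalized beyond the simulated setting.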

## Example of Mann-Whitney U test in R

The following examples explain how to perform the Mann-Whitney U test in R.

Suppose there are two plant genotypes (A and B) that differ in height. We would like to check whether the heights of the two genotypes differ significantly from each other.

Sample size: The Mann-Whitney U test can be applied to small samples (5-20), and the power of the test increases as the sample size increases.
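Sample size also affects how `wilcox.test()` computes the *p* value. When `exact` is not specified, an exact *p* value is computed if each sample has fewer than 50 values and there are no ties. Otherwise a normal approximation is used. The sketch below shows both on hypothetical, tie-free toy vectors `x` and `y`.

```r
# Hypothetical small samples without ties (made up for illustration)
x <- c(1.1, 2.3, 3.8, 4.4, 5.9)
y <- c(6.2, 7.5, 8.1, 9.7, 10.3)

# Default for small, tie-free samples: exact p-value
res_exact <- wilcox.test(x, y)
res_exact$p.value

# Force the normal approximation (with continuity correction)
res_approx <- wilcox.test(x, y, exact = FALSE)
res_approx$p.value

# The two p-values differ noticeably at this sample size
c(exact = res_exact$p.value, approx = res_approx$p.value)
```

When the data contain ties, as measurements rounded to whole units often do, R cannot compute the exact *p* value and falls back to the approximation with a warning.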

Load the dataset and check the normality of each group using the Shapiro-Wilk normality test:

```
# import dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/mann_whitney/genotype_height.csv")

# view the first few rows
head(df)
# output
#     genotype height
# 1 genotype_A     25
# 2 genotype_A     30
# 3 genotype_A     30
# 4 genotype_A     25
# 5 genotype_A     25
# 6 genotype_A     20

# extract the heights for each genotype
genotype_A <- df[df$genotype == "genotype_A", ]$height
genotype_B <- df[df$genotype == "genotype_B", ]$height

# Shapiro-Wilk normality test
shapiro.test(genotype_A)
# output
# data:  genotype_A
# W = 0.88481, p-value = 0.0104

shapiro.test(genotype_B)
# output
# data:  genotype_B
# W = 0.86168, p-value = 0.005501
```

The *p* value obtained from the Shapiro-Wilk test is less than the significance level of 0.05 for both groups. Hence, we conclude that the data for each group are not normally distributed.
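The per-group normality check can also be written without splitting the data by hand. The sketch below uses `tapply()` to apply `shapiro.test()` to each group. The data frame `df_toy` holds simulated, deliberately skewed values and is a hypothetical stand-in for the article's dataset.

```r
# Hypothetical, simulated heights (NOT the article's dataset)
set.seed(42)
df_toy <- data.frame(
  genotype = rep(c("genotype_A", "genotype_B"), each = 20),
  height   = c(rexp(20, rate = 1 / 25), rexp(20, rate = 1 / 12))
)

# Run shapiro.test() on each group's heights and keep only the p-values
shapiro_p <- tapply(df_toy$height, df_toy$genotype,
                    function(v) shapiro.test(v)$p.value)
shapiro_p

# Flag the groups where normality is rejected at the 0.05 level
shapiro_p < 0.05
```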

Now, perform the Mann-Whitney U test using the `wilcox.test()` function:

```
wilcox.test(height ~ genotype, data = df)
# output
# data:  height by genotype
# W = 520.5, p-value = 1.414e-08
# alternative hypothesis: true location shift is not equal to 0
```

As the *p* value from the Mann-Whitney U test is less than the significance level of 0.05 (W = 520.5, *p* = 1.414e-08), we can conclude that there is a significant difference in height between `genotype_A` and `genotype_B`.
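To see where the reported `W` comes from, the sketch below recomputes it by hand on hypothetical toy vectors `x` and `y` (not the article's data). R's `W` is the sum of the ranks of the first group in the pooled sample minus `n_x * (n_x + 1) / 2`. Without ties, this equals the number of pairs in which a value from the first group exceeds a value from the second.

```r
# Hypothetical toy data (made up for illustration; no tied values)
x <- c(12.1, 14.3, 15.8, 16.2, 18.9, 20.4)
y <- c(9.5, 10.2, 11.7, 13.0, 13.6, 14.9)

# Rank all observations together, then sum the ranks belonging to x
ranks <- rank(c(x, y))
n_x   <- length(x)
w_manual <- sum(ranks[seq_len(n_x)]) - n_x * (n_x + 1) / 2

# Equivalent view: count the (x, y) pairs where x is larger
w_pairs <- sum(outer(x, y, ">"))

# Statistic reported by wilcox.test()
w_r <- unname(wilcox.test(x, y)$statistic)

c(manual = w_manual, pairs = w_pairs, wilcox_test = w_r)  # all three are 32
```

`W` ranges from 0 to `length(x) * length(y)` (36 here), so a value of 32 means that `x` exceeds `y` in most pairs.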

Note: By default, `wilcox.test()` performs a two-sided test. A one-sided Mann-Whitney U test can be performed by passing the `alternative` argument to the `wilcox.test()` function.

For instance, if you want to test whether the height of `genotype_A` is greater than that of `genotype_B`, you can perform a one-sided Mann-Whitney U test using the `alternative = "greater"` argument:

```
wilcox.test(height ~ genotype, data = df, alternative = "greater")
# output
# data:  height by genotype
# W = 520.5, p-value = 7.068e-09
# alternative hypothesis: true location shift is greater than 0
```

As the *p* value from the one-sided Mann-Whitney U test is less than the significance level of 0.05 (W = 520.5, *p* = 7.068e-09), we can conclude that the height of `genotype_A` (median = 25) is significantly greater than that of `genotype_B` (median = 11.5).
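With the formula interface, the direction of a one-sided test follows the order of the factor levels: `alternative = "greater"` asks whether the first level is shifted above the second. The sketch below checks the level order, computes the group medians, and runs both one-sided tests. The data frame `df_toy` is hypothetical and is not the article's dataset.

```r
# Hypothetical toy data (made up for illustration; no tied values)
df_toy <- data.frame(
  genotype = factor(rep(c("genotype_A", "genotype_B"), each = 6)),
  height   = c(25, 30, 28, 24, 27, 22, 12, 10, 15, 11, 14, 9)
)

# The first level is treated as the first group in the test
levels(df_toy$genotype)

# Group medians, useful for reporting the direction of the difference
medians <- tapply(df_toy$height, df_toy$genotype, median)
medians

# "greater": is genotype_A shifted above genotype_B?
res_greater <- wilcox.test(height ~ genotype, data = df_toy,
                           alternative = "greater")

# "less": the opposite direction, for comparison
res_less <- wilcox.test(height ~ genotype, data = df_toy,
                        alternative = "less")

c(greater = res_greater$p.value, less = res_less$p.value)
```

In this toy example every `genotype_A` value exceeds every `genotype_B` value, so the "greater" test gives a small *p* value and the "less" test gives a *p* value of 1.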

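Related non-parametric tests include the Kruskal-Wallis test (for more than two independent groups), the Friedman test (for repeated measures across more than two conditions), and the Wilcoxon signed-rank test (for two paired groups).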


This work is licensed under a Creative Commons Attribution 4.0 International License
