The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is a non-parametric statistical test for comparing two independent groups to determine whether they differ significantly from each other.
It is recommended to use the Mann-Whitney U test when data for two independent groups does not follow normal distributions.
The Mann-Whitney U test assumes that the observations in each group are independent and that the variances of the two groups are roughly equal. It is applied to ordinal or continuous data that are not normally distributed.
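As a rough sketch of how the equal-spread assumption could be checked, the snippet below compares the interquartile range of each group and runs the rank-based Fligner-Killeen test (fligner.test() from base R). The toy data frame and its column names are hypothetical and are not part of the dataset used later; this is one possible check, not a required step of the test.

```r
# Hypothetical toy data, not taken from the article's dataset
toy <- data.frame(
  value = c(24, 25, 26, 30, 31, 9, 10, 11, 12, 15),
  group = factor(rep(c("A", "B"), each = 5))
)

# Descriptive check: interquartile range of each group
spread <- tapply(toy$value, toy$group, IQR)
spread

# Rank-based test of equal variances (Fligner-Killeen);
# a large p value gives no evidence against equal spread
fk <- fligner.test(value ~ group, data = toy)
fk$p.value
```

With samples this small, the descriptive comparison is usually more informative than the p value alone.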
In R, the Mann-Whitney U test is performed using the wilcox.test() function. The general syntax depends on the format of the input data:
# when the data are two separate vectors
wilcox.test(group1, group2)

# when the data are in a single stacked table
wilcox.test(response ~ groups, data = df)
Here, response is a variable with the outcome values and groups is a variable that contains the two independent groups.
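To illustrate that the two call styles are interchangeable, here is a minimal sketch on made-up numbers. The vectors group1 and group2 and the data frame df_toy are hypothetical and are not the dataset used in the example below.

```r
# Hypothetical toy data (not from the article's dataset)
group1 <- c(12.1, 14.3, 15.8, 17.2, 19.5)
group2 <- c(20.4, 22.6, 23.9, 25.1, 27.3)

# Form 1: two separate vectors
res_vec <- wilcox.test(group1, group2)

# Form 2: one stacked data frame with a response and a grouping column
df_toy <- data.frame(
  response = c(group1, group2),
  groups   = factor(rep(c("g1", "g2"), each = 5))
)
res_formula <- wilcox.test(response ~ groups, data = df_toy)

# Both forms report the same W statistic and p value
res_vec$statistic
res_formula$statistic
```

In the formula form, the first factor level (here g1) plays the role of the first vector.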
Note: The Mann-Whitney U test is the non-parametric equivalent of the independent two-sample t-test, but it is somewhat less powerful (higher Type II error rate) than the t-test when the data are normally distributed.
Example of Mann-Whitney U test in R
The following example explains how to perform the Mann-Whitney U test in R.
Suppose there are two plant genotypes (A and B) differing in their height. We would like to check whether the heights of the two plant genotypes differ significantly from each other.
Sample size: The Mann-Whitney U test can be applied to small samples (5-20), and the power of the test increases as the sample size increases.
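The effect of sample size on power can be sketched with a small simulation. Everything here is an assumption made for illustration: the helper estimate_power() is hypothetical, the data are drawn from a skewed exponential distribution, the shift between groups is 1, and the seed and number of repetitions are arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Estimate power as the fraction of simulated experiments with p < 0.05
# (hypothetical helper, written for this illustration)
estimate_power <- function(n, shift = 1, reps = 500) {
  p_values <- replicate(reps, {
    a <- rexp(n)            # skewed, non-normal group
    b <- rexp(n) + shift    # same shape, shifted upwards by `shift`
    wilcox.test(a, b)$p.value
  })
  mean(p_values < 0.05)
}

power_small <- estimate_power(n = 5)
power_large <- estimate_power(n = 20)
c(n5 = power_small, n20 = power_large)
```

Under these assumptions, the estimated power for 20 observations per group should come out clearly higher than for 5 per group.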
Load the dataset and check the normality of each group using the Shapiro-Wilk normality test:
# import dataset
df = read.csv("https://reneshbedre.github.io/assets/posts/mann_whitney/genotype_height.csv")

# view the first few rows
head(df)
    genotype height
1 genotype_A     25
2 genotype_A     30
3 genotype_A     30
4 genotype_A     25
5 genotype_A     25
6 genotype_A     20

# Shapiro-Wilk normality test
genotype_A = df[df$genotype == "genotype_A", ]$height
genotype_B = df[df$genotype == "genotype_B", ]$height

shapiro.test(genotype_A)
# output
data:  genotype_A
W = 0.88481, p-value = 0.0104

shapiro.test(genotype_B)
# output
data:  genotype_B
W = 0.86168, p-value = 0.005501
The p value obtained from the Shapiro-Wilk test is less than the significance level of 0.05 for both groups. Hence, we conclude that the data for each group are not normally distributed.
Now, perform the Mann-Whitney U test using the wilcox.test() function:
wilcox.test(height ~ genotype, data = df)
# output
data:  height by genotype
W = 520.5, p-value = 1.414e-08
alternative hypothesis: true location shift is not equal to 0
As the p value from the Mann-Whitney U test is less than the significance level of 0.05 (W = 520.5, p = 1.414e-08), we can conclude that there is a significant difference in height between the two plant genotypes.
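To make the W statistic less of a black box, the sketch below recomputes it by hand on a small made-up sample without ties. The vectors x and y are hypothetical and unrelated to the genotype data. For two samples, the W reported by wilcox.test() is the rank sum of the first group minus n1(n1 + 1)/2, which equals the number of (x, y) pairs in which the x value is larger.

```r
# Hypothetical toy data with no ties (not the article's dataset)
x <- c(1.1, 2.4, 3.7, 5.2)
y <- c(2.9, 4.5, 6.3, 7.8, 8.6)

# Rank all observations together
r <- rank(c(x, y))

# W = sum of the ranks of the first group minus n1 * (n1 + 1) / 2
n1 <- length(x)
w_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Equivalent count: number of (x, y) pairs where x is larger than y
w_pairs <- sum(outer(x, y, ">"))

# Value reported by wilcox.test()
w_r <- unname(wilcox.test(x, y)$statistic)

c(manual = w_manual, pairs = w_pairs, wilcox = w_r)  # all three are 3
```

A W close to 0 or close to n1 * n2 (here 20) indicates that one group tends to have smaller or larger values than the other.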
Note: By default, wilcox.test() performs the two-sided test. The one-sided Mann-Whitney U test can be performed by specifying the alternative argument. For instance, if you want to test whether the height of genotype_A is higher than that of genotype_B, you can perform the one-sided Mann-Whitney U test using the alternative = "greater" argument.
wilcox.test(height ~ genotype, data = df, alternative = "greater")
# output
data:  height by genotype
W = 520.5, p-value = 7.068e-09
alternative hypothesis: true location shift is greater than 0
As the p value from the one-sided Mann-Whitney U test is less than the significance level of 0.05 (W = 520.5, p = 7.068e-09), we can conclude that the height of genotype_A (median = 25) is significantly higher than that of genotype_B (median = 11.5).
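Group medians like those quoted above can be obtained with tapply(). The sketch below uses a small hypothetical data frame (df_toy, with made-up heights) that mimics the column names of the dataset. It also checks the factor level order, because with the formula interface alternative = "greater" tests whether the first level tends to be larger than the second.

```r
# Hypothetical toy data mimicking the dataset's column names
df_toy <- data.frame(
  genotype = factor(rep(c("genotype_A", "genotype_B"), each = 5)),
  height   = c(24, 25, 26, 30, 31, 9, 10, 11, 12, 15)
)

# Median height of each genotype
med <- tapply(df_toy$height, df_toy$genotype, median)
med

# The first level is the group that "greater" refers to
levels(df_toy$genotype)

# One-sided test: is genotype_A shifted towards higher values?
res_one_sided <- wilcox.test(height ~ genotype, data = df_toy,
                             alternative = "greater")
res_one_sided$p.value
```

If the group of interest is not the first level, either reorder the factor with relevel() or use alternative = "less".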
This work is licensed under a Creative Commons Attribution 4.0 International License