# How to Perform Mann-Whitney U test in R

The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is a non-parametric statistical test for comparing two independent groups to determine whether they differ significantly from each other.

The Mann-Whitney U test is recommended when the data for the two independent groups do not follow a normal distribution.

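The Mann-Whitney U test assumes that the observations are independent within and between the groups and that the data are ordinal or continuous. To interpret a significant result as a shift in location (for example, a difference in medians), the two groups should also have distributions of similar shape and spread. Unlike the t-test, it does not require the data to be normally distributed.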

In R, the Mann-Whitney U test is performed using the `wilcox.test()` function. The general syntax depends on how the input data are organized:

``````
# when data is two separate vectors
wilcox.test(group1, group2)

# when data is a single stacked table
wilcox.test(response ~ groups, data = df)
``````

Here, `response` is the variable holding the outcome values and `groups` is the variable identifying the two independent groups.
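The two call forms can be tried side by side on a toy example. The sketch below uses made-up values (the names `group1`, `group2`, and `df_demo` are illustrative, not the tutorial's dataset) and sets `exact = FALSE` so that tied values do not trigger a warning about exact p-values.

```r
# Made-up heights for illustration only (not the tutorial's dataset)
group1 <- c(25, 30, 30, 25, 25, 20, 28, 27)
group2 <- c(10, 12, 15, 11, 9, 14, 13, 16)

# Form 1: two separate vectors
res_vec <- wilcox.test(group1, group2, exact = FALSE)

# Form 2: a single stacked data frame with a grouping column
df_demo <- data.frame(
  response = c(group1, group2),
  groups   = rep(c("g1", "g2"), times = c(length(group1), length(group2)))
)
res_formula <- wilcox.test(response ~ groups, data = df_demo, exact = FALSE)

# Both forms describe the same comparison, so statistic and p-value agree
res_vec$statistic
res_formula$statistic
c(vector_form = res_vec$p.value, formula_form = res_formula$p.value)
```

With the formula form, the first level of the grouping variable (here `g1`, by alphabetical order) plays the role of the first vector.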

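Note: The Mann-Whitney U test is the non-parametric counterpart of the independent two-sample t-test. When the data really are normally distributed, it is slightly less powerful (higher Type II error rate) than the t-test; for skewed or heavy-tailed data it can be the more powerful of the two.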
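The power comparison can be explored with a small simulation. The following is a rough sketch under assumed settings: the helper name `estimate_power`, the sample size of 15 per group, the shift of 1, and the 1000 repetitions are all choices made here for illustration, and the estimates will vary with those choices and with the seed.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Estimate power as the fraction of simulated experiments with p < 0.05.
# `draw` is any function that returns n random values (e.g. rnorm, rexp).
estimate_power <- function(draw, shift = 1, n = 15, n_sim = 1000) {
  p_vals <- replicate(n_sim, {
    x <- draw(n)
    y <- draw(n) + shift  # second group shifted by a fixed amount
    c(t_test   = t.test(x, y)$p.value,
      wilcoxon = wilcox.test(x, y, exact = FALSE)$p.value)
  })
  rowMeans(p_vals < 0.05)  # one rejection rate per test
}

power_normal <- estimate_power(rnorm)  # normally distributed data
power_skewed <- estimate_power(rexp)   # right-skewed (exponential) data

power_normal
power_skewed
```

In runs like this the two tests usually land close together on normal data, while the rank-based test tends to pull ahead on skewed data, but the exact numbers depend on the simulation settings.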

## Example of Mann-Whitney U test in R

The following examples explain how to perform the Mann-Whitney U test in R.

Suppose there are two plant genotypes (A and B) that differ in height. We would like to check whether the heights of the two genotypes differ significantly from each other.

Sample size: The Mann-Whitney U test can be applied to small samples (5-20), and the power of the test increases as the sample size increases.

Load the dataset and check the normality of each group using the Shapiro-Wilk normality test:

``````
# import dataset into a data frame named df (columns: genotype, height)

# view first few rows
head(df)
    genotype height
1 genotype_A     25
2 genotype_A     30
3 genotype_A     30
4 genotype_A     25
5 genotype_A     25
6 genotype_A     20

# Shapiro-Wilk normality test
genotype_A  = df[df$genotype == "genotype_A", ]$height
genotype_B  = df[df$genotype == "genotype_B", ]$height

shapiro.test(genotype_A)

# output
data:  genotype_A
W = 0.88481, p-value = 0.0104

shapiro.test(genotype_B)

# output
data:  genotype_B
W = 0.86168, p-value = 0.005501
``````

The p value from the Shapiro-Wilk test is less than the significance level of 0.05 for both groups. Hence, we conclude that the data in each group are not normally distributed.
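The per-group normality check can also be written more compactly with `tapply()`, which applies a function to each group in turn. The sketch below runs on simulated stand-in data (`df_demo` and its values are made up here, so the p-values will not match the ones reported above).

```r
set.seed(42)  # arbitrary seed so the run is reproducible

# Simulated stand-in for the tutorial's data frame (values are made up)
df_demo <- data.frame(
  genotype = rep(c("genotype_A", "genotype_B"), each = 20),
  height   = c(rexp(20, rate = 1 / 25), rexp(20, rate = 1 / 12))
)

# Run shapiro.test() once per genotype and keep only the p-values
shapiro_p <- tapply(df_demo$height, df_demo$genotype,
                    function(h) shapiro.test(h)$p.value)
shapiro_p

# Flag the groups whose p-value falls below the 0.05 significance level
shapiro_p < 0.05
```

This avoids creating one vector per group by hand and scales to any number of groups.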

Now, perform the Mann-Whitney U test using the `wilcox.test()` function:

``````
wilcox.test(height ~ genotype, data = df)

# output
data:  height by genotype
W = 520.5, p-value = 1.414e-08
alternative hypothesis: true location shift is not equal to 0
``````

Because the p value from the Mann-Whitney U test is less than the significance level of 0.05 (W = 520.5, p = 1.414e-08), we conclude that there is a significant difference in height between `genotype_A` and `genotype_B`.
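To see where the reported `W` comes from, it can be recomputed by hand from the pooled ranks: sum the ranks of the first group and subtract n1(n1 + 1)/2, where n1 is the size of that group. The sketch below checks this on made-up values (the vectors `a` and `b` are illustrative, not the tutorial's dataset).

```r
# Made-up heights for illustration only
a <- c(25, 30, 30, 25, 25, 20)
b <- c(12, 10, 15, 11, 20, 9)

# Rank all observations together; tied values receive the average rank
r <- rank(c(a, b))

# W = (sum of ranks in the first group) - n1 * (n1 + 1) / 2
n1 <- length(a)
w_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Compare with the statistic reported by wilcox.test()
w_builtin <- unname(wilcox.test(a, b, exact = FALSE)$statistic)
c(manual = w_manual, builtin = w_builtin)
```

Equivalently, `W` counts the pairs in which a value from the first group exceeds a value from the second, with each tie counted as one half.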

Note: By default, `wilcox.test()` performs a two-sided test. A one-sided Mann-Whitney U test can be performed by passing the `alternative` argument to the `wilcox.test()` function.

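For instance, to test whether the height of `genotype_A` is greater than that of `genotype_B`, perform the one-sided Mann-Whitney U test with the `alternative = "greater"` argument. With the formula interface, "greater" refers to the first level of the grouping factor, which is `genotype_A` here.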

``````
wilcox.test(height ~ genotype, data = df, alternative = "greater")

# output
data:  height by genotype
W = 520.5, p-value = 7.068e-09
alternative hypothesis: true location shift is greater than 0
``````

Because the p value from the one-sided Mann-Whitney U test is less than the significance level of 0.05 (W = 520.5, p = 7.068e-09), we conclude that the height of `genotype_A` (median = 25) is significantly greater than that of `genotype_B` (median = 11.5).
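The direction of a one-sided test depends on the order of the factor levels, and the group medians are not part of the `wilcox.test()` output, so both are worth checking explicitly. The sketch below uses a made-up data frame (`df_demo` and its values are illustrative only) to compute the medians and to show how reversing the level order changes the one-sided result.

```r
# Made-up data for illustration only
df_demo <- data.frame(
  genotype = factor(rep(c("genotype_A", "genotype_B"), each = 6)),
  height   = c(25, 30, 30, 25, 25, 20, 12, 10, 15, 11, 20, 9)
)

# Group medians, to report alongside the test result
group_medians <- tapply(df_demo$height, df_demo$genotype, median)
group_medians

# Levels are alphabetical by default, so genotype_A is the first group
levels(df_demo$genotype)
p_a_greater <- wilcox.test(height ~ genotype, data = df_demo,
                           alternative = "greater", exact = FALSE)$p.value

# Making genotype_B the first level turns the test into "B greater than A"
df_demo$genotype <- relevel(df_demo$genotype, ref = "genotype_B")
p_b_greater <- wilcox.test(height ~ genotype, data = df_demo,
                           alternative = "greater", exact = FALSE)$p.value

c(A_greater = p_a_greater, B_greater = p_b_greater)
```

Checking `levels()` before running a one-sided test is a cheap way to confirm that the hypothesis being tested is the intended one.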

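Related non-parametric tests for other study designs include the Kruskal-Wallis test (more than two independent groups), the Wilcoxon signed-rank test (two paired samples), and the Friedman test (more than two repeated measurements).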