Mann-Whitney U test (Wilcoxon rank sum test) in Python
Mann-Whitney U test

The Mann-Whitney U test is a nonparametric (distribution-free) alternative to the two-sample t-test. It was first proposed by Frank Wilcoxon (1945) and later developed by Henry Mann and Donald Whitney (1947). Hence, the Mann-Whitney U test is also known as the Wilcoxon rank sum test or the Wilcoxon-Mann-Whitney (WMW) test.
Note: The Wilcoxon rank sum test is different from the Wilcoxon signed rank test. The Wilcoxon signed rank test is used for paired data.
 The Mann-Whitney U test does not assume any specific distribution (such as a normal distribution of the samples) for calculating the test statistic and p value.
 The Mann-Whitney U test is often described as comparing sample medians, which distinguishes it from the t-test, which compares sample means. The median interpretation holds when the two distributions have the same shape.
 The Mann-Whitney U test can be applied to small samples (5-20 observations).
 Although the Mann-Whitney U test and the t-test have similar statistical power, it is generally preferable to use the t-test if its assumptions are met.
Mann-Whitney U test assumptions
 The observations from the two groups should be randomly selected from the target populations
 Observations are independent of each other
 Observations should be continuous or ordinal (e.g. Likert item data)
Mann-Whitney U test hypotheses
If we have two groups with observations x_{1}, x_{2}, …, x_{m} and y_{1}, y_{2}, …, y_{n} sampled from populations X and Y, then the Mann-Whitney U test compares each observation x_{i} from sample x with each observation y_{j} from sample y.
Null hypothesis: P(x_{i} > y_{j}) = 0.5
Alternative hypothesis: P(x_{i} > y_{j}) ≠ 0.5
The null hypothesis states that x_{i} is equally likely to be greater or less than y_{j} (both groups come from the same population). The two-sided alternative hypothesis states that these probabilities are not equal.
A one-sided alternative hypothesis tests whether the probability that x_{i} is greater than y_{j} is greater than 0.5, or, in the other direction, less than 0.5.
We can also state the two-sided hypothesis in terms of the median as
Null hypothesis: The two groups have equal medians
Alternative hypothesis: The two groups do not have equal medians
A one-sided alternative hypothesis tests whether the median of one group is greater (or less) than the median of the other group.
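The probability used in the hypotheses above can be estimated directly from two samples by comparing every pair of observations. The following is a minimal sketch using only the standard library. The function name prob_x_greater and the sample values are hypothetical (they are not taken from the genotype dataset), and counting a tie as half is an assumed, commonly used convention.

```python
from itertools import product

def prob_x_greater(x, y):
    """Estimate P(x_i > y_j) over all pairs; a tie counts as 0.5."""
    wins = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            wins += 1.0
        elif xi == yj:
            wins += 0.5
    return wins / (len(x) * len(y))

# hypothetical yields, for illustration only
x = [60, 30, 45, 50]
y = [10, 25, 30, 20]

# a value far from 0.5 suggests that one group tends to be larger
print(prob_x_greater(x, y))
```

An estimate close to 0.5 is what the null hypothesis predicts, while an estimate close to 0 or 1 points towards one of the alternatives.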
Mann-Whitney U test formula
The test decision is based on comparing the U value with the critical value from a Mann-Whitney U table for the given sample sizes and significance level. If the U value <= critical value, we reject the null hypothesis; otherwise, we fail to reject it.
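The U value itself can be computed from rank sums. The sketch below assumes the standard textbook form, which is not spelled out in the text above: pool both samples, rank them (tied values receive the average of their ranks), and compute U1 = R1 - n1(n1 + 1)/2 and U2 = n1*n2 - U1, where R1 is the rank sum of the first sample. The function names average_ranks and mann_whitney_u and the sample values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U1, U2) computed from the rank sum of the pooled sample."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])           # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2    # U for sample x
    u2 = n1 * n2 - u1              # U for sample y
    return u1, u2

# hypothetical yields, for illustration only
x = [60, 30, 45, 50]
y = [10, 25, 30, 20]
u1, u2 = mann_whitney_u(x, y)
print(u1, u2, min(u1, u2))
```

When a table of critical values is used, the smaller of U1 and U2 is typically the value compared with the critical value.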
Perform Mann-Whitney U test in Python
Get dataset
Load the hypothetical yield dataset for plant genotypes A and B:
import pandas as pd
df = pd.read_csv("https://reneshbedre.github.io/assets/posts/mann_whitney/genotype.csv")
df.head(2)
# output
    A   B
0  60  10
1  30  25
# generate boxplot to check data spread
import matplotlib.pyplot as plt
df.boxplot(column=['A', 'B'], grid=False)
plt.show()
Check data distribution
Check the data distribution using the Shapiro-Wilk test and histograms:
import scipy.stats as stats
w, pvalue = stats.shapiro(df['A'])
w, pvalue
(0.8239281177520752, 0.0009495539125055075)
w, pvalue = stats.shapiro(df['B'])
w, pvalue
(0.7946348190307617, 0.00031481595942750573)
# plot histogram
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle('Frequency histogram of genotypes yield')
ax1.hist(df['A'], bins=10, histtype='bar', ec='k')
ax2.hist(df['B'], bins=10, histtype='bar', ec='k')
ax1.set_xlabel("Yield")
ax2.set_xlabel("Yield")
plt.show()
As the p values obtained from the Shapiro-Wilk test are significant (p < 0.05) for both genotypes, we conclude that the data are not normally distributed. Further, the shape of the data distribution in the histograms does not look normal. Therefore, the Mann-Whitney U test is more appropriate for analyzing the two samples.
Perform Mann-Whitney U test
Perform a two-sided Mann-Whitney U test (alternative hypothesis: the yields of the two genotypes do not have equal medians):
import scipy.stats as stats
# perform two-sided test. You can use 'greater' or 'less' for a one-sided test
stats.mannwhitneyu(x=df['A'], y=df['B'], alternative='two-sided')
# output
MannwhitneyuResult(statistic=489.5, pvalue=7.004695394561267e-07)
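Note: The p value shown above is based on the normal approximation and is not exact. The use_continuity parameter only controls whether a continuity correction is applied to the normal approximation; it does not produce an exact p value. In SciPy 1.7 and later, the method parameter ('auto', 'asymptotic', or 'exact') selects how the p value is computed, and the exact method makes no correction for ties. The normal approximation is useful when the sample size is large, in which case the exact and approximate p values should be roughly similar.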
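The difference between an exact and a normal-approximation p value can be illustrated on a small, tie-free sample. The sketch below uses only the standard library and is not SciPy's implementation. The function names and sample values are hypothetical, and the use_continuity argument here simply toggles a 0.5 continuity correction in this sketch. The exact p value is obtained by enumerating every possible assignment of ranks to the first sample, which is only practical for small samples.

```python
from itertools import combinations
from math import erfc, sqrt

def exact_two_sided_p(x, y):
    """Exact two-sided p value by enumerating rank assignments (assumes no ties)."""
    pooled = sorted(list(x) + list(y))
    n1, n2 = len(x), len(y)
    rank = {v: r for r, v in enumerate(pooled, start=1)}  # valid only without ties
    u_obs = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    extreme = total = 0
    # under the null, every choice of n1 ranks for sample x is equally likely
    for ranks_x in combinations(range(1, n1 + n2 + 1), n1):
        u = sum(ranks_x) - n1 * (n1 + 1) / 2
        total += 1
        if abs(u - mean) >= abs(u_obs - mean):
            extreme += 1
    return u_obs, extreme / total

def normal_approx_two_sided_p(u, n1, n2, use_continuity=True):
    """Two-sided p value from the normal approximation (no tie correction)."""
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    diff = abs(u - mean)
    if use_continuity:
        diff = max(diff - 0.5, 0.0)
    z = diff / sd
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))

# hypothetical tie-free yields, for illustration only
x = [60, 35, 45, 50, 55]
y = [10, 25, 30, 20, 40]
u_obs, p_exact = exact_two_sided_p(x, y)
p_corrected = normal_approx_two_sided_p(u_obs, len(x), len(y))
p_uncorrected = normal_approx_two_sided_p(u_obs, len(x), len(y), use_continuity=False)
print(u_obs, p_exact, p_corrected, p_uncorrected)
```

With samples this small the three p values differ noticeably; the gap is expected to shrink as the sample size grows.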
As the p value obtained from the Mann-Whitney U test is significant (p < 0.05), we conclude that the yields of the two genotypes are significantly different from each other.
Perform a one-sided Mann-Whitney U test (alternative hypothesis: the median yield of genotype A is greater than the median yield of genotype B):
import scipy.stats as stats
stats.mannwhitneyu(x=df['A'], y=df['B'], alternative = 'greater')
# output
MannwhitneyuResult(statistic=489.5, pvalue=3.5023476972806333e-07)
As the p value obtained from the Mann-Whitney U test is significant (p < 0.05), we conclude that the yield of genotype A is significantly greater than that of genotype B.
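The two p values reported above are related. With the normal approximation, when the observed difference lies in the direction of the one-sided alternative, the one-sided p value is half of the two-sided p value. The short check below uses the two p values printed above; the variable names are illustrative.

```python
import math

p_two_sided = 7.004695394561267e-07   # from alternative='two-sided'
p_greater = 3.5023476972806333e-07    # from alternative='greater'

# the one-sided p value should equal half of the two-sided p value
print(math.isclose(p_two_sided / 2, p_greater, rel_tol=1e-12))
```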
This work is licensed under a Creative Commons Attribution 4.0 International License