Mann-Whitney U test (Wilcoxon rank sum test) in Python

Mann-Whitney U test

• The Mann-Whitney U test is a non-parametric (distribution-free) alternative to the parametric two-sample t-test. It was first proposed by Frank Wilcoxon (1945) and later extended by Henry Mann and Donald Whitney (1947). Hence, the Mann-Whitney U test is also known as the Wilcoxon rank sum test or the Wilcoxon-Mann-Whitney (WMW) test.

The Wilcoxon rank sum test is different from the Wilcoxon signed-rank test. For paired data, use the Wilcoxon signed-rank test.
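In SciPy the two tests are separate functions: scipy.stats.wilcoxon runs the signed-rank test on paired samples, while scipy.stats.mannwhitneyu runs the rank sum (Mann-Whitney U) test on independent samples. A minimal sketch with made-up numbers (the arrays are illustrative only, not this article's data):

```python
import numpy as np
from scipy import stats

# Illustrative paired data: the same 8 plants measured before and after a treatment
before = np.array([12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1])
after = np.array([13.0, 15.1, 12.2, 16.9, 14.0, 13.5, 16.0, 14.9])

# Paired data: Wilcoxon signed-rank test on the within-plant differences
paired = stats.wilcoxon(before, after)

# Independent data: Wilcoxon rank sum (Mann-Whitney U) test on two separate groups
group_a = np.array([12.1, 14.3, 11.8, 15.2, 13.7])
group_b = np.array([16.4, 15.9, 17.2, 14.9, 16.8])
independent = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')

print("signed-rank p:", paired.pvalue)
print("rank sum U:", independent.statistic, "p:", independent.pvalue)
```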

• The Mann-Whitney U test is used to compare two independent groups. It tests whether the two groups come from the same population or, under additional assumptions, have the same medians. It does not assume any specific distribution (such as a normal distribution of samples) when calculating the test statistic and p value. If there are more than two groups to analyze, consider the Kruskal-Wallis test instead.
• The Mann-Whitney U test compares mean ranks (or medians, when the two groups have distributions of similar shape) rather than means, which distinguishes it from the t-test, which compares sample means.
• The Mann-Whitney U test can be applied to both small (5-20 observations) and large samples, and its power increases with sample size.
• Although the Mann-Whitney U test and the t-test have similar statistical power, the t-test is preferable when its assumptions are met, because it is then slightly more powerful.
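For more than two groups, SciPy provides scipy.stats.kruskal. With exactly two groups, the Kruskal-Wallis H statistic equals the square of the Mann-Whitney z statistic, so its p value matches the two-sided Mann-Whitney p value from the normal approximation without continuity correction. The sketch below checks this on random, illustrative data for three hypothetical genotypes; the method='asymptotic' argument assumes SciPy 1.7 or later.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Illustrative yields for three hypothetical genotypes (random, not real data)
g1 = rng.normal(50, 5, size=15)
g2 = rng.normal(55, 5, size=15)
g3 = rng.normal(60, 5, size=15)

# More than two groups: Kruskal-Wallis H test
h_three, p_three = stats.kruskal(g1, g2, g3)

# Two groups only: Kruskal-Wallis agrees with the two-sided Mann-Whitney U test
# when the latter uses the normal approximation without continuity correction
h_two, p_kw = stats.kruskal(g1, g2)
mw = stats.mannwhitneyu(g1, g2, alternative='two-sided',
                        use_continuity=False, method='asymptotic')

print("Kruskal-Wallis (3 groups) p:", p_three)
print("Kruskal-Wallis (2 groups) p:", p_kw, "Mann-Whitney p:", mw.pvalue)
```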

Mann-Whitney U test Hypotheses

If we have two independent groups with observations x1, x2, …, xm and y1, y2, …, yn sampled from populations X and Y, the Mann-Whitney U test compares each observation xi from sample x with each observation yj from sample y.

Null hypothesis: P(xi > yj) = 0.5
Alternative hypothesis: P(xi > yj) ≠ 0.5

Under the null hypothesis, an observation xi is equally likely to be greater or smaller than an observation yj (as when both groups come from the same population). The two-sided alternative states that this probability differs from 0.5.

A one-sided alternative tests whether P(xi > yj) > 0.5, or P(xi > yj) < 0.5 for the opposite direction.
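Because the hypotheses are stated in terms of P(xi > yj), this probability can be estimated directly by comparing every xi with every yj and counting ties as 1/2. A small sketch on made-up samples (x and y are illustrative) shows that this pair count is the U statistic that scipy.stats.mannwhitneyu reports for x, so U / (m × n) estimates P(xi > yj):

```python
import numpy as np
from scipy import stats

# Illustrative samples (not the article's data); the value 7.0 appears in both groups
x = np.array([7.0, 9.5, 11.2, 12.8, 14.1])
y = np.array([5.3, 6.1, 7.0, 8.4])

# Compare every x_i with every y_j: x_i > y_j counts 1, a tie counts 0.5
diff = x[:, None] - y[None, :]
pairs_greater = (diff > 0).sum() + 0.5 * (diff == 0).sum()
prob_x_greater = pairs_greater / (len(x) * len(y))

# SciPy's U statistic for the first sample counts the same pairs
res = stats.mannwhitneyu(x, y, alternative='two-sided')

print("pairs with x > y:", pairs_greater)
print("SciPy U for x:", res.statistic)
print("estimated P(x > y):", prob_x_greater)
```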

When the two groups have the same shape of distribution, the hypotheses can also be stated in terms of medians:

Null hypothesis: The two groups have equal medians

Alternative hypothesis: The two groups do not have equal medians

A one-sided alternative tests whether the median of one group is greater (or smaller) than that of the other group.

Mann-Whitney U Test formula

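The test statistic U is calculated from the rank sums of the two samples. For small samples, U is compared with the critical value from a Mann-Whitney U table at the chosen significance level: if U ≤ critical value, we reject the null hypothesis; otherwise, we fail to reject it. For larger samples, the p value is usually obtained from a normal approximation of the distribution of U.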
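One standard way to write the statistic, assuming n_x and n_y are the two sample sizes and R_x and R_y are the rank sums of each sample in the pooled ranking (some textbooks swap the labels of U_x and U_y, which does not change the minimum):

```latex
U_x = R_x - \frac{n_x (n_x + 1)}{2}, \qquad
U_y = R_y - \frac{n_y (n_y + 1)}{2}, \qquad
U_x + U_y = n_x n_y

U = \min(U_x, U_y)

% Large-sample normal approximation (no ties, no continuity correction)
z = \frac{U - n_x n_y / 2}{\sqrt{n_x n_y (n_x + n_y + 1) / 12}}
```

With this convention, U_x counts the pairs (xi, yj) with xi > yj, counting ties as 1/2.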

How does the Mann-Whitney U test work?

• Merge the data from the two samples and rank them from smallest to largest (tied values get the average of the ranks they span)
• Calculate the sum of ranks for each sample (Rx and Ry)
• Calculate Ux and Uy from the rank sums; the test statistic U is the minimum of Ux and Uy
• Compare U with the critical value, or compute the p value (for example, with a normal approximation for large samples)
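The steps above can be sketched in Python as follows. This is a minimal illustration on made-up, tie-free samples (x and y are not the article's data). It uses scipy.stats.rankdata for the pooled ranking and, in place of a critical-value table, a large-sample normal approximation without tie or continuity correction. The final cross-check calls scipy.stats.mannwhitneyu with method='asymptotic', which assumes SciPy 1.7 or later. Note that SciPy reports the U statistic for the first sample (Ux = 38 here), not the minimum of Ux and Uy.

```python
import numpy as np
from scipy import stats

# Illustrative samples without ties (not the article's data)
x = np.array([19.2, 22.5, 24.1, 25.8, 27.3, 30.6, 31.4])
y = np.array([14.7, 16.3, 18.9, 20.4, 21.1, 23.6])
nx, ny = len(x), len(y)

# Step 1: merge both samples and rank from smallest to largest (ties get average ranks)
ranks = stats.rankdata(np.concatenate([x, y]))

# Step 2: rank sum of each sample
r_x = ranks[:nx].sum()
r_y = ranks[nx:].sum()

# Step 3: U for each sample, and the test statistic U = min(Ux, Uy)
u_x = r_x - nx * (nx + 1) / 2
u_y = r_y - ny * (ny + 1) / 2
u = min(u_x, u_y)

# Step 4: normal approximation (no tie or continuity correction) instead of a U table
mu = nx * ny / 2
sigma = np.sqrt(nx * ny * (nx + ny + 1) / 12)
z = (u - mu) / sigma
p_two_sided = 2 * stats.norm.cdf(z)  # z <= 0 because u is the smaller U

print(f"Rx={r_x}, Ry={r_y}, Ux={u_x}, Uy={u_y}, U={u}, p={p_two_sided:.4f}")
print("mean ranks:", r_x / nx, r_y / ny)

# Cross-check with SciPy using the same approximation settings
res = stats.mannwhitneyu(x, y, alternative='two-sided',
                         use_continuity=False, method='asymptotic')
print("SciPy U for x:", res.statistic, "p:", res.pvalue)
```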

Perform Mann-Whitney U test in Python

Get dataset

Load a hypothetical dataset of yields for two plant genotypes (A and B):


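import pandas as pd
# load the yield data; "genotype_yield.csv" is a hypothetical file name, replace it with your own CSV (columns A and B)
df = pd.read_csv("genotype_yield.csv")
df.head(2)
# output
    A   B
0  60  10
1  30  25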

# generate boxplot to check data spread
import matplotlib.pyplot as plt
df.boxplot(column=['A', 'B'], grid=False)
plt.show()


Check data distribution

Check the data distribution using the Shapiro-Wilk test and a histogram:

import scipy.stats as stats
w, pvalue = stats.shapiro(df['A'])
w, pvalue
(0.8239281177520752, 0.0009495539125055075)

w, pvalue = stats.shapiro(df['B'])
w, pvalue
(0.7946348190307617, 0.00031481595942750573)

# plot histogram
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle('Frequency histogram of genotypes yield')
ax1.hist(df['A'], bins=10, histtype='bar', ec='k')
ax2.hist(df['B'], bins=10, histtype='bar', ec='k')
ax1.set_xlabel("Yield")
ax2.set_xlabel("Yield")
plt.show()


As the p values from the Shapiro-Wilk test are significant for both genotypes (p < 0.05), we conclude that the data are not normally distributed. The histograms also do not show a normal shape. Therefore, the Mann-Whitney U test is more appropriate than the t-test for comparing the two samples.
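The reasoning above (check normality first, then pick the test) can be wrapped in a small helper. This is a hypothetical sketch, not part of SciPy: compare_two_groups is an illustrative name, and applying α = 0.05 to both Shapiro-Wilk checks is an assumption. A non-significant Shapiro-Wilk result does not prove normality, so the histograms and boxplot are still worth inspecting.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Hypothetical helper: choose a two-sample test from Shapiro-Wilk normality checks.

    Runs the two-sample t-test if neither sample departs significantly from
    normality; otherwise runs the two-sided Mann-Whitney U test.
    """
    _, p_a = stats.shapiro(a)
    _, p_b = stats.shapiro(b)
    if p_a > alpha and p_b > alpha:
        return "t-test", stats.ttest_ind(a, b)
    return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative='two-sided')


# Illustrative right-skewed data (points on an exponential curve), clearly non-normal
skewed_a = np.exp(np.linspace(0, 5, 30))
skewed_b = np.exp(np.linspace(0.5, 5.5, 30))
# Illustrative data built from normal quantiles, which look normal to Shapiro-Wilk
normal_a = stats.norm.ppf(np.linspace(0.02, 0.98, 30), loc=50, scale=5)
normal_b = stats.norm.ppf(np.linspace(0.02, 0.98, 30), loc=53, scale=5)

skewed_name, skewed_res = compare_two_groups(skewed_a, skewed_b)
normal_name, normal_res = compare_two_groups(normal_a, normal_b)
print(skewed_name, skewed_res.pvalue)
print(normal_name, normal_res.pvalue)
```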

Perform Mann-Whitney U test

Perform a two-sided Mann-Whitney U test (alternative hypothesis: the two genotypes do not have equal median yields):

Note: We compare medians here because the two genotypes have a similar shape of distribution (see the histograms and boxplot). If the two groups do not have similarly shaped distributions, compare mean ranks instead.

import scipy.stats as stats
# perform two-sided test. You can use 'greater' or 'less' for one-sided test
stats.mannwhitneyu(x=df['A'], y=df['B'], alternative = 'two-sided')
# output
MannwhitneyuResult(statistic=489.5, pvalue=7.004695394561267e-07)


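Note: The yield data contain ties (a non-integer U such as 489.5 can only arise from values shared between the groups), so mannwhitneyu computes the p value from the normal approximation rather than the exact distribution. The use_continuity parameter only switches the continuity correction of this approximation on or off; setting use_continuity=False does not give an exact p value. In SciPy 1.7 and later, the method parameter ('auto', 'asymptotic', or 'exact') selects between the two, and the exact distribution is meant for small samples without ties. The normal approximation works well for large samples, where exact and approximate p values are very close.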
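To see how these options differ, the sketch below runs mannwhitneyu on small made-up samples without ties (x and y are illustrative). It assumes SciPy 1.7 or later, where method accepts 'exact' and 'asymptotic'. The exact p value is unaffected by use_continuity, while in the asymptotic p value the continuity correction makes the test slightly more conservative.

```python
import numpy as np
from scipy import stats

# Small illustrative samples without ties (not the article's data)
x = np.array([3.1, 4.7, 5.2, 6.8, 7.4, 8.9])
y = np.array([1.2, 2.5, 2.9, 3.8, 4.1, 5.6])

exact = stats.mannwhitneyu(x, y, alternative='two-sided', method='exact')
approx_cc = stats.mannwhitneyu(x, y, alternative='two-sided',
                               method='asymptotic', use_continuity=True)
approx_no_cc = stats.mannwhitneyu(x, y, alternative='two-sided',
                                  method='asymptotic', use_continuity=False)

# Same U in all three cases; only the way the p value is computed changes
for label, res in [("exact", exact),
                   ("asymptotic + continuity", approx_cc),
                   ("asymptotic, no continuity", approx_no_cc)]:
    print(f"{label:28s} U={res.statistic}  p={res.pvalue:.4f}")
```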

Mann-Whitney U test interpretation: As the p value obtained from the Mann-Whitney U test is significant (U = 489.5, p < 0.05), we conclude that the yields of the two genotypes differ significantly.

Perform a one-sided Mann-Whitney U test (alternative hypothesis: the median yield of genotype A is greater than that of genotype B):

import scipy.stats as stats
stats.mannwhitneyu(x=df['A'], y=df['B'], alternative = 'greater')
# output
MannwhitneyuResult(statistic=489.5, pvalue=3.5023476972806333e-07)


As the p value obtained from the Mann-Whitney U test is significant (U = 489.5, p < 0.05), we conclude that the yield of genotype A is significantly greater than that of genotype B.
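The two p values reported above are related: 7.0047e-07 is twice 3.5023e-07. With the normal approximation, when the U statistic for the first sample lies above its null mean n_x n_y / 2 (so the data point in the 'greater' direction), the two-sided p value is exactly twice the one-sided 'greater' p value. A small check on made-up, evenly spaced samples (illustrative only; method='asymptotic' assumes SciPy 1.7 or later):

```python
import numpy as np
from scipy import stats

# Illustrative, evenly spaced samples: each x value is 10 units above the matching y value
y = np.linspace(35, 65, 30)
x = y + 10

two_sided = stats.mannwhitneyu(x, y, alternative='two-sided', method='asymptotic')
greater = stats.mannwhitneyu(x, y, alternative='greater', method='asymptotic')
less = stats.mannwhitneyu(x, y, alternative='less', method='asymptotic')

# U for x is above n_x * n_y / 2 = 450, so the two-sided p value equals twice the
# 'greater' p value, while the 'less' p value (opposite direction) is large
print("U for x:", two_sided.statistic)
print("two-sided p:", two_sided.pvalue, "2 x greater p:", 2 * greater.pvalue)
print("less p:", less.pvalue)
```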
