Mann-Whitney U test (Wilcoxon rank sum test) in Python

Renesh Bedre

Mann-Whitney U test

  • The Mann-Whitney U test is a non-parametric (distribution-free) alternative to the two-sample t-test. It was first proposed by Frank Wilcoxon (1945) and later extended by Henry Mann and Donald Whitney (1947). Hence, the Mann-Whitney U test is also known as the Wilcoxon rank sum test or the Wilcoxon-Mann-Whitney (WMW) test.

    Note: The Wilcoxon rank sum test is different from the Wilcoxon signed-rank test, which is used for paired data.

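  • The Mann-Whitney U test does not assume that the samples follow a specific distribution (such as the normal distribution) when calculating the test statistic and p value; it works on the ranks of the observations.
  • The Mann-Whitney U test is often described as comparing sample medians, whereas the t-test compares sample means. Strictly, it tests whether observations from one group tend to be larger than observations from the other group; this is equivalent to comparing medians only when the two distributions have a similar shape.
  • The Mann-Whitney U test can be applied to small samples (for example, 5-20 observations per group).
  • Although the Mann-Whitney U test and the t-test have similar statistical power (for normally distributed data, the asymptotic relative efficiency of the Mann-Whitney U test relative to the t-test is about 0.95), the t-test is preferred when its assumptions are met.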
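Because the test works only with the ranks of the observations, applying a strictly increasing transformation such as a log transform to both samples leaves U and its p value unchanged, whereas the t-test result changes. A minimal sketch with small hypothetical values (not from the article's dataset), assuming NumPy and SciPy are installed:

```python
import numpy as np
from scipy import stats

# hypothetical, strictly positive, right-skewed values (no ties)
x = np.array([1.2, 3.5, 7.9, 15.0, 40.2, 95.0])
y = np.array([0.8, 1.1, 2.4, 5.0, 9.7, 12.3])

# Mann-Whitney U on raw and log-transformed data: the ranks are identical
mw_raw = stats.mannwhitneyu(x, y, alternative='two-sided')
mw_log = stats.mannwhitneyu(np.log(x), np.log(y), alternative='two-sided')

# t-test on the same two versions of the data: means and variances change
t_raw = stats.ttest_ind(x, y)
t_log = stats.ttest_ind(np.log(x), np.log(y))

print(mw_raw.statistic == mw_log.statistic, mw_raw.pvalue == mw_log.pvalue)  # True True
print(t_raw.pvalue, t_log.pvalue)  # the two t-test p values differ
```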

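Mann-Whitney U test assumptions

  • The two samples are independent of each other, and the observations within each sample are independent.
  • The observations are ordinal or continuous, so that they can be ranked.
  • To interpret the test as a comparison of medians, the two distributions should have a similar shape.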

Mann-Whitney U test hypotheses

If we have two groups with observations x1, x2, …, xm and y1, y2, …, yn sampled from populations X and Y, the Mann-Whitney U test compares each observation xi from sample x with each observation yj from sample y.

Null hypothesis: P(xi > yj) = 0.5
Alternative hypothesis: P(xi > yj) ≠ 0.5

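Under the null hypothesis, an observation xi is equally likely to be greater or smaller than an observation yj (for example, when both groups come from the same population). The two-sided alternative hypothesis states that this probability is not 0.5.

A one-sided alternative hypothesis tests whether xi tends to be greater than yj, that is, P(xi > yj) > 0.5, or, in the other direction, P(xi > yj) < 0.5.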
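The pairwise comparison described above can be written out directly. Counting the pairs with xi > yj (a tie counts as one half) gives the U statistic for sample x, and dividing it by m × n estimates P(xi > yj). A minimal sketch with hypothetical values; pairwise_u is an illustrative helper, not a SciPy function:

```python
import numpy as np
from scipy import stats

def pairwise_u(x, y):
    """Hypothetical helper: count pairs (xi, yj) with xi > yj; ties count as 0.5."""
    x = np.asarray(x, dtype=float)[:, None]  # column vector, shape (m, 1)
    y = np.asarray(y, dtype=float)[None, :]  # row vector, shape (1, n)
    # broadcasting compares every xi with every yj
    return float(np.sum(x > y) + 0.5 * np.sum(x == y))

x = [3, 5, 8, 9, 12]
y = [1, 4, 5, 6]
u = pairwise_u(x, y)
p_greater = u / (len(x) * len(y))  # estimate of P(xi > yj), ties split evenly
print(u, p_greater)  # 15.5 0.775

# SciPy reports the same U statistic for sample x
print(stats.mannwhitneyu(x, y, alternative='two-sided').statistic)  # 15.5
```

An estimate above 0.5 means that observations from x tend to be larger than observations from y.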

If the two distributions have a similar shape, we can also state the two-sided hypotheses in terms of medians:

Null hypothesis: The two groups have equal medians

Alternative hypothesis: The two groups do not have equal medians

A one-sided alternative hypothesis tests whether the median of one group is greater (or less) than the median of the other group.


Mann-Whitney U test formula

Figure: Mann-Whitney U test formulas

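The U value is compared with a critical value, which depends on the two sample sizes and the significance level and is usually looked up in a Mann-Whitney U table. If U <= critical value, we reject the null hypothesis; otherwise, we fail to reject it. For larger samples, the p value is typically calculated from a normal approximation of the distribution of U.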
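A minimal sketch, assuming the commonly used textbook form of the formulas: with R1 the rank sum of sample x (size n1) in the pooled ranking, U1 = R1 - n1(n1 + 1)/2, U2 = n1n2 - U1 and U = min(U1, U2). Under the null hypothesis and without ties, U is approximately normal with mean n1n2/2 and standard deviation sqrt(n1n2(n1 + n2 + 1)/12). Some texts label U1 and U2 the other way around, but min(U1, U2) is the same. The function names and data are hypothetical, and the method argument requires SciPy 1.7 or later:

```python
import numpy as np
from scipy import stats

def mann_whitney_u(x, y):
    """Hypothetical helper: return U1, U2 and U = min(U1, U2) from rank sums."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    ranks = stats.rankdata(np.concatenate([x, y]))  # pooled ranks; ties get average ranks
    r1 = ranks[:n1].sum()                           # rank sum of sample x
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1                               # because U1 + U2 = n1 * n2
    return u1, u2, min(u1, u2)

def normal_approx_pvalue(u, n1, n2):
    """Two-sided p value from the normal approximation (no ties, no continuity correction)."""
    mu = n1 * n2 / 2
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 2 * stats.norm.sf(abs(z))

x = [12, 15, 19, 22, 25, 30]
y = [8, 10, 11, 14, 17, 18]
u1, u2, u = mann_whitney_u(x, y)
print(u1, u2, u)  # 31.0 5.0 5.0

p = normal_approx_pvalue(u, len(x), len(y))
res = stats.mannwhitneyu(x, y, alternative='two-sided',
                         method='asymptotic', use_continuity=False)
print(res.statistic, p, res.pvalue)  # SciPy reports U1 for x; the p values agree
```

For small samples like this one, U = 5 would be compared with the tabulated critical value for n1 = n2 = 6 rather than relying on the normal approximation.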

Perform Mann-Whitney U test in Python

Get dataset

Load a hypothetical dataset of yields for two plant genotypes (A and B):


import pandas as pd
df = pd.read_csv("https://reneshbedre.github.io/assets/posts/mann_whitney/genotype.csv")
df.head(2)
    A   B
0  60  10
1  30  25

# generate boxplot to check data spread
import matplotlib.pyplot as plt
df.boxplot(column=['A', 'B'], grid=False)
plt.show()

Figure: boxplot for data distribution
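The medians and quartiles drawn in the boxplot can also be printed numerically. A minimal sketch using a small hypothetical DataFrame named yields (not the article's data) so that the df loaded above is not overwritten; with the article's data, the same two calls can be made on df:

```python
import pandas as pd

# hypothetical yields for illustration only
yields = pd.DataFrame({'A': [60, 30, 45, 52, 38, 70],
                       'B': [10, 25, 15, 30, 20, 12]})

# quartiles and range per genotype; the box in the plot spans 25%-75% with the median (50%) inside
summary = yields.describe().loc[['min', '25%', '50%', '75%', 'max']]
print(summary)
print(yields.median())  # A: 48.5, B: 17.5
```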

Check data distribution

Check the data distribution using the Shapiro-Wilk test and histograms:

import scipy.stats as stats
w, pvalue = stats.shapiro(df['A'])
w, pvalue
(0.8239281177520752, 0.0009495539125055075)

w, pvalue = stats.shapiro(df['B'])
w, pvalue
(0.7946348190307617, 0.00031481595942750573)

# plot histogram
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle('Frequency histogram of genotypes yield')
ax1.hist(df['A'], bins=10, histtype='bar', ec='k') 
ax2.hist(df['B'], bins=10, histtype='bar', ec='k') 
ax1.set_xlabel("Yield")
ax2.set_xlabel("Yield")
plt.show()

Figure: frequency histogram for checking data distribution

As the p values obtained from the Shapiro-Wilk test are significant (p < 0.05) for both genotypes, we conclude that the data are not normally distributed. The histograms also do not show a normal (bell-shaped) distribution. Therefore, the Mann-Whitney U test is more appropriate than the t-test for comparing the two samples.
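The decision described above (check normality of each group, then choose the test) can be wrapped in a small function. A minimal sketch, assuming SciPy 1.5 or later for the named result attributes; compare_two_groups is a hypothetical helper, not a SciPy function, and the data are made up:

```python
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Hypothetical helper: Shapiro-Wilk on each group, then pick the test.

    Uses the independent two-sample t-test when neither Shapiro-Wilk test
    rejects normality at level alpha; otherwise uses the Mann-Whitney U test.
    """
    both_normal = all(stats.shapiro(g).pvalue >= alpha for g in (a, b))
    if both_normal:
        return 't-test', stats.ttest_ind(a, b)
    return 'Mann-Whitney U', stats.mannwhitneyu(a, b, alternative='two-sided')

# hypothetical data: group a is strongly skewed (one large outlier)
a = [1, 1, 1, 1, 1, 1, 1, 1, 2, 50]
b = [2, 3, 4, 5, 6, 7, 8, 9]
name, result = compare_two_groups(a, b)
print(name)  # Mann-Whitney U
print(result)
```

With the article's data, the call would be compare_two_groups(df['A'], df['B']).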

Perform Mann-Whitney U test

Perform a two-sided Mann-Whitney U test (alternative hypothesis: the yields of the two genotypes do not have equal medians):

import scipy.stats as stats
# perform two-sided test. You can use 'greater' or 'less' for one-sided test
stats.mannwhitneyu(x=df['A'], y=df['B'], alternative = 'two-sided')
# output
MannwhitneyuResult(statistic=489.5, pvalue=7.004695394561267e-07)

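Note: In SciPy 1.7 and later, mannwhitneyu chooses how to compute the p value through the method parameter. With the default method='auto', an exact p value is computed only for small samples without ties; otherwise, the p value is based on the normal approximation. This dataset contains ties (the non-integer U of 489.5 reflects tied values), so the p value above comes from the normal approximation. The use_continuity parameter (default True) only toggles the continuity correction of the normal approximation and does not produce an exact p value; use method='exact' to request one. The normal approximation is useful when the sample size is large, and in that case the exact and approximate p values should be similar.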

As the p value obtained from the two-sided Mann-Whitney U test is significant (p = 7.0e-07 < 0.05), we conclude that the yields of the two genotypes differ significantly from each other.
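To see how the exact and normal-approximation p values compare, and what use_continuity changes, here is a minimal sketch on small hypothetical samples without ties (not the genotype data), assuming SciPy 1.7 or later for the method parameter:

```python
from scipy import stats

# hypothetical small samples without ties
x = [12, 15, 19, 22, 25, 30]
y = [8, 10, 11, 14, 17, 18]

# exact p value from the permutation distribution of U
exact = stats.mannwhitneyu(x, y, alternative='two-sided', method='exact')
# normal approximation with the default continuity correction
approx = stats.mannwhitneyu(x, y, alternative='two-sided', method='asymptotic')
# normal approximation without the continuity correction
approx_nocc = stats.mannwhitneyu(x, y, alternative='two-sided',
                                 method='asymptotic', use_continuity=False)

print(exact.pvalue, approx.pvalue, approx_nocc.pvalue)
```

Even with only six observations per group, the three p values are close to each other and all fall below 0.05; the continuity correction makes the approximate p value slightly more conservative.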

Perform a one-sided Mann-Whitney U test (alternative hypothesis: the median yield of genotype A is greater than that of genotype B):

import scipy.stats as stats
stats.mannwhitneyu(x=df['A'], y=df['B'], alternative = 'greater')
# output
MannwhitneyuResult(statistic=489.5, pvalue=3.5023476972806333e-07)

As the p value obtained from the one-sided Mann-Whitney U test is significant (p = 3.5e-07 < 0.05), we conclude that the yield of genotype A is significantly greater than that of genotype B.
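In the outputs above, the two-sided p value (7.0e-07) is twice the one-sided p value for alternative='greater' (3.5e-07). More generally, the two-sided p value equals twice the smaller of the two one-sided p values (capped at 1). A minimal sketch on hypothetical data, assuming SciPy 1.7 or later, that checks this relationship and shows that alternative='less' gives a large p value when x tends to be larger than y:

```python
from scipy import stats

x = [12, 15, 19, 22, 25, 30]  # hypothetical; x tends to be larger than y
y = [8, 10, 11, 14, 17, 18]

# p value for each alternative, all from the normal approximation
p = {alt: stats.mannwhitneyu(x, y, alternative=alt, method='asymptotic').pvalue
     for alt in ('two-sided', 'greater', 'less')}
print(p)

# two-sided p value equals twice the smaller one-sided p value
print(abs(p['two-sided'] - 2 * min(p['greater'], p['less'])) < 1e-12)  # True
```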


This work is licensed under a Creative Commons Attribution 4.0 International License