Fisher’s exact test of independence in Python [with example]

Renesh Bedre

Fisher’s exact test

Fisher’s exact test is a statistical test of association between two categorical variables. It is a non-parametric test that compares the proportions of categories across the two variables.

In Fisher’s exact test, the null hypothesis of no association between the two categorical variables is tested against the alternative hypothesis that there is an association between them.

In contrast to the Chi-square test, which gives an approximate p value, Fisher’s exact test gives an exact p value. If you have a small sample size (say, < 1000), Fisher’s exact test is preferred over the Chi-square test. It is particularly recommended when the expected frequencies are less than 5 in more than 20% of the cells of a contingency table, where the Chi-square approximation becomes unreliable.
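The expected-frequency rule of thumb above can be checked before choosing a test. The following is a minimal sketch, assuming a small hypothetical 2x2 table (not the dataset used in this article) and using the expected_freq helper from scipy.stats.contingency; the variable names are illustrative only.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Hypothetical 2x2 table of observed counts (illustration only)
observed = np.array([[3, 1],
                     [1, 3]])

# Expected counts under independence:
# (row total * column total) / grand total for each cell
expected = expected_freq(observed)

# Fraction of cells whose expected count is below 5
frac_small = np.mean(expected < 5)

# Rule of thumb from the text: prefer Fisher's exact test
# when more than 20% of cells have an expected count below 5
use_fisher = frac_small > 0.20

print(expected)
print(use_fisher)
```

For this hypothetical table every expected count is below 5, so the rule of thumb points to Fisher’s exact test.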

Fisher’s exact test in Python

Fisher’s exact test example dataset

In the following table, there are two treatments (drug A and drug B) and their effect on disease development (disease and no disease). A larger proportion of the subjects receiving drug A remained disease-free (80 of 130) than of those receiving drug B (48 of 118). Here, we need to test whether there is an association between drug treatment and disease status.

              drug A   drug B
no disease        80       48
disease           50       70

Load and visualize the dataset

Create a DataFrame:

# create a pandas dataframe with row and column names
import pandas as pd
df = pd.DataFrame({'drug A':[80, 50], 'drug B':[48, 70]}, index=pd.Index(['no disease', 'disease']))
df
# output
            drug A  drug B
no disease      80      48
disease         50      70
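Since the test compares proportions, it can help to look at the within-drug proportions before testing. A minimal sketch, rebuilding the same DataFrame so the snippet runs on its own (col_props is just an illustrative name):

```python
import pandas as pd

# Same 2x2 table as above
df = pd.DataFrame({'drug A': [80, 50], 'drug B': [48, 70]},
                  index=pd.Index(['no disease', 'disease']))

# Divide each column by its total to get the proportion of
# 'no disease' and 'disease' subjects within each drug group
col_props = df / df.sum(axis=0)
print(col_props.round(3))
```

Roughly 62% of the drug A subjects and 41% of the drug B subjects are in the no disease row, which is the difference the test evaluates.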


Visualize the dataset as a mosaic plot:

import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

df_dict = {('drug A', 'no disease'): 80, ('drug A', 'disease'): 50, ('drug B', 'no disease'): 48,
('drug B', 'disease'): 70}
mosaic(df_dict)
plt.show()


Perform Fisher’s exact test

Fisher’s exact test can be performed using the fisher_exact function available in the SciPy package. The two fisher_exact arguments used here are:

table: A 2x2 contingency table of non-negative integers
alternative: Type of test to calculate p value (two-sided, less, or greater)

from scipy.stats import fisher_exact
oddsr, p = fisher_exact(table=df.to_numpy(), alternative='two-sided')
oddsr, p
# output
(2.3333333333333335, 0.001425903669576289)


The two-tailed p value obtained from Fisher’s exact test is statistically significant (p = 0.0014, odds ratio = 2.33; p < 0.05), and therefore we reject the null hypothesis. Thus, we can conclude that there is a significant association between drug treatment and disease status.

The odds ratio (OR) can be used as an effect size for understanding the drug treatment effect and for decision-making. Here, the odds ratio indicates that the odds of remaining disease-free on drug A are 2.33 times the odds of remaining disease-free on drug B. In other words, a subject receiving drug A is more likely to remain disease-free than a subject receiving drug B.
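To see where the exact p value comes from, the two-sided result can be reproduced from the hypergeometric distribution: with the row and column totals held fixed, sum the probabilities of every possible table that is no more likely than the observed one. This is a sketch of the standard definition, not necessarily the exact internal algorithm SciPy uses; the small tolerance factor is an assumption added to guard against floating-point ties.

```python
import numpy as np
from scipy.stats import hypergeom, fisher_exact

table = np.array([[80, 48],
                  [50, 70]])

a = table[0, 0]                   # observed top-left cell
row1_total = table[0].sum()       # no disease total
col1_total = table[:, 0].sum()    # drug A total
grand_total = table.sum()

# All values the top-left cell can take with the margins fixed
k_min = max(0, row1_total + col1_total - grand_total)
k_max = min(row1_total, col1_total)
support = np.arange(k_min, k_max + 1)

# Probability of each possible table, and of the observed one
probs = hypergeom.pmf(support, grand_total, row1_total, col1_total)
p_observed = hypergeom.pmf(a, grand_total, row1_total, col1_total)

# Two-sided p value: total probability of tables that are
# as likely or less likely than the observed table
p_manual = probs[probs <= p_observed * (1 + 1e-7)].sum()

# Compare with SciPy's result
_, p_scipy = fisher_exact(table, alternative='two-sided')
print(p_manual, p_scipy)
```

The two values should agree up to floating-point error.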
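The odds ratio reported by fisher_exact for this table can be reproduced by hand as the odds of no disease under drug A divided by the odds of no disease under drug B. A minimal sketch using the counts from the table (the variable names are illustrative):

```python
# Counts from the contingency table
a, b = 80, 48   # no disease: drug A, drug B
c, d = 50, 70   # disease:    drug A, drug B

# Odds of remaining disease-free within each drug group
odds_a = a / c   # drug A: 80 / 50
odds_b = b / d   # drug B: 48 / 70

# Sample odds ratio, equivalent to (a * d) / (b * c)
odds_ratio = odds_a / odds_b
print(round(odds_ratio, 2))  # 2.33
```

An odds ratio of 1 would indicate equal odds under both drugs; values above 1 here favor drug A.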



If you have any questions, comments, corrections, or recommendations, please email me at reneshbe@gmail.com
