Fisher’s exact test
- Fisher’s exact test is a non-parametric test of association between two categorical variables arranged in a contingency table. For example, it can compare the proportion of a category between two independent groups. In its classic 2 ✕ 2 form, both variables are dichotomous (e.g., male/female, treated/not treated, cured/not cured).
- Unlike the chi-square test, which relies on a large-sample approximation, Fisher’s exact test returns an exact p value. It can therefore be applied to small samples (some guides suggest it for total sample sizes below about 1000). It is typically used instead of the chi-square test when the expected frequency is below 5 in more than 20% of the cells. For larger samples, the chi-square approximation is adequate and computationally cheaper, so the chi-square test is usually preferred.
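The 20% rule refers to the expected counts under independence, not the observed counts. Below is a minimal sketch of checking it in base R. low_expected_share() is a hypothetical helper written only for illustration, and the small table contains made-up counts. The same expected matrix is also available as chisq.test(tab)$expected.

```r
# Hypothetical helper (not part of base R): share of cells whose
# expected count under independence is below 5
low_expected_share <- function(tab) {
  tab <- as.matrix(tab)
  # expected count for cell (i, j) = row total i * column total j / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  mean(expected < 5)
}

# made-up small table: every expected count is 2, so Fisher's test is preferable
small <- matrix(c(3, 1, 1, 3), nrow = 2)
low_expected_share(small)   # 1, i.e. 100% of cells

# treatment counts from the example later in this article: the expected counts
# are 50.4, 39.6, 19.6 and 15.4, so the chi-square approximation is acceptable
treat <- matrix(c(60, 30, 10, 25), nrow = 2)
low_expected_share(treat)   # 0
```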
- In Fisher’s exact test, the probability of the observed frequencies (conditional on fixed row and column totals) is calculated directly from the hypergeometric distribution rather than from the approximate distribution of a test statistic.
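To make the hypergeometric calculation concrete, the sketch below computes a two-sided p value for a 2 ✕ 2 table by hand with dhyper() and compares it with fisher.test(). The helper name fisher_p_manual() is hypothetical, and the counts are those of the treatment example used later in this article. The two-sided rule sums the probabilities of all tables with the same margins that are no more likely than the observed table, which is how fisher.test() handles 2 ✕ 2 tables.

```r
# Hypothetical helper: two-sided Fisher p value for a 2 x 2 table, computed
# directly from the hypergeometric distribution with row/column totals fixed
fisher_p_manual <- function(tab) {
  tab <- as.matrix(tab)
  m <- sum(tab[, 1])   # column 1 total ("white balls" in dhyper terms)
  n <- sum(tab[, 2])   # column 2 total ("black balls")
  k <- sum(tab[1, ])   # row 1 total (number of draws)
  a <- tab[1, 1]       # observed top-left count
  # every value the top-left cell can take given the fixed margins
  support <- max(0, k - n):min(k, m)
  probs <- dhyper(support, m, n, k)
  p_obs <- dhyper(a, m, n, k)
  # add up all tables no more likely than the observed one
  # (small relative tolerance, as fisher.test uses, to absorb rounding error)
  sum(probs[probs <= p_obs * (1 + 1e-7)])
}

treat <- matrix(c(60, 30, 10, 25), nrow = 2,
                dimnames = list(c("treated", "nontreated"), c("cured", "noncured")))
fisher_p_manual(treat)
fisher.test(treat)$p.value   # should agree with the manual value
```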
Fisher’s exact test assumptions
- The two variables are categorical (nominal), and the data are randomly sampled
- The levels of variables are mutually exclusive
- Observations should be independent of each other
- Observation data should be frequency counts and not percentages, proportions or transformed data
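Because the test needs frequency counts, raw per-subject records should be tabulated before testing. Here is a minimal sketch with table(), assuming a hypothetical data frame called subjects filled with made-up values:

```r
# Hypothetical raw data: one row per subject (made-up values for illustration)
subjects <- data.frame(
  treatment = c("treated", "treated", "nontreated", "nontreated", "treated", "nontreated"),
  outcome   = c("cured", "cured", "noncured", "cured", "noncured", "noncured")
)

# table() converts the raw categorical observations into frequency counts
counts <- table(subjects$treatment, subjects$outcome)
counts

# the counts (not percentages or proportions) are what fisher.test() expects
fisher.test(counts)
```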
Fisher’s exact test hypotheses
- Null hypothesis: The two categorical variables are independent (no association between the two variables)
- Alternative hypothesis: The two categorical variables are dependent (there is an association between the two variables)
Fisher’s exact test uses the hypergeometric distribution to assess the null hypothesis. Both one-tailed and two-tailed alternative hypotheses can be tested.
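In R, the tail is selected with the alternative argument of fisher.test(). The short sketch below uses the treatment counts from the example later in this article; the object name treat is only an illustrative choice.

```r
treat <- matrix(c(60, 30, 10, 25), nrow = 2,
                dimnames = list(treatment = c("treated", "nontreated"),
                                outcome = c("cured", "noncured")))

# two-tailed (default): H1 is "true odds ratio is not equal to 1"
p_two <- fisher.test(treat)$p.value
# one-tailed: H1 is "true odds ratio is greater than 1",
# i.e. treated subjects have higher odds of being cured
p_greater <- fisher.test(treat, alternative = "greater")$p.value
# one-tailed in the other direction: H1 is "true odds ratio is less than 1"
p_less <- fisher.test(treat, alternative = "less")$p.value

c(two.sided = p_two, greater = p_greater, less = p_less)
```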
Fisher’s exact test and odds ratio formula
Suppose we have a 2 ✕ 2 contingency table with cell counts a and b in the first row, c and d in the second row, and grand total n = a + b + c + d.
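Using the conventional labels a, b, c, d for the four cells and n for the grand total (an assumed, standard layout), the table, the hypergeometric probability of observing it with all margins fixed, and the sample odds ratio can be written as follows. The two-tailed p value is the sum of p over all tables with the same margins whose probability is no larger than that of the observed table.

```latex
\begin{array}{l|cc|c}
 & \text{Column 1} & \text{Column 2} & \text{Total} \\ \hline
\text{Row 1} & a & b & a+b \\
\text{Row 2} & c & d & c+d \\ \hline
\text{Total} & a+c & b+d & n
\end{array}

p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}
  = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}

\text{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}
```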
Fisher’s exact test in R
Suppose there are two binary categorical variables: treatment (treated or nontreated) and treatment outcome (cured or noncured). Using this dataset, we want to test whether there is an association between treatment and treatment outcome.
Load and visualize the dataset
Create a data frame:
```r
# create a data frame with treatments as rows and outcomes as columns
df <- data.frame("cured" = c(60, 30), "noncured" = c(10, 25),
                 row.names = c("treated", "nontreated"))
df
# output
#            cured noncured
# treated       60       10
# nontreated    30       25
```
Visualize the dataset as a mosaic plot:
```r
mosaicplot(df, color = TRUE)
```
Perform Fisher’s exact test
```r
library(stats)  # stats is attached by default in R, so this line is optional
fisher.test(df)
# output
#     Fisher's Exact Test for Count Data
#
# data:  df
# p-value = 0.0002357
# alternative hypothesis: true odds ratio is not equal to 1
# 95 percent confidence interval:
#   1.983312 13.107997
# sample estimates:
# odds ratio
#   4.930093
```
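fisher.test() returns an object of class "htest", so the values it prints can also be extracted programmatically, for example for reporting. A brief sketch (res is just an illustrative name):

```r
df <- data.frame("cured" = c(60, 30), "noncured" = c(10, 25),
                 row.names = c("treated", "nontreated"))
res <- fisher.test(df)   # an object of class "htest"

res$p.value    # two-tailed p value
res$estimate   # odds ratio estimate (a named number, "odds ratio")
res$conf.int   # 95% confidence interval; its level is kept in attr(, "conf.level")

# a different confidence level can be requested with the conf.level argument
fisher.test(df, conf.level = 0.99)$conf.int
```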
Because the two-tailed p value from Fisher’s exact test is below 0.05 (p = 0.00024, odds ratio = 4.93, 95% CI: 1.98 to 13.11), we reject the null hypothesis and conclude that there is a statistically significant association between treatment and treatment outcome.
The odds ratio (OR) can serve as an effect size for judging the treatment effect and for decision-making. An OR of 4.93 means that the odds of being cured among treated subjects are about 4.93 times the odds of being cured among nontreated subjects. In other words, a person receiving treatment is more likely to be cured than a person not receiving treatment.
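The odds ratio printed by fisher.test() is a conditional maximum likelihood estimate, so it differs slightly from the simple sample odds ratio ad/bc. Here is a sketch comparing the two for this dataset; the object names are only illustrative.

```r
treat <- matrix(c(60, 30, 10, 25), nrow = 2,
                dimnames = list(c("treated", "nontreated"), c("cured", "noncured")))

# odds of being cured within each group
odds_treated    <- treat["treated", "cured"] / treat["treated", "noncured"]        # 60 / 10 = 6
odds_nontreated <- treat["nontreated", "cured"] / treat["nontreated", "noncured"]  # 30 / 25 = 1.2

# sample odds ratio: 6 / 1.2 = 5, the same as ad / bc = (60 * 25) / (10 * 30)
sample_or <- odds_treated / odds_nontreated

# fisher.test() reports the conditional maximum likelihood estimate instead
cond_mle <- unname(fisher.test(treat)$estimate)

c(sample = sample_or, conditional_mle = cond_mle)
```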
This work is licensed under a Creative Commons Attribution 4.0 International License