Statistical hypothesis testing, types of errors, and interpretation of p values
What is hypothesis testing?
 Hypothesis testing is an important statistical tool for making consistent decisions based on data.
 Hypothesis testing involves comparing samples and drawing conclusions from an appropriate statistical test.
 Examples include comparing gene expression between two conditions, comparing the yield of two plant genotypes, testing for an association between drug treatment and patient survival, comparing a sample mean with the population mean, and assessing the effect of multiple fertilizers on plant growth.
Steps involved in hypothesis testing
 Propose null and alternate hypotheses based on the research questions
 Specify the significance level (α) for rejecting or failing to reject the null hypothesis
 Perform the experiment and collect the data
 Use a proper statistical test to calculate the p value
 Interpret the analysis output
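The steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the expression values are made up, and an exact permutation test on the difference in means stands in for a t-test so that only the standard library is needed. The function name permutation_p_value is hypothetical.

```python
from itertools import combinations
from statistics import mean

# Step 1: H0: no difference in mean expression between the two conditions.
#         Ha: there is a difference (two-tailed).
# Step 2: choose the significance level before looking at the data.
ALPHA = 0.05

# Step 3: collect the data (hypothetical values, for illustration only).
control = [4.1, 3.8, 4.4, 4.0, 3.9]
diseased = [5.2, 4.9, 5.6, 5.1, 4.7]


def permutation_p_value(group_a, group_b):
    """Exact two-tailed permutation p value for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    as_extreme = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_splits


# Step 4: calculate the p value.
p_value = permutation_p_value(control, diseased)

# Step 5: interpret the output.
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"p = {p_value:.4f}: {decision}")
```

With these made-up values the two groups do not overlap, so only 2 of the 252 possible relabellings are as extreme as the observed split, and the null hypothesis is rejected at α = 0.05.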
Null and alternate hypothesis
 Hypothesis testing is useful for answering research questions, and the hypotheses should be proposed before the experiment.
 For example, are the changes in expression of some genes induced by the treatment conditions? This research question can be stated simply in terms of the null hypothesis (H_{0}) as “there is no difference in gene expression between control and diseased conditions” versus the alternate hypothesis (H_{a}) “there is a difference in gene expression between control and diseased conditions”.
 The appropriate statistical test is then applied to test the null hypothesis against the alternate hypothesis. For the above example, a two-sample t-test would be appropriate to test the gene expression differences between the two conditions.
 The statistical test applied to the collected data yields a p value, which provides the evidence to reject or fail to reject the null hypothesis.
 If the p value is 0.01, a difference in expression of the gene at least as large as the one observed would be expected in only about 1 out of 100 experiments if the null hypothesis were true, i.e. a very unlikely event has occurred under H_{0}. Generally, the null hypothesis is rejected when the p value is below the 0.05 significance level (α).
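The interpretation of the p value can be written compactly. The notation here is an assumption for illustration: t_obs denotes the test statistic computed from the data and T its distribution under the null hypothesis, for a two-tailed test.

```latex
% t_obs: test statistic computed from the data; T: its distribution under H_0
p = P\left( |T| \ge |t_{\mathrm{obs}}| \;\middle|\; H_{0} \text{ is true} \right),
\qquad \text{reject } H_{0} \text{ if } p < \alpha
```

The p value is therefore a statement about the data given the null hypothesis, not the probability that the null hypothesis itself is true.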
One- and two-tailed (sided) alternate hypothesis
 A one-tailed or one-sided hypothesis specifies the direction of the outcome (either greater or lesser).
For example,
one-tailed (greater) alternate hypothesis “H_{a}: expression of a gene is higher in the diseased condition than in the control condition”
one-tailed (lesser) alternate hypothesis “H_{a}: expression of a gene is lower in the diseased condition than in the control condition”
 One-tailed hypotheses are appropriate when only one direction of the outcome is meaningful (e.g. drug has more side effects than control).
 A two-tailed or two-sided hypothesis checks whether there is a difference (either greater or lesser) in the expression of the gene between control and diseased conditions.
For example,
two-tailed (greater or lesser) alternate hypothesis “H_{a}: there is a difference in gene expression between control and diseased conditions”
Figure 1: t probability distributions for one-tailed (lesser and greater) and two-tailed hypotheses with 10 degrees of freedom
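The tail areas for a t distribution with 10 degrees of freedom can be computed numerically. The following is a rough sketch using only the standard library: it integrates the t density with Simpson's rule rather than calling a statistics package, and the function names (t_pdf, t_cdf, p_values) and the example statistic 2.228 are chosen here for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, n_intervals=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry around 0.

    n_intervals must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / n_intervals
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n_intervals):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def p_values(t_stat, df):
    """p values for the 'lesser', 'greater', and two-tailed alternatives."""
    lesser = t_cdf(t_stat, df)           # area in the lower tail
    greater = 1 - lesser                 # area in the upper tail
    two_tailed = 2 * min(lesser, greater)
    return {"lesser": lesser, "greater": greater, "two-tailed": two_tailed}


result = p_values(2.228, df=10)
for name, p in result.items():
    print(f"{name}: p = {p:.4f}")
```

For the same test statistic, the two-tailed p value is twice the smaller one-tailed p value, which is why a one-tailed test reaches significance more easily in the specified direction.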
Type I (α) and type II (β) errors, and power (1 − β)
 Now, we have the null and alternate hypotheses and have collected the data for statistical analysis. For the gene expression example, a two-sample t-test can be conducted to test the null hypothesis against the alternate hypothesis.
 If the p value obtained from the t-test is less than the significance level (α) of 0.05 (|t| > t critical for a two-tailed test), the null hypothesis is rejected and the difference is statistically significant.
 Here, α = 0.05 (5%) represents the maximum acceptable probability of rejecting the null hypothesis when it is actually true. Rejecting a true null hypothesis is a type I error (false positive), so the significance level (α) is also known as the type I error rate.
 Generally, the significance level (α) is set in advance. The 5% significance level is a convention and can be changed based on the study design and research questions.
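The meaning of α as a false positive rate can be checked by simulation. The sketch below rests on assumptions made for illustration: it repeatedly draws samples for which the null hypothesis is true and applies a two-tailed one-sample z-test with known standard deviation (instead of a two-sample t-test) so that it runs on the standard library alone. The function names and the seed are arbitrary.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-tailed one-sample z-test with known population sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha=0.05, n=30, n_experiments=5000, seed=1):
    """Fraction of simulated experiments that reject H0 when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # The data have true mean 0, so H0: mu = 0 holds in every experiment.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / n_experiments


rate = type_i_error_rate()
print(f"observed false positive rate: {rate:.3f}")
```

The observed rejection rate should land close to the chosen α, since every rejection in this simulation is a type I error.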

| null hypothesis (H_{0}) | difference (H_{0} is false) | no difference (H_{0} is true) |
| --- | --- | --- |
| reject H_{0} | correct decision (1 − β) | type I error (α) (reject H_{0} when it is true) |
| fail to reject H_{0} | type II error (β) | correct decision (1 − α) |

 Type II error (β) (false negative) occurs when we fail to reject the null hypothesis when it is actually false.
 The quantity 1 − β is defined as the power (the probability of not making a type II error). In other words, power is the probability of rejecting the null hypothesis when there is a true difference, i.e. H_{0} is false.
 It is ideal to have high power. The power can be increased by a larger sample size, a higher significance level (α), a smaller variance, and a proper experimental design.
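How these factors affect power can be illustrated with a closed-form calculation. This sketch assumes a two-tailed one-sample z-test with known standard deviation, which is simpler than the t-test discussed above. The function name z_test_power and the numbers used are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-sample z-test.

    effect: true shift of the mean away from the null value
    sigma:  known population standard deviation
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma
    # Probability that the z statistic falls in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


baseline = z_test_power(effect=0.5, sigma=1.0, n=20)
larger_n = z_test_power(effect=0.5, sigma=1.0, n=80)
larger_alpha = z_test_power(effect=0.5, sigma=1.0, n=20, alpha=0.10)
smaller_sd = z_test_power(effect=0.5, sigma=0.5, n=20)

print(f"baseline:         {baseline:.3f}")
print(f"larger sample:    {larger_n:.3f}")
print(f"larger alpha:     {larger_alpha:.3f}")
print(f"smaller variance: {smaller_sd:.3f}")
```

Each change raises the power relative to the baseline. Raising α also raises the type I error rate, so it is a trade-off and not a free gain.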
Test statistics
 Test statistics (e.g. the t, z, or F statistic) are used to calculate p values and to make consistent decisions to reject or fail to reject the null hypothesis.
 Because it is usually difficult to obtain data from the whole population, test statistics are calculated from random samples, which are assumed to share the characteristics of the population, such as its probability distribution.
 For large samples (n ≥ 30) drawn from a population (N), the sample means are approximately normally distributed, as per the Central Limit Theorem (CLT). The z test statistic (z-score), which follows a standard normal distribution (z distribution), can be used for large samples.
 For smaller samples, the t test statistic (Student’s t-test) can be used.
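The two statistics differ only in which standard deviation they use. The sketch below computes both for a one-sample comparison against a population mean. The data, the population mean mu0, and the known sigma are made-up values for illustration, and the sample is kept small for readability even though the z statistic is meant for large samples.

```python
from statistics import NormalDist, mean, stdev


def z_statistic(sample, mu0, sigma):
    """z = (sample mean - mu0) / (sigma / sqrt(n)), with known population sigma."""
    return (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)


def t_statistic(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with s estimated from the sample."""
    return (mean(sample) - mu0) / (stdev(sample) / len(sample) ** 0.5)


sample = [5.1, 4.9, 5.3, 5.0, 5.2]  # hypothetical measurements
mu0 = 5.0                           # hypothesized population mean

z = z_statistic(sample, mu0, sigma=0.2)
t = t_statistic(sample, mu0)

# The z statistic is referred to the standard normal distribution.
p_from_z = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-tailed p = {p_from_z:.3f}")
print(f"t = {t:.3f} with {len(sample) - 1} degrees of freedom")
```

The t statistic would be compared against a t distribution with n − 1 degrees of freedom, whose heavier tails account for the extra uncertainty of estimating the standard deviation from a small sample.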
Hypothesis testing examples
 Hypothesis testing using t-test
 Hypothesis testing using ANOVA
 Hypothesis testing using chi-squared test
This work is licensed under a Creative Commons Attribution 4.0 International License