Perform t-test from scratch in Python

Renesh Bedre

Student’s t-test

Calculate t-test from scratch

Calculating a t-test (t statistic and p value) from scratch is straightforward. Follow these steps:

  • Get the sample data
  • Calculate the mean of the samples
  • Calculate the standard error
  • Calculate the t statistic
  • Compare it with the t critical value, and compute the p value from the t distribution
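
The steps above can be sketched end to end with only the Python standard library. This is a minimal illustration, not the article's dataset: the `sample` values and the hypothesized mean `mu` are made up, and the critical value 2.262 is the standard table value for a two-tailed test at α = 0.05 with 9 degrees of freedom.

```python
import math
import statistics

# Step 1: sample data (hypothetical values, for illustration only)
sample = [5.1, 4.9, 5.6, 5.8, 4.7, 5.2, 5.0, 5.4, 4.8, 5.5]
mu = 5.0  # hypothesized population mean (assumed)

# Step 2: sample mean
n = len(sample)
mean = statistics.mean(sample)

# Step 3: standard error, using the sample standard deviation (n - 1 denominator)
std_error = statistics.stdev(sample) / math.sqrt(n)

# Step 4: t statistic (absolute value, for a two-tailed comparison)
t_stat = abs(mean - mu) / std_error

# Step 5: compare with the two-tailed critical value for alpha = 0.05, df = n - 1 = 9
t_crit = 2.262  # taken from a t table
reject_null = t_stat > t_crit

print(round(t_stat, 3), reject_null)
# 1.732 False
```

Here the t statistic is below the critical value, so the null hypothesis would not be rejected for this made-up sample.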

Calculate one sample t-test from scratch

Let’s calculate a one sample t-test (see the dataset and formula for the one sample t-test).
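
For reference, the standard one sample t statistic can be written as below. The symbols are the usual textbook ones and are not taken from the article: x̄ is the sample mean, μ the known population mean, s the sample standard deviation (n-1 denominator, which corresponds to `ddof=1` in NumPy), and n the sample size.

```latex
\[
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\text{df} = n - 1
\]
```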

import numpy as np
from bioinfokit.analys import get_data
# load dataset as pandas dataframe
df = get_data('t_one_samp').data
# get as numpy array
a =  df['size'].to_numpy()
# known population mean 
mu = 5

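# Calculate the mean and standard error
mean = np.mean(a)
# note: np.std defaults to ddof=0 (divides by n); pass ddof=1 to get the
# sample standard deviation (divides by n-1), which changes t slightly
std_error = np.std(a) / np.sqrt(len(a))

# calculate the t statistic (absolute value, so the sign of the difference is dropped)
t = abs(mean - mu) / std_error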
t
# output
0.37162508611635603
  • Now, the calculated t statistic needs to be compared with the t critical value for hypothesis testing, and the p value is obtained from the t distribution.
  • The t critical value is the value of the t distribution at a given significance level (α, type I error rate) and degrees of freedom (n-1). It is denoted as tα,n-1. For example, the t critical value for the two-tailed test with α = 0.05 and 49 degrees of freedom is 2.009 (see t critical value table). The t critical value can be computed in Python as follows,
from scipy import stats
# two-tailed critical value at alpha = 0.05
# q is lower tail probability and df is the degrees of freedom
t_crit = stats.t.ppf(q=0.975, df=49)
t_crit
# output 
2.009575234489209

# one-tailed critical value at alpha = 0.05
t_crit = stats.t.ppf(q=0.95, df=49)
t_crit
# output 
1.6765508919142629

# get two-tailed p value (t is the absolute t statistic computed above)
p = 2*(1-stats.t.cdf(x=t, df=49))
p
# output 
0.7117742097899655

# get one-tailed p value
p = 1-stats.t.cdf(x=t, df=49)
p
# output
0.35588710489498276
  • As the calculated t statistic (0.3716) is less than the t critical value (2.009) and the two-tailed p value (0.71) is greater than 0.05, we fail to reject the null hypothesis. There is no evidence that the sample mean differs from the known population mean.
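
As a sanity check, a from-scratch one sample calculation can be compared against `scipy.stats.ttest_1samp`. The sketch below uses a made-up array `a` and hypothesized mean `mu` (not the article's dataset). Because `ttest_1samp` uses the n-1 denominator for the standard deviation, the manual step passes `ddof=1`. It also uses `stats.t.sf`, the survival function, which equals 1 - cdf.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and hypothesized mean (illustration only)
a = np.array([5.1, 4.9, 5.6, 5.8, 4.7, 5.2, 5.0, 5.4, 4.8, 5.5])
mu = 5.0

n = len(a)
dof = n - 1

# ddof=1 gives the sample standard deviation (divides by n - 1)
std_error = np.std(a, ddof=1) / np.sqrt(n)
t_manual = (np.mean(a) - mu) / std_error

# two-tailed p value from the upper tail of the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df=dof)

# SciPy's built-in one sample t-test (two-sided by default)
res = stats.ttest_1samp(a, popmean=mu)

print(t_manual, p_manual)
print(res.statistic, res.pvalue)
```

Both lines should print matching values up to floating-point rounding.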

Calculate two sample t-test from scratch

Let’s calculate a two sample t-test (see the dataset and formula for the two sample t-test).
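
For reference, the standard pooled-variance (equal variance) two sample t statistic can be written as below. The symbols are the usual textbook ones: x̄₁ and x̄₂ are the group means, s₁² and s₂² the sample variances (n-1 denominator), and n₁ and n₂ the group sizes.

```latex
\[
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}, \qquad
\text{df} = n_1 + n_2 - 2
\]
```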

import numpy as np
from bioinfokit.analys import get_data
# load dataset as pandas dataframe
df = get_data('t_ind_samp').data
# get as numpy array
x1 = df.loc[df['Genotype'] == 'A', 'yield'].to_numpy()
x2 = df.loc[df['Genotype'] == 'B', 'yield'].to_numpy()

# Calculate the means, sample sizes, and sample variances (ddof=1 divides by n-1)
x1_bar, x2_bar = np.mean(x1), np.mean(x2)
n1, n2 = len(x1), len(x2)
var_x1, var_x2 = np.var(x1, ddof=1), np.var(x2, ddof=1)

# pooled sample variance
pool_var = ( ((n1-1)*var_x1) + ((n2-1)*var_x2) ) / (n1+n2-2)

# standard error
std_error = np.sqrt(pool_var * (1.0 / n1 + 1.0 / n2))

# calculate the t statistic (absolute value, so the sign of the difference is dropped)
t = abs(x1_bar - x2_bar) / std_error
t
# output
5.407091104196024
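  • The t critical value for the two sample t-test with pooled variance is denoted as tα,n1+n2-2, because the pooled variance has n1+n2-2 degrees of freedom. In general, set df to n1+n2-2 for your data. For example, the t critical value for the two-tailed test with α = 0.05 and 11 degrees of freedom is 2.201 (see t critical value table). The t critical value can be computed in Python as follows,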
from scipy import stats
# two-tailed critical value at alpha = 0.05
# q is lower tail probability and df is the degrees of freedom
t_crit = stats.t.ppf(q=0.975, df=11)
t_crit
# output 
2.200985160082949

# one-tailed critical value at alpha = 0.05
t_crit = stats.t.ppf(q=0.95, df=11)
t_crit
# output 
1.7958848187036691

# get two-tailed p value (t is the absolute t statistic computed above)
p = 2*(1-stats.t.cdf(x=t, df=11))
p
# output 
0.000214337566542655

# get one-tailed p value
p = 1-stats.t.cdf(x=t, df=11)
p
# output
0.0001071687832713275
  • As the calculated t statistic (5.407) is greater than the t critical value (2.201) and the two-tailed p value is 0.0002, we reject the null hypothesis in favor of the alternative hypothesis and conclude that the two group means are significantly different.
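
A from-scratch pooled two sample calculation can be cross-checked against `scipy.stats.ttest_ind` with `equal_var=True`. The sketch below uses made-up arrays `x1` and `x2` (not the article's dataset) and takes the degrees of freedom as n1+n2-2, which is what the pooled variance formula implies.

```python
import numpy as np
from scipy import stats

# Hypothetical yields for two groups (illustration only)
x1 = np.array([20.1, 21.4, 19.8, 22.0, 20.7, 21.1])
x2 = np.array([17.9, 18.6, 17.2, 19.0, 18.1, 18.4])

n1, n2 = len(x1), len(x2)
dof = n1 + n2 - 2  # degrees of freedom for the pooled test

# pooled sample variance and standard error of the difference in means
pool_var = ((n1 - 1) * np.var(x1, ddof=1) + (n2 - 1) * np.var(x2, ddof=1)) / dof
std_error = np.sqrt(pool_var * (1.0 / n1 + 1.0 / n2))

t_manual = (np.mean(x1) - np.mean(x2)) / std_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=dof)

# SciPy's built-in independent two sample t-test with equal variances assumed
res = stats.ttest_ind(x1, x2, equal_var=True)

print(t_manual, p_manual)
print(res.statistic, res.pvalue)
```

Both lines should print matching values up to floating-point rounding.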

This work is licensed under a Creative Commons Attribution 4.0 International License