The central limit theorem states that sample means follow an approximately normal distribution. People often confuse this with the distribution of p-values, and so question whether p-values can really be uniformly distributed under the null hypothesis. Shouldn't we observe extreme p-values much less often than ordinary p-values like 0.2 or 0.3?

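Suppose the null hypothesis is true. This means there is no difference between treatment and control. The simulation below draws 100,000 samples of 10 observations each from a standard normal distribution, so the true mean is 0, and runs a one-sample t-test on every sample.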
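Before looking at the simulation, here is a short argument for why uniform p-values are what we should expect. It is a sketch under assumptions not spelled out in this post: T is a test statistic whose distribution under the null has a continuous CDF F, and P is the one-sided p-value. The symbols T, F, P and α are introduced here only for illustration.

```latex
\begin{aligned}
P &= 1 - F(T), \\
\Pr(P \le \alpha \mid H_0)
  &= \Pr\bigl(F(T) \ge 1 - \alpha\bigr)
   = 1 - (1 - \alpha)
   = \alpha, \qquad 0 \le \alpha \le 1 .
\end{aligned}
```

The middle step uses the fact that F(T) is Uniform(0, 1) whenever F is continuous. So a p-value below 0.05 is exactly as likely as one between 0.20 and 0.25: every interval of the same width gets the same probability. The two-sided case works the same way with |T| and its CDF. The central limit theorem is a statement about the sample mean, not about the p-value, so the two results do not conflict.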

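import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats
import matplotlib.pyplot as plt

pval_lis = []
t_stat_lis = []
samp_mean_lis = []

# Simulate 100,000 experiments in which the null hypothesis (true mean = 0) holds
for i in range(100000):
    sample = np.random.normal(0, 1, 10)
    samp_mean = sample.mean()
    samp_mean_lis.append(samp_mean)
    # One-sample t-statistic; ddof=1 gives the sample standard deviation (n - 1 in the denominator)
    t_stat = samp_mean / (sample.std(ddof=1) / np.sqrt(sample.size))
    t_stat_lis.append(t_stat)
    # Two-sided p-value from a t distribution with n - 1 degrees of freedom
    pval = stats.t.sf(abs(t_stat), df=sample.size - 1) * 2
    pval_lis.append(pval)

plt.hist(np.array(samp_mean_lis))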

Distribution of sample means

plt.hist(np.array(t_stat_lis))

Distribution of t-statistics
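The histogram of t-statistics looks bell-shaped, but with only 10 observations per sample it should follow a t distribution with 9 degrees of freedom rather than a normal one. A minimal sketch of how that could be checked, assuming a vectorized rerun of the same simulation (the seed, the `rng` generator and the variable names are illustrative choices, not part of the original code):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n, n_sims = 10, 100_000

# Each row is one simulated sample drawn under the null (true mean 0)
samples = rng.normal(0, 1, size=(n_sims, n))
t_stats = samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))

# Share of simulated |t| beyond the normal two-sided 5% cutoff (about 1.96)
cutoff = stats.norm.ppf(0.975)
empirical_tail = np.mean(np.abs(t_stats) > cutoff)

# Tail probability predicted by each reference distribution
t_tail = 2 * stats.t.sf(cutoff, df=n - 1)
normal_tail = 2 * stats.norm.sf(cutoff)

print(f"empirical tail share: {empirical_tail:.4f}")
print(f"t with n-1 df:        {t_tail:.4f}")
print(f"standard normal:      {normal_tail:.4f}")
```

The empirical share should sit close to the t-distribution value and clearly above the normal one, which is why the p-values have to be computed from the t distribution.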

plt.hist(np.array(pval_lis))

Distribution of p-values under the null hypothesis: uniformly distributed
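A flat histogram is suggestive, but the uniformity can also be checked numerically. The sketch below is one possible way to do it, not the method used above: it reruns the simulation in vectorized form with `scipy.stats.ttest_1samp`, compares the share of p-values below a few thresholds with the thresholds themselves, and runs a Kolmogorov-Smirnov test against Uniform(0, 1). The seed and the thresholds are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n, n_sims = 10, 100_000

# Each row is one simulated sample drawn under the null (true mean 0)
samples = rng.normal(0, 1, size=(n_sims, n))

# One two-sided one-sample t-test per row
pvals = stats.ttest_1samp(samples, popmean=0, axis=1).pvalue

# Under uniformity, the share of p-values <= alpha should be close to alpha
for alpha in (0.01, 0.05, 0.2, 0.5):
    share = np.mean(pvals <= alpha)
    print(f"alpha = {alpha:<4}  share of p-values <= alpha: {share:.4f}")

# Kolmogorov-Smirnov distance between the p-values and Uniform(0, 1)
ks = stats.kstest(pvals, "uniform")
print(f"KS statistic: {ks.statistic:.4f}")
```

A small KS statistic means the empirical distribution of the p-values stays close to the uniform one across the whole range, not just at the thresholds printed above.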