Many common statistical tests — including the [t-test](/tutorials/independent-samples-t-test-with-scipy) and [one-way ANOVA](/tutorials/one-way-anova-with-scipy) — assume the underlying data are approximately normally distributed, especially with small samples. If that assumption is badly violated, those tests can produce misleading p-values. Normality tests give you a formal way to check before proceeding. SciPy provides two complementary approaches: the Shapiro-Wilk test (generally the most sensitive for small to medium samples) and the Anderson-Darling test (which reports critical values rather than a single p-value). In practice, formal tests are most useful for small samples — with large samples they'll flag trivially small deviations. Always pair them with a visual check.

### Testing a Sample that is Approximately Normal

For data drawn from a normal distribution, both tests should show no significant departure from normality.
import numpy as np
import warnings
from scipy import stats
np.random.seed(11)
warnings.filterwarnings("ignore", category=FutureWarning)
normal_data = np.random.normal(loc=0, scale=1, size=200)
shapiro = stats.shapiro(normal_data)
anderson = stats.anderson(normal_data, dist="norm")
print(f"Shapiro-Wilk statistic: {shapiro.statistic:.3f}")
print(f"Shapiro-Wilk p-value: {shapiro.pvalue:.6f}")
print(f"Anderson-Darling statistic: {anderson.statistic:.3f}")
print("Critical values:", anderson.critical_values)

- A Shapiro-Wilk statistic close to 1.0 indicates the data closely match a normal distribution; the p-value tests whether the deviation is significant.
- A high p-value (e.g., > 0.05) means we fail to reject normality — the data are consistent with a normal distribution, not that they're *proven* normal.
- The Anderson-Darling test returns critical values at several significance levels; if the test statistic is below the critical value for your chosen level, you fail to reject normality.

### Testing a Clearly Skewed Sample

Exponential data is heavily right-skewed — the majority of values are small, with a long tail of large ones. Both tests should strongly reject normality here.
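The Anderson-Darling comparison against critical values can also be done in code rather than by eye. A minimal sketch, using the fact that SciPy reports `significance_level` as percentages alongside `critical_values`:

```python
import numpy as np
from scipy import stats

np.random.seed(11)
normal_data = np.random.normal(loc=0, scale=1, size=200)

result = stats.anderson(normal_data, dist="norm")
# significance_level holds percentages: 15.0, 10.0, 5.0, 2.5, 1.0
for sl, cv in zip(result.significance_level, result.critical_values):
    decision = "reject" if result.statistic > cv else "fail to reject"
    print(f"{sl:>4.1f}% level: critical value {cv:.3f} -> {decision} normality")
```

This loops over every reported level, so you can read off the decision at whichever significance level you chose in advance.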
import numpy as np
import warnings
from scipy import stats
np.random.seed(11)
warnings.filterwarnings("ignore", category=FutureWarning)
skewed_data = np.random.exponential(scale=1.0, size=200)
shapiro = stats.shapiro(skewed_data)
anderson = stats.anderson(skewed_data, dist="norm")
print(f"Shapiro-Wilk statistic: {shapiro.statistic:.3f}")
print(f"Shapiro-Wilk p-value: {shapiro.pvalue:.6f}")
print(f"Anderson-Darling statistic: {anderson.statistic:.3f}")
print("Critical values:", anderson.critical_values)

- A Shapiro-Wilk statistic well below 1.0 and a near-zero p-value both signal a significant departure from normality.
- For the Anderson-Darling test, a statistic larger than the critical values at common significance levels (e.g., 5%, 1%) is strong evidence against normality.

### Histograms of the Two Samples

Before running any formal test, a histogram quickly reveals whether the data look roughly bell-shaped or clearly skewed. Visual inspection catches problems that statistical tests can miss with small samples.
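To put a number on the asymmetry the tests are reacting to, the sample skewness is a quick summary: roughly 0 for symmetric data, clearly positive for a right tail. A small sketch using the same simulated samples:

```python
import numpy as np
from scipy import stats

np.random.seed(11)
normal_data = np.random.normal(loc=0, scale=1, size=200)
skewed_data = np.random.exponential(scale=1.0, size=200)

# Sample skewness: ~0 for symmetric data; the exponential's theoretical skewness is 2.
skew_normal = stats.skew(normal_data)
skew_exp = stats.skew(skewed_data)
print(f"Normal sample skewness:      {skew_normal:.3f}")
print(f"Exponential sample skewness: {skew_exp:.3f}")
```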
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(11)
normal_data = np.random.normal(loc=0, scale=1, size=200)
skewed_data = np.random.exponential(scale=1.0, size=200)
fig, axes = plt.subplots(1, 2, figsize=(10, 4.5))
axes[0].hist(normal_data, bins=20, alpha=0.75)
axes[0].set_title("Approximately Normal")
axes[1].hist(skewed_data, bins=20, alpha=0.75)
axes[1].set_title("Skewed")
plt.tight_layout()
plt.show()

- The normal histogram should show a rough bell shape centered around 0; the skewed histogram will have a tall leftward spike and a long right tail.
- A histogram with clear asymmetry, multiple peaks, or heavy tails is a strong visual signal to investigate further with a Q-Q plot or formal test.

### Q-Q Plots

A Q-Q (quantile-quantile) plot compares the quantiles of your data against the quantiles of a theoretical normal distribution. If the data are normal, the points should fall close to a straight diagonal line.
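A related numeric sanity check, with no plotting at all, is to compare the mean and median: for symmetric data they sit close together, while a right tail drags the mean above the median. A quick sketch:

```python
import numpy as np

np.random.seed(11)
normal_data = np.random.normal(loc=0, scale=1, size=200)
skewed_data = np.random.exponential(scale=1.0, size=200)

# A right-skewed sample has mean > median; symmetric samples have mean ~ median.
for name, data in [("normal", normal_data), ("exponential", skewed_data)]:
    print(f"{name}: mean = {np.mean(data):.3f}, median = {np.median(data):.3f}")
```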
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
np.random.seed(11)
normal_data = np.random.normal(loc=0, scale=1, size=200)
skewed_data = np.random.exponential(scale=1.0, size=200)
fig, axes = plt.subplots(1, 2, figsize=(10, 4.5))
stats.probplot(normal_data, dist="norm", plot=axes[0])
axes[0].set_title("Q-Q Plot: Normal Data")
stats.probplot(skewed_data, dist="norm", plot=axes[1])
axes[1].set_title("Q-Q Plot: Skewed Data")
plt.tight_layout()
plt.show()

- `stats.probplot` plots the sample quantiles against theoretical normal quantiles and draws the reference line — deviations from the line indicate non-normality.
- Skewed data create a characteristic curved pattern in the Q-Q plot: points curve away from the line at one or both ends.
- Q-Q plots are more informative than a single p-value because they show *where* and *how* the distribution departs from normal (e.g., only in the tails vs. throughout).

### Practical Example: Choosing a Test Based on Normality

A common workflow is to run a normality check first, then decide whether to use a parametric or nonparametric test based on the result.
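If you want a single number summarizing how straight the Q-Q points are, `stats.probplot` also returns the least-squares fit of the reference line; the third element of the fit is the correlation `r`, which sits near 1 for normal data. A sketch (calling `probplot` without `plot=` so nothing is drawn):

```python
import numpy as np
from scipy import stats

np.random.seed(11)
normal_data = np.random.normal(loc=0, scale=1, size=200)
skewed_data = np.random.exponential(scale=1.0, size=200)

# probplot returns ((theoretical_quantiles, ordered_values), (slope, intercept, r)).
_, (slope_n, intercept_n, r_normal) = stats.probplot(normal_data, dist="norm")
_, (slope_s, intercept_s, r_skewed) = stats.probplot(skewed_data, dist="norm")
print(f"Q-Q fit r (normal data): {r_normal:.4f}")
print(f"Q-Q fit r (skewed data): {r_skewed:.4f}")
```

A lower `r` for the skewed sample reflects the curved pattern visible in its Q-Q plot.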
import numpy as np
import warnings
from scipy import stats
np.random.seed(31)
warnings.filterwarnings("ignore", category=FutureWarning)
sample = np.random.exponential(scale=1.2, size=120)
shapiro = stats.shapiro(sample)
print(f"Shapiro-Wilk p-value: {shapiro.pvalue:.6f}")
if shapiro.pvalue < 0.05:
    print("Conclusion: the sample is not well modeled as normal; consider a nonparametric method.")
else:
    print("Conclusion: the sample does not show a strong departure from normality.")

- Exponential data will almost always fail a normality test — this example is designed to demonstrate the decision branch you'd take in practice.
- If the test rejects normality, consider the [Mann-Whitney U test](/tutorials/mann-whitney-u-test-with-scipy) (in place of a t-test) or the [Kolmogorov-Smirnov test](/tutorials/kolmogorov-smirnov-test-with-scipy) to compare distributions without the normality assumption.

### Conclusion

Use normality tests as a first-pass check, but don't rely on them alone — pair every formal test with a histogram and Q-Q plot. For small samples, formal tests have low power; for large samples, they're over-sensitive to trivial deviations. The practical question is whether the departure from normality is severe enough to undermine the test you want to run. For robust alternatives that don't assume normality, see the [Mann-Whitney U test](/tutorials/mann-whitney-u-test-with-scipy) and the [Kolmogorov-Smirnov test](/tutorials/kolmogorov-smirnov-test-with-scipy).
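As a closing illustration of that nonparametric route, here is a minimal sketch comparing two right-skewed groups with the Mann-Whitney U test. The group sizes and scale parameters are illustrative choices, not taken from the examples above:

```python
import numpy as np
from scipy import stats

np.random.seed(31)
# Two right-skewed groups; a t-test's normality assumption fits these poorly.
group_a = np.random.exponential(scale=1.0, size=80)
group_b = np.random.exponential(scale=1.5, size=80)

# Mann-Whitney U compares the two samples without assuming normality.
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U statistic: {u_stat:.1f}")
print(f"p-value: {p_value:.4f}")
```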