Tutorials

Chi-Square Test of Independence with SciPy

The chi-square test of independence answers questions like: do men and women click on different types of ads? Does customer satisfaction level vary by product category? Is disease prevalence different between smokers and non-smokers? You organize the counts into a contingency table (rows = one variable, columns = the other), then test whether the row variable and column variable are independent. The test compares the observed cell counts to the counts you'd expect if there were no association. A small p-value means the pattern in the data is unlikely to occur by chance if the variables were truly independent.
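In practice you often start from raw observations rather than a ready-made count table. A minimal sketch of building the contingency table with `pandas.crosstab` — the column names (`group`, `outcome`) and the tiny dataset here are made up for illustration:

```python
import pandas as pd

# Hypothetical raw data: one row per subject, two categorical columns.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "outcome": ["click", "no_click", "click", "no_click",
                "click", "no_click", "no_click", "click"],
})

# crosstab counts co-occurrences, producing the contingency table
# that chi2_contingency expects.
table = pd.crosstab(df["group"], df["outcome"])
print(table)
```

The resulting table (rows = groups, columns = outcomes) can be passed directly to `stats.chi2_contingency`.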

### Basic Chi-Square Test

The contingency table here has 2 rows (two groups) and 3 columns (three outcomes). The test checks whether the row variable and the column variable are independent.

import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Chi-square statistic: {chi2:.3f}")
print(f"P-value: {p_value:.6f}")
print(f"Degrees of freedom: {dof}")
print("Expected frequencies:")
print(expected.round(2))
Chi-square statistic: 23.829
P-value: 0.000007
Degrees of freedom: 2
Expected frequencies:
[[31.52 26.67 21.82]
 [33.48 28.33 23.18]]
- `stats.chi2_contingency(observed)` takes the raw count table and returns the test statistic, p-value, degrees of freedom, and the expected counts under independence.
- The expected counts are what each cell would contain if the two variables were completely unrelated — the test statistic measures the total discrepancy between observed and expected.
- Degrees of freedom equals `(rows - 1) * (cols - 1)` — here `(2-1) * (3-1) = 2`.
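To demystify what `chi2_contingency` computes, here is the statistic rebuilt by hand from the marginal totals. For this table it matches SciPy exactly, since the continuity correction only applies to 2x2 tables:

```python
import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

# Expected counts under independence: outer product of the row and
# column totals, divided by the grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# The chi-square statistic sums (O - E)^2 / E over every cell.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

chi2_scipy, _, _, _ = stats.chi2_contingency(observed)
print(f"Manual: {chi2_manual:.3f}, SciPy: {chi2_scipy:.3f}")
```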

### Interpreting the Result

A significant result means the variables are associated — the distribution across columns differs by row. It does not tell you how strong the association is or which specific cells drive it.

import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

if p_value < 0.05:
    print("Reject the null hypothesis: the categorical variables are associated.")
else:
    print("Fail to reject the null hypothesis: the data do not show a clear association.")
Reject the null hypothesis: the categorical variables are associated.
- The chi-square test has a reliability rule of thumb: expected counts in each cell should generally be at least 5. If many cells have small expected counts, consider Fisher's exact test instead.
- A significant result means the row category predicts (or is associated with) the column category — but not which direction or which cells are responsible.
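A quick way to apply the expected-count rule of thumb is to inspect the expected table directly. As a sketch with a made-up 2x2 table of small counts, falling back to `stats.fisher_exact` (which only handles 2x2 tables) when cells dip below 5:

```python
import numpy as np
from scipy import stats

# Hypothetical small 2x2 table where some expected counts fall below 5.
small = np.array([
    [3, 7],
    [8, 2],
])

_, _, _, expected = stats.chi2_contingency(small)
if (expected < 5).any():
    # fisher_exact computes an exact p-value, with no large-sample
    # approximation, so the rule of thumb does not apply.
    odds_ratio, p_exact = stats.fisher_exact(small)
    print(f"Fisher's exact p-value: {p_exact:.4f}")
```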
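Since the p-value says nothing about the strength of the association, a common follow-up is Cramér's V, an effect-size measure derived from the chi-square statistic (0 = no association, 1 = perfect association). A minimal sketch for the table above:

```python
import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1))).
n = observed.sum()
min_dim = min(observed.shape) - 1
cramers_v = np.sqrt(chi2 / (n * min_dim))
print(f"Cramér's V: {cramers_v:.3f}")
```

Newer SciPy versions also offer `scipy.stats.contingency.association(observed, method="cramer")`, which computes the same quantity.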

### Visualizing Observed and Expected Counts

Comparing the observed and expected heatmaps side by side shows *where* the observed data diverges from independence — the cells where they differ most are driving the chi-square statistic.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

_, _, _, expected = stats.chi2_contingency(observed)

fig, axes = plt.subplots(1, 2, figsize=(11, 4.5))
for ax, matrix, title in zip(axes, [observed, expected], ["Observed", "Expected"]):
    im = ax.imshow(matrix, cmap="Blues")
    ax.set_title(title)
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            value = matrix[i, j]
            text = f"{value:.1f}" if title == "Expected" else f"{int(value)}"
            ax.text(j, i, text, ha="center", va="center")
plt.tight_layout()
plt.show()
- Cells where the observed count differs most from the expected count — in either direction — contribute the most to the chi-square statistic; those are the cells that indicate where the groups behave differently.
- `_, _, _, expected` unpacks only the expected counts (the fourth return value) since we only need that for this visualization.
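To put numbers on which cells drive the statistic, the standardized (Pearson) residuals `(O - E) / sqrt(E)` are useful: each squared residual is that cell's contribution to the chi-square statistic, and magnitudes beyond roughly 2 flag notable departures from independence:

```python
import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

_, _, _, expected = stats.chi2_contingency(observed)

# Pearson residuals: sign shows direction of the departure,
# magnitude shows how much the cell contributes.
residuals = (observed - expected) / np.sqrt(expected)
print(residuals.round(2))
```

The squared residuals sum to the chi-square statistic, so this table is an exact decomposition of where the 23.829 comes from.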

### Practical Example: Device Type and Purchase Outcome

This example tests whether purchase rate differs across device types — a common question in e-commerce analysis.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Rows: Desktop, Tablet, Mobile. Columns: No Purchase, Purchase.
observed = np.array([
    [120, 55],
    [80, 110],
    [40, 95],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Chi-square statistic: {chi2:.3f}")
print(f"P-value: {p_value:.6f}")
print(f"Degrees of freedom: {dof}")
if p_value < 0.05:
    print("Conclusion: significant association detected.")
else:
    print("Conclusion: no significant association detected.")

plt.figure(figsize=(8, 5))
plt.imshow(observed, cmap="Purples")
plt.xticks([0, 1], ["No Purchase", "Purchase"])
plt.yticks([0, 1, 2], ["Desktop", "Tablet", "Mobile"])
for i in range(observed.shape[0]):
    for j in range(observed.shape[1]):
        plt.text(j, i, observed[i, j], ha="center", va="center", color="black")
plt.title("Observed Purchase Counts by Device Type")
plt.show()
Chi-square statistic: 50.568
P-value: 0.000000
Degrees of freedom: 2
Conclusion: significant association detected.
- The heatmap makes it easy to see that mobile users purchase at a much higher rate (95 of 135, about 70%) than desktop users (55 of 175, about 31%) — exactly the kind of pattern the chi-square test detects.
- `dof = (3-1)*(2-1) = 2` here, because we have 3 device types and 2 outcomes.
- A significant result in this context would motivate investigating mobile UX or conversion optimization.
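A short follow-up that makes the direction of the association concrete: compute the purchase rate for each device directly from the table rows.

```python
import numpy as np

# Rows: Desktop, Tablet, Mobile. Columns: No Purchase, Purchase.
observed = np.array([
    [120, 55],
    [80, 110],
    [40, 95],
])

# Purchase rate = purchases divided by the row total for each device.
rates = observed[:, 1] / observed.sum(axis=1)
for device, rate in zip(["Desktop", "Tablet", "Mobile"], rates):
    print(f"{device}: {rate:.1%} purchase rate")
```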

### Conclusion

The chi-square test of independence is the standard tool for testing whether two categorical variables move together. Use it whenever your data are counts organized by category, and always inspect the expected vs. observed counts to understand where the association comes from.

For continuous variables, see [correlation analysis with SciPy](/tutorials/correlation-analysis-with-scipy). For two-group comparisons on continuous outcomes, see the [independent samples t-test](/tutorials/independent-samples-t-test-with-scipy).