# Chi-Square Test of Independence with SciPy

The chi-square test of independence tests whether two categorical variables are associated. It compares the observed counts in a contingency table against the counts you would expect if the variables were independent. The larger the gap between the two, the larger the test statistic and the smaller the p-value.
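
To make "the counts you would expect if the variables were independent" concrete, here is a minimal sketch that builds the expected counts from the row and column totals. The 2x2 table is made up for illustration, and the variable names are assumptions rather than anything SciPy requires.

```python
import numpy as np

# Hypothetical 2x2 table of counts, chosen only so the arithmetic is easy to follow.
observed = np.array([
    [10, 20],
    [30, 40],
])

row_totals = observed.sum(axis=1)   # one total per row
col_totals = observed.sum(axis=0)   # one total per column
grand_total = observed.sum()

# Under independence: expected count = row total * column total / grand total.
expected = np.outer(row_totals, col_totals) / grand_total

print(expected)
```

For this table the expected counts are 12 and 18 in the first row and 28 and 42 in the second. They keep the same row and column totals as the observed table.
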

### Basic Chi-Square Test

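`scipy.stats.chi2_contingency` takes a table of observed counts and returns the test statistic, the p-value, the degrees of freedom, and the expected counts under independence.

```python
import numpy as np
from scipy import stats

# Contingency table: rows are the levels of one variable, columns the levels of the other.
observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

# Run the test; `expected` has the same shape as `observed`.
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Chi-square statistic: {chi2:.3f}")
print(f"P-value: {p_value:.6f}")
print(f"Degrees of freedom: {dof}")
print("Expected frequencies:")
print(expected.round(2))
```

Output:

```
Chi-square statistic: 23.829
P-value: 0.000007
Degrees of freedom: 2
Expected frequencies:
[[31.52 26.67 21.82]
 [33.48 28.33 23.18]]
```
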
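
The numbers returned by `chi2_contingency` can be reproduced by hand, which may help show where each one comes from. The sketch below is illustrative only; the `*_manual` names are assumptions introduced here. It computes the Pearson statistic, the degrees of freedom, and the p-value for the same table, then checks them against SciPy.

```python
import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

# Expected counts under independence, from the row and column totals.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
n_rows, n_cols = observed.shape
dof_manual = (n_rows - 1) * (n_cols - 1)

# P-value: upper-tail probability of the chi-square distribution.
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

# Compare with SciPy's result for the same table.
chi2, p_value, dof, _ = stats.chi2_contingency(observed)
print(np.isclose(chi2_manual, chi2), dof_manual == dof, np.isclose(p_manual, p_value))
```

The hand calculation matches here because the table has two degrees of freedom. For a 2x2 table (one degree of freedom), `chi2_contingency` applies Yates' continuity correction by default, so its statistic would differ from the uncorrected formula unless `correction=False` is passed.
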
### Interpreting the Result

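Compare the p-value with a significance level chosen before looking at the data (0.05 here). A p-value below that level is evidence against independence.

```python
import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

alpha = 0.05  # significance level

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

if p_value < alpha:
    print("Reject the null hypothesis: the categorical variables are associated.")
else:
    print("Fail to reject the null hypothesis: the data do not show a clear association.")
```

Output:

```
Reject the null hypothesis: the categorical variables are associated.
```
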
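
The p-value comparison has an equivalent form: compare the statistic with the critical value of the chi-square distribution at the chosen significance level. The following is a small sketch of that check, with `alpha` and `critical_value` as illustrative names.

```python
import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

alpha = 0.05  # significance level, assumed to match the threshold used above
chi2, p_value, dof, _ = stats.chi2_contingency(observed)

# Critical value: the point that cuts off the upper `alpha` tail
# of the chi-square distribution with `dof` degrees of freedom.
critical_value = stats.chi2.ppf(1 - alpha, dof)

print(f"Critical value at alpha={alpha}: {critical_value:.3f}")
print(f"Statistic exceeds critical value: {chi2 > critical_value}")
print(f"P-value below alpha: {p_value < alpha}")
```

Both checks always lead to the same decision, since the p-value is below `alpha` exactly when the statistic lies beyond the critical value.
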
### Visualizing Observed and Expected Counts

Plotting the observed and expected tables side by side shows which cells depart most from what independence predicts.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

# Only the expected counts are needed here.
_, _, _, expected = stats.chi2_contingency(observed)

# One heatmap per table, drawn side by side.
fig, axes = plt.subplots(1, 2, figsize=(11, 4.5))
for ax, matrix, title in zip(axes, [observed, expected], ["Observed", "Expected"]):
    ax.imshow(matrix, cmap="Blues")
    ax.set_title(title)
    # Annotate each cell: integers for observed counts, one decimal for expected.
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            value = matrix[i, j]
            text = f"{value:.1f}" if title == "Expected" else f"{int(value)}"
            ax.text(j, i, text, ha="center", va="center")
plt.tight_layout()
plt.show()
```

### Practical Example: Device Type and Purchase Outcome

This example tests whether purchase outcome is associated with device type, then plots the observed counts.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Rows: Desktop, Tablet, Mobile. Columns: No Purchase, Purchase.
observed = np.array([
    [120, 55],
    [80, 110],
    [40, 95],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Chi-square statistic: {chi2:.3f}")
print(f"P-value: {p_value:.6f}")
print(f"Degrees of freedom: {dof}")
if p_value < 0.05:
    print("Conclusion: significant association detected.")
else:
    print("Conclusion: no significant association detected.")

# Heatmap of the observed counts, with each cell labeled.
plt.figure(figsize=(8, 5))
plt.imshow(observed, cmap="Purples")
plt.xticks([0, 1], ["No Purchase", "Purchase"])
plt.yticks([0, 1, 2], ["Desktop", "Tablet", "Mobile"])
for i in range(observed.shape[0]):
    for j in range(observed.shape[1]):
        plt.text(j, i, str(observed[i, j]), ha="center", va="center", color="black")
plt.title("Observed Purchase Counts by Device Type")
plt.show()
```

Output:

```
Chi-square statistic: 50.568
P-value: 0.000000
Degrees of freedom: 2
Conclusion: significant association detected.
```

The p-value prints as 0.000000 because it is smaller than the six decimal places shown, not because it is exactly zero.

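
The test says that an association exists but not what it looks like. A simple follow-up is to compute the purchase rate for each device from the same table. This is a sketch that assumes the row and column order used in the plot labels above (rows Desktop, Tablet, Mobile; columns No Purchase, Purchase).

```python
import numpy as np

devices = ["Desktop", "Tablet", "Mobile"]
observed = np.array([
    [120, 55],   # columns: No Purchase, Purchase
    [80, 110],
    [40, 95],
])

# Share of each row that falls in the "Purchase" column.
purchase_rate = observed[:, 1] / observed.sum(axis=1)

for device, rate in zip(devices, purchase_rate):
    print(f"{device}: {rate:.1%}")
```

For this table the rates come out to about 31.4% for Desktop, 57.9% for Tablet, and 70.4% for Mobile, which is the pattern the significant result reflects.
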
### Conclusion

The chi-square test of independence is a practical way to test whether two categorical variables are associated. Comparing observed with expected counts helps explain why the test reached its conclusion, because the cells with the largest gaps contribute most to the statistic.
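
One way to quantify the gap between observed and expected counts in each cell is the Pearson residual, (observed - expected) / sqrt(expected). The sketch below applies it to the first table in this tutorial; the name `residuals` is an illustrative choice.

```python
import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

chi2, _, _, expected = stats.chi2_contingency(observed)

# Pearson residual per cell: positive means more observations than expected,
# negative means fewer.
residuals = (observed - expected) / np.sqrt(expected)

print(residuals.round(2))

# The squared residuals add up to the chi-square statistic.
print(np.isclose((residuals ** 2).sum(), chi2))
```

In this table the largest residuals in absolute value sit in the first and third columns, while the middle column is close to what independence predicts.
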