The chi-square test of independence is used to test whether two categorical variables are related. It compares observed counts in a contingency table against the counts you would expect if the variables were independent.

### Basic Chi-Square Test
```python
import numpy as np
from scipy import stats

# Contingency table: rows and columns are the levels of the two variables
observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Chi-square statistic: {chi2:.3f}")
print(f"P-value: {p_value:.6f}")
print(f"Degrees of freedom: {dof}")
print("Expected frequencies:")
print(expected.round(2))
```

```
Chi-square statistic: 23.829
P-value: 0.000007
Degrees of freedom: 2
Expected frequencies:
[[31.52 26.67 21.82]
 [33.48 28.33 23.18]]
```

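The expected frequencies and the statistic reported above can also be reproduced by hand, which makes the comparison of observed and expected counts explicit. The sketch below assumes the standard formulas: each expected count is (row total × column total) / grand total, and the statistic is the sum over all cells of (observed − expected)² / expected. The variable names are illustrative and not part of SciPy.

```python
import numpy as np

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

# Marginal totals of the contingency table
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total
expected_manual = np.outer(row_totals, col_totals) / grand_total

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(f"Chi-square statistic: {chi2_manual:.3f}")
print(f"Degrees of freedom: {dof_manual}")
print(expected_manual.round(2))
```

For this 2×3 table the hand computation should agree with `stats.chi2_contingency`. For 2×2 tables, SciPy applies Yates' continuity correction by default, so a hand computation like this one would differ unless `correction=False` is passed.
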
### Interpreting the Result
```python
import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# Compare the p-value with the 0.05 significance level
if p_value < 0.05:
    print("Reject the null hypothesis: the categorical variables are associated.")
else:
    print("Fail to reject the null hypothesis: the data do not show a clear association.")
```

```
Reject the null hypothesis: the categorical variables are associated.
```

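The same decision can be reached by comparing the statistic with a critical value instead of comparing the p-value with 0.05. The sketch below assumes a significance level `alpha = 0.05` (the threshold used above) and uses `scipy.stats.chi2`, the chi-square distribution object, to look up the critical value and to recompute the p-value from the statistic.

```python
import numpy as np
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

alpha = 0.05  # assumed significance level, matching the threshold used above
chi2_stat, p_value, dof, _ = stats.chi2_contingency(observed)

# Critical value: the (1 - alpha) quantile of the chi-square distribution
critical_value = stats.chi2.ppf(1 - alpha, dof)

# Upper-tail probability of the statistic; equals the reported p-value
p_from_stat = stats.chi2.sf(chi2_stat, dof)

print(f"Critical value at alpha={alpha}: {critical_value:.3f}")
print(f"Statistic exceeds critical value: {chi2_stat > critical_value}")
print(f"P-value recomputed from statistic: {p_from_stat:.6f}")
```

The two rules always agree: the statistic exceeds the critical value exactly when the p-value is below `alpha`.
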
### Visualizing Observed and Expected Counts
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

observed = np.array([
    [45, 25, 10],
    [20, 30, 35],
])

_, _, _, expected = stats.chi2_contingency(observed)

fig, axes = plt.subplots(1, 2, figsize=(11, 4.5))

# Draw one heatmap per matrix and annotate every cell with its count
for ax, matrix, title in zip(axes, [observed, expected], ["Observed", "Expected"]):
    im = ax.imshow(matrix, cmap="Blues")
    ax.set_title(title)
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            value = matrix[i, j]
            text = f"{value:.1f}" if title == "Expected" else f"{int(value)}"
            ax.text(j, i, text, ha="center", va="center")

plt.tight_layout()
plt.show()
```

### Practical Example: Device Type and Purchase Outcome
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Rows: Desktop, Tablet, Mobile; columns: No Purchase, Purchase
observed = np.array([
    [120, 55],
    [80, 110],
    [40, 95],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Chi-square statistic: {chi2:.3f}")
print(f"P-value: {p_value:.6f}")
print(f"Degrees of freedom: {dof}")
print("Conclusion: significant association detected." if p_value < 0.05 else "Conclusion: no significant association detected.")

# Heatmap of the observed counts with each cell annotated
plt.figure(figsize=(8, 5))
plt.imshow(observed, cmap="Purples")
plt.xticks([0, 1], ["No Purchase", "Purchase"])
plt.yticks([0, 1, 2], ["Desktop", "Tablet", "Mobile"])
for i in range(observed.shape[0]):
    for j in range(observed.shape[1]):
        plt.text(j, i, observed[i, j], ha="center", va="center", color="black")
plt.title("Observed Purchase Counts by Device Type")
plt.show()
```

```
Chi-square statistic: 50.568
P-value: 0.000000
Degrees of freedom: 2
Conclusion: significant association detected.
```

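To see which cells drive the statistic in the device example, one option is to look at Pearson residuals, (observed − expected) / √expected. This technique is not part of the example above; it is a common follow-up, sketched here under the assumption that the rows are Desktop, Tablet, Mobile and the columns are No Purchase, Purchase, as in the plot labels. The squared residuals are the per-cell contributions and add up to the chi-square statistic.

```python
import numpy as np
from scipy import stats

devices = ["Desktop", "Tablet", "Mobile"]
outcomes = ["No Purchase", "Purchase"]
observed = np.array([
    [120, 55],
    [80, 110],
    [40, 95],
])

chi2_stat, _, _, expected = stats.chi2_contingency(observed)

# Pearson residual per cell: (observed - expected) / sqrt(expected)
residuals = (observed - expected) / np.sqrt(expected)

# Squared residuals are the per-cell contributions; they sum to the statistic
contributions = residuals ** 2

for i, device in enumerate(devices):
    for j, outcome in enumerate(outcomes):
        print(f"{device:8s} {outcome:12s} residual={residuals[i, j]:+.2f} "
              f"contribution={contributions[i, j]:.2f}")

print(f"Sum of contributions: {contributions.sum():.3f}")
```

For this table the Desktop and Mobile rows carry the largest residuals, while the Tablet row sits comparatively close to its expected counts. A positive residual means more observations than independence would predict, and a negative one means fewer.
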
### Conclusion

The chi-square test of independence is a useful way to test whether two categorical variables are associated. Looking at observed versus expected counts helps explain why the test reached its conclusion.