[One-way ANOVA](/tutorials/one-way-anova-with-scipy) answers "does this one factor affect the outcome?" Two-way ANOVA answers the richer question: "do two factors each have an effect, and do they *interact* with each other?" An interaction means the effect of one factor depends on the level of the other. For example, premium fertilizer might only boost crop yield significantly when watering is also high — giving premium fertilizer to drought-stressed plants may show no benefit. Testing for this interaction is the key advantage of two-way ANOVA over running two separate one-way tests.

### Setting up a balanced dataset

We'll simulate a plant growth experiment with two factors: fertilizer type (None, Standard, Premium) and watering frequency (Low, High), with ten plants per combination.

```python
import numpy as np
import pandas as pd
rng = np.random.default_rng(42)
n = 10 # observations per cell
# True group means — premium fertilizer responds strongly to high watering
means = {
    ("None", "Low"): 20, ("None", "High"): 22,
    ("Standard", "Low"): 28, ("Standard", "High"): 32,
    ("Premium", "Low"): 30, ("Premium", "High"): 42,
}
rows = []
for (fert, water), mu in means.items():
    for g in rng.normal(mu, 3, n):
        rows.append({"fertilizer": fert, "watering": water, "growth": g})
df = pd.DataFrame(rows)
group_means = df.groupby(["fertilizer", "watering"])["growth"].mean().round(2)
print(group_means)
print(f"\nTotal observations: {len(df)}")
```

- Each combination of fertilizer and watering level has 10 observations (a *balanced* design), giving 60 rows total.
- The true means are constructed so that the premium fertilizer shows a much larger benefit from high watering (+12) than the others (+2 and +4) — a built-in interaction effect.
- `rng.normal(mu, 3, n)` adds noise with standard deviation 3 around each true mean, making the data realistic.

### Visualizing group means

Before running any test, a grouped bar chart shows whether the pattern in the data matches your expectations.
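A pivot table gives the same cell means numerically before any plotting. This is a quick sketch that rebuilds the simulated `df` (same seed and true means as above) so it runs standalone:

```python
import numpy as np
import pandas as pd

# Rebuild the same simulated dataset (same seed and true means as above)
rng = np.random.default_rng(42)
means = {
    ("None", "Low"): 20, ("None", "High"): 22,
    ("Standard", "Low"): 28, ("Standard", "High"): 32,
    ("Premium", "Low"): 30, ("Premium", "High"): 42,
}
rows = []
for (fert, water), mu in means.items():
    for g in rng.normal(mu, 3, 10):
        rows.append({"fertilizer": fert, "watering": water, "growth": g})
df = pd.DataFrame(rows)

# One row per fertilizer level, one column per watering level
cell_means = df.pivot_table(index="fertilizer", columns="watering", values="growth")
print(cell_means.round(2))
```

The 3×2 table mirrors the design itself: each cell is one fertilizer-watering combination, which makes the built-in interaction easy to spot by eye.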
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
rng = np.random.default_rng(42)
# Rebuild the same simulated dataset (same seed, so values match the previous block)
means = {
    ("None", "Low"): 20, ("None", "High"): 22,
    ("Standard", "Low"): 28, ("Standard", "High"): 32,
    ("Premium", "Low"): 30, ("Premium", "High"): 42,
}
rows = []
for (fert, water), mu in means.items():
    for g in rng.normal(mu, 3, 10):
        rows.append({"fertilizer": fert, "watering": water, "growth": g})
df = pd.DataFrame(rows)
fertilizers = ["None", "Standard", "Premium"]
waterings = ["Low", "High"]
x = np.arange(len(fertilizers))
width = 0.35
fig, ax = plt.subplots(figsize=(8, 5))
for i, water in enumerate(waterings):
    group = df[df["watering"] == water].groupby("fertilizer")["growth"].mean()
    group = group.reindex(fertilizers)
    ax.bar(x + i * width - width / 2, group, width, label=f"Watering: {water}")
ax.set_xticks(x)
ax.set_xticklabels(fertilizers)
ax.set_xlabel("Fertilizer type")
ax.set_ylabel("Mean growth (cm)")
ax.set_title("Mean growth by fertilizer and watering")
ax.legend()
plt.tight_layout()
plt.show()
```

- The bars for Premium fertilizer are much more spread apart between Low and High watering than the bars for None — this visual gap is the interaction.
- `group.reindex(fertilizers)` ensures the bars appear in the order None → Standard → Premium regardless of how pandas sorted the groups.
- `x + i * width - width / 2` positions the two bars side by side for each fertilizer level.

### Running the two-way ANOVA

statsmodels' formula API expresses the model compactly. The `*` operator includes both main effects and their interaction in one formula.
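If the `*` shorthand feels opaque, a quick check confirms that it fits the identical model as the fully spelled-out formula. This sketch rebuilds the simulated data so it runs standalone:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Rebuild the same simulated dataset (same seed and true means as above)
rng = np.random.default_rng(42)
means = {
    ("None", "Low"): 20, ("None", "High"): 22,
    ("Standard", "Low"): 28, ("Standard", "High"): 32,
    ("Premium", "Low"): 30, ("Premium", "High"): 42,
}
rows = []
for (fert, water), mu in means.items():
    for g in rng.normal(mu, 3, 10):
        rows.append({"fertilizer": fert, "watering": water, "growth": g})
df = pd.DataFrame(rows)

# The * shorthand and the expanded formula produce the same design matrix
short = ols("growth ~ C(fertilizer) * C(watering)", data=df).fit()
long = ols(
    "growth ~ C(fertilizer) + C(watering) + C(fertilizer):C(watering)", data=df
).fit()

# Same fit either way: identical R-squared and residual degrees of freedom
print(short.rsquared, long.rsquared)
print(short.df_resid, long.df_resid)
```

With 6 cells and 60 observations, both models spend 6 parameters (intercept, 2 fertilizer contrasts, 1 watering contrast, 2 interaction contrasts), leaving 54 residual degrees of freedom.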
```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
rng = np.random.default_rng(42)
# Rebuild the same simulated dataset (same seed, so values match the previous blocks)
means = {
    ("None", "Low"): 20, ("None", "High"): 22,
    ("Standard", "Low"): 28, ("Standard", "High"): 32,
    ("Premium", "Low"): 30, ("Premium", "High"): 42,
}
rows = []
for (fert, water), mu in means.items():
    for g in rng.normal(mu, 3, 10):
        rows.append({"fertilizer": fert, "watering": water, "growth": g})
df = pd.DataFrame(rows)
model = ols("growth ~ C(fertilizer) * C(watering)", data=df).fit()
table = anova_lm(model, typ=2)
print(table.round(4))
```

- `C(fertilizer)` tells the formula parser to treat the column as a categorical variable, generating dummy-coded contrasts automatically.
- `C(fertilizer) * C(watering)` expands to `C(fertilizer) + C(watering) + C(fertilizer):C(watering)` — main effects plus their interaction term.
- `anova_lm(model, typ=2)` computes Type II sums of squares, which test each effect after accounting for all other main effects (but not the interaction). This is a standard choice, and in a balanced design like this one, Type I and Type II give the same answer.
- The output shows `sum_sq`, `df`, `F`, and `PR(>F)` for fertilizer, watering, their interaction, and the residual.

### Interpreting the results

The ANOVA table has three rows of interest: the two main effects and the interaction term.
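p-values say whether an effect exists, not how large it is. One common effect-size measure, partial eta-squared, can be computed directly from the table's `sum_sq` column. This is a sketch (conventions for what counts as a "large" eta-squared vary by field), rebuilding the simulated data so it runs standalone:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Rebuild the same simulated dataset (same seed and true means as above)
rng = np.random.default_rng(42)
means = {
    ("None", "Low"): 20, ("None", "High"): 22,
    ("Standard", "Low"): 28, ("Standard", "High"): 32,
    ("Premium", "Low"): 30, ("Premium", "High"): 42,
}
rows = []
for (fert, water), mu in means.items():
    for g in rng.normal(mu, 3, 10):
        rows.append({"fertilizer": fert, "watering": water, "growth": g})
df = pd.DataFrame(rows)

model = ols("growth ~ C(fertilizer) * C(watering)", data=df).fit()
table = anova_lm(model, typ=2)

# Partial eta-squared: effect SS / (effect SS + residual SS)
ss_resid = table.loc["Residual", "sum_sq"]
effects = table.drop(index="Residual").copy()
effects["partial_eta_sq"] = effects["sum_sq"] / (effects["sum_sq"] + ss_resid)
print(effects[["sum_sq", "F", "partial_eta_sq"]].round(3))
```

Reporting an effect size alongside each F and p value makes the table far more informative: a tiny p-value with a negligible eta-squared is rarely worth acting on.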
```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
rng = np.random.default_rng(42)
# Rebuild the same simulated dataset (same seed, so values match the previous blocks)
means = {
    ("None", "Low"): 20, ("None", "High"): 22,
    ("Standard", "Low"): 28, ("Standard", "High"): 32,
    ("Premium", "Low"): 30, ("Premium", "High"): 42,
}
rows = []
for (fert, water), mu in means.items():
    for g in rng.normal(mu, 3, 10):
        rows.append({"fertilizer": fert, "watering": water, "growth": g})
df = pd.DataFrame(rows)
model = ols("growth ~ C(fertilizer) * C(watering)", data=df).fit()
table = anova_lm(model, typ=2)
alpha = 0.05
for name, row in table.iterrows():
    if name == "Residual":
        continue
    sig = "significant" if row["PR(>F)"] < alpha else "not significant"
    print(f"{name}: F={row['F']:.2f}, p={row['PR(>F)']:.4f} → {sig}")
```

- A significant main effect for `C(fertilizer)` means fertilizer type matters, averaged across both watering levels.
- A significant main effect for `C(watering)` means watering level matters, averaged across all fertilizer types.
- A significant interaction `C(fertilizer):C(watering)` means the effect of fertilizer *depends on* watering level — you cannot fully describe either main effect without knowing the other factor's level.
- When an interaction is significant, interpret the main effects with caution: the "average" effect of fertilizer hides the fact that it behaves very differently under Low vs High watering.

### Visualizing the interaction

An interaction plot draws one line per level of one factor, with the other factor on the x-axis. Parallel lines indicate no interaction; crossing or diverging lines indicate an interaction.
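statsmodels also ships a convenience function for exactly this chart, `interaction_plot` from `statsmodels.graphics.factorplots`. The sketch below is an alternative to building the plot by hand (check your statsmodels version, as the plotting helpers change more often than the core API); it rebuilds the simulated data so it runs standalone:

```python
import numpy as np
import pandas as pd
from statsmodels.graphics.factorplots import interaction_plot

# Rebuild the same simulated dataset (same seed and true means as above)
rng = np.random.default_rng(42)
means = {
    ("None", "Low"): 20, ("None", "High"): 22,
    ("Standard", "Low"): 28, ("Standard", "High"): 32,
    ("Premium", "Low"): 30, ("Premium", "High"): 42,
}
rows = []
for (fert, water), mu in means.items():
    for g in rng.normal(mu, 3, 10):
        rows.append({"fertilizer": fert, "watering": water, "growth": g})
df = pd.DataFrame(rows)

# One line per fertilizer level, watering frequency on the x-axis
fig = interaction_plot(
    x=df["watering"], trace=df["fertilizer"], response=df["growth"], func=np.mean
)
fig.savefig("interaction_statsmodels.png")
```

Note that `interaction_plot` orders string levels alphabetically (High before Low here); the hand-built matplotlib version in this section controls the level order explicitly via `reindex`.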
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
rng = np.random.default_rng(42)
# Rebuild the same simulated dataset (same seed, so values match the previous blocks)
means = {
    ("None", "Low"): 20, ("None", "High"): 22,
    ("Standard", "Low"): 28, ("Standard", "High"): 32,
    ("Premium", "Low"): 30, ("Premium", "High"): 42,
}
rows = []
for (fert, water), mu in means.items():
    for g in rng.normal(mu, 3, 10):
        rows.append({"fertilizer": fert, "watering": water, "growth": g})
df = pd.DataFrame(rows)
fertilizers = ["None", "Standard", "Premium"]
waterings = ["Low", "High"]
fig, ax = plt.subplots(figsize=(7, 5))
for fert in fertilizers:
    group = df[df["fertilizer"] == fert].groupby("watering")["growth"].mean()
    group = group.reindex(waterings)
    ax.plot(waterings, group.values, marker="o", linewidth=2, label=fert)
ax.set_xlabel("Watering frequency")
ax.set_ylabel("Mean growth (cm)")
ax.set_title("Interaction Plot: Fertilizer × Watering")
ax.legend(title="Fertilizer")
plt.tight_layout()
plt.show()
```

- The slope of the Premium line is much steeper than the lines for None and Standard — this divergence is the interaction.
- If all three lines were parallel (same slope), it would mean the fertilizer effect is the same regardless of watering, and there would be no interaction.
- Interaction plots are the most direct visual tool for explaining a significant interaction term to a non-technical audience.

### Conclusion

Two-way ANOVA separates the contributions of two factors and tests whether they interact — something that two separate one-way ANOVAs can never reveal. When the interaction is significant, always report and visualize it: the main effects alone give an incomplete and potentially misleading picture. To test a single categorical factor, see [One-Way ANOVA with SciPy](/tutorials/one-way-anova-with-scipy). For a regression-based approach that handles both continuous and categorical predictors in the same model, see [Statsmodels formula API regression](/tutorials/statsmodels-formula-api-regression).