Standard [linear regression](/tutorials/linear-regression) assumes every observation is independent. That assumption breaks down when observations are clustered — students in the same school, animals in the same litter, or patients measured at multiple time points all share some unobserved group characteristic that makes them more similar to each other than to outsiders. A linear mixed effects model handles this by adding *random effects*: group-specific adjustments that capture the clustering. The result is more accurate standard errors, valid p-values, and the ability to estimate how much of the total variance comes from between-group differences versus within-group noise.

### Simulating nested data

We'll generate test scores for 150 students spread across 10 schools. Each school has its own random baseline, and homework hours have a fixed positive effect on scores.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_schools = 10
n_per_school = 15
school_intercepts = rng.normal(0, 8, n_schools)  # between-school random effects
rows = []
for s in range(n_schools):
    hours = rng.uniform(1, 10, n_per_school)
    for h in hours:
        score = 50 + 3 * h + school_intercepts[s] + rng.normal(0, 5)
        rows.append({"school": f"School {s+1:02d}", "hours": h, "score": score})
df = pd.DataFrame(rows)
print(df.groupby("school")["score"].agg(["mean", "count"]).round(1))
```

- `school_intercepts = rng.normal(0, 8, n_schools)` creates one random shift per school, sampled from a normal distribution with standard deviation 8 — so the true between-school variance the model should recover is 8² = 64.
- The true data-generating process is `score = 50 + 3*hours + school_effect + noise`. The fixed intercept is 50, the fixed slope is 3.
- Each school's baseline is persistent: every student in School 03 benefits from (or is penalised by) the same school-level offset, regardless of their homework hours.

### Visualizing the group structure

Plotting all schools together reveals both the within-school trend and the between-school spread in baselines.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n_schools = 10
n_per_school = 15
school_intercepts = rng.normal(0, 8, n_schools)
rows = []
for s in range(n_schools):
    hours = rng.uniform(1, 10, n_per_school)
    for h in hours:
        score = 50 + 3 * h + school_intercepts[s] + rng.normal(0, 5)
        rows.append({"school": f"School {s+1:02d}", "hours": h, "score": score})
df = pd.DataFrame(rows)

fig, ax = plt.subplots(figsize=(9, 5))
cmap = plt.get_cmap("tab10")
for i, (school, grp) in enumerate(df.groupby("school")):
    ax.scatter(grp["hours"], grp["score"], color=cmap(i), alpha=0.7,
               s=30, label=school)
ax.set_xlabel("Hours of homework per week")
ax.set_ylabel("Test score")
ax.set_title("Test scores by homework hours — 10 schools")
ax.legend(loc="upper left", fontsize=7, ncol=2)
plt.tight_layout()
plt.show()
```
- `df.groupby("school")` iterates over the data one school at a time, letting us give each school a distinct colour.
- Notice that the cloud of points for each school is shifted up or down as a block — that vertical offset is the random intercept the model will estimate.
- A naive OLS fit through all 150 points would ignore these offsets and produce biased standard errors, because students in the same school are not independent.
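The biased-standard-errors point can be demonstrated directly. A minimal sketch (data regenerated so the block is self-contained) fits OLS twice: once with naive standard errors and once with cluster-robust ones that account for the school grouping — the coefficients are identical, but the uncertainty changes:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Regenerate the simulated data from above
rng = np.random.default_rng(42)
n_schools = 10
n_per_school = 15
school_intercepts = rng.normal(0, 8, n_schools)
rows = []
for s in range(n_schools):
    hours = rng.uniform(1, 10, n_per_school)
    for h in hours:
        score = 50 + 3 * h + school_intercepts[s] + rng.normal(0, 5)
        rows.append({"school": f"School {s+1:02d}", "hours": h, "score": score})
df = pd.DataFrame(rows)

naive = ols("score ~ hours", data=df).fit()
clustered = ols("score ~ hours", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school"]})

# Same point estimates, different standard errors
print(f"Slope estimate:          {naive.params['hours']:.3f}")
print(f"Intercept SE (naive):    {naive.bse['Intercept']:.3f}")
print(f"Intercept SE (cluster):  {clustered.bse['Intercept']:.3f}")
```

The cluster-robust intercept standard error is substantially larger — the naive fit pretends it has 150 independent data points when, for the baseline level, it effectively has closer to 10.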
### Fitting a random intercept model
`statsmodels.formula.api.mixedlm` fits a linear mixed effects model using a formula. The `groups` argument identifies the clustering variable.
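One estimation detail worth knowing up front: `fit()` accepts a `reml` flag. REML (the default) corrects the variance estimates for the degrees of freedom consumed by the fixed effects, while plain maximum likelihood (`reml=False`) tends to underestimate variance components. A quick sketch contrasting the two (data regenerated so the block runs standalone):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import mixedlm

# Regenerate the simulated data from above
rng = np.random.default_rng(42)
n_schools = 10
n_per_school = 15
school_intercepts = rng.normal(0, 8, n_schools)
rows = []
for s in range(n_schools):
    hours = rng.uniform(1, 10, n_per_school)
    for h in hours:
        score = 50 + 3 * h + school_intercepts[s] + rng.normal(0, 5)
        rows.append({"school": f"School {s+1:02d}", "hours": h, "score": score})
df = pd.DataFrame(rows)

reml_fit = mixedlm("score ~ hours", data=df, groups=df["school"]).fit(reml=True)
ml_fit = mixedlm("score ~ hours", data=df, groups=df["school"]).fit(reml=False)

# Fixed effects barely move; the variance components are what differ
print(f"Slope (REML): {reml_fit.fe_params['hours']:.3f}   "
      f"Slope (ML): {ml_fit.fe_params['hours']:.3f}")
print(f"Group Var (REML): {reml_fit.cov_re.iloc[0, 0]:.2f}   "
      f"Group Var (ML): {ml_fit.cov_re.iloc[0, 0]:.2f}")
```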
```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import mixedlm

rng = np.random.default_rng(42)
n_schools = 10
n_per_school = 15
school_intercepts = rng.normal(0, 8, n_schools)
rows = []
for s in range(n_schools):
    hours = rng.uniform(1, 10, n_per_school)
    for h in hours:
        score = 50 + 3 * h + school_intercepts[s] + rng.normal(0, 5)
        rows.append({"school": f"School {s+1:02d}", "hours": h, "score": score})
df = pd.DataFrame(rows)

model = mixedlm("score ~ hours", data=df, groups=df["school"])
result = model.fit(reml=True)
print(result.summary())
```

- `"score ~ hours"` specifies the fixed-effect formula: one predictor, `hours`, plus an implicit intercept.
- `groups=df["school"]` tells the model that observations sharing a school label are clustered and should receive a shared random intercept.
- `reml=True` uses Restricted Maximum Likelihood — the default and generally preferred method for estimating variance components, as it produces less-biased estimates than standard ML.
- The summary shows two tables: **Fixed Effects** (the overall intercept and slope, with z-statistics and p-values) and **Random Effects** (the variance of the school-level intercepts, labelled `Group Var`).

### Extracting fixed and random effects

The fitted result object exposes both the population-level coefficients and each school's individual deviation.
```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import mixedlm

rng = np.random.default_rng(42)
n_schools = 10
n_per_school = 15
school_intercepts = rng.normal(0, 8, n_schools)
rows = []
for s in range(n_schools):
    hours = rng.uniform(1, 10, n_per_school)
    for h in hours:
        score = 50 + 3 * h + school_intercepts[s] + rng.normal(0, 5)
        rows.append({"school": f"School {s+1:02d}", "hours": h, "score": score})
df = pd.DataFrame(rows)

result = mixedlm("score ~ hours", data=df, groups=df["school"]).fit(reml=True)

print("Fixed effects:")
print(f"  Intercept: {result.fe_params['Intercept']:.2f} (true: 50)")
print(f"  hours:     {result.fe_params['hours']:.2f} (true: 3)")

print("\nRandom intercepts per school:")
for school, re in sorted(result.random_effects.items()):
    print(f"  {school}: {re.iloc[0]:+.2f}")

group_var = result.cov_re.iloc[0, 0]
resid_var = result.scale
icc = group_var / (group_var + resid_var)
print(f"\nGroup variance: {group_var:.2f}  Residual variance: {resid_var:.2f}")
print(f"ICC: {icc:.2f} (fraction of variance explained by school membership)")
```

- `result.fe_params` is a Series of fixed-effect estimates — the intercept and slope that apply to all students regardless of school.
- `result.random_effects` is a dict keyed by group label; `re.iloc[0]` extracts each school's estimated deviation from the fixed intercept.
- `result.cov_re.iloc[0, 0]` is the estimated variance of the random intercepts — how much schools differ from each other on average.
- The **ICC** (Intraclass Correlation Coefficient) measures what fraction of total variability is due to group membership. An ICC near 1 means knowing the school tells you almost everything; near 0 means schools are interchangeable.

### Visualizing group-specific fitted lines

Each school gets its own intercept (fixed intercept + school's random deviation), while sharing the same slope for homework hours.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.formula.api import mixedlm

rng = np.random.default_rng(42)
n_schools = 10
n_per_school = 15
school_intercepts = rng.normal(0, 8, n_schools)
rows = []
for s in range(n_schools):
    hours = rng.uniform(1, 10, n_per_school)
    for h in hours:
        score = 50 + 3 * h + school_intercepts[s] + rng.normal(0, 5)
        rows.append({"school": f"School {s+1:02d}", "hours": h, "score": score})
df = pd.DataFrame(rows)

result = mixedlm("score ~ hours", data=df, groups=df["school"]).fit(reml=True)
fe_int = result.fe_params["Intercept"]
fe_slope = result.fe_params["hours"]
x_range = np.linspace(df["hours"].min(), df["hours"].max(), 100)

fig, ax = plt.subplots(figsize=(9, 5))
cmap = plt.get_cmap("tab10")
for i, (school, grp) in enumerate(df.groupby("school")):
    ax.scatter(grp["hours"], grp["score"], color=cmap(i), alpha=0.5, s=25)
    rand_int = result.random_effects[school].iloc[0]
    y_line = (fe_int + rand_int) + fe_slope * x_range
    ax.plot(x_range, y_line, color=cmap(i), linewidth=1.5)

# Population-level line
ax.plot(x_range, fe_int + fe_slope * x_range, color="black",
        linewidth=2.5, linestyle="--", label="Population mean")
ax.set_xlabel("Hours of homework per week")
ax.set_ylabel("Predicted test score")
ax.set_title("Random intercept model: school-specific fitted lines")
ax.legend()
plt.tight_layout()
plt.show()
```

- Each coloured line has the same slope (`fe_slope`) but a different y-intercept — `fe_int + rand_int` — making the lines parallel.
- The black dashed line is the population mean, i.e., what you'd predict for a student from a hypothetical average school.
- If the lines were identical, there would be no between-school variance; the fact that they're spread apart reflects the estimated `Group Var` from the model summary.

### Conclusion

Linear mixed effects models are the correct tool whenever your data has a grouped structure — schools, subjects, clinics, time series — that violates the independence assumption of ordinary regression. The fixed effects give you the overall relationship you care about, and the random effects absorb the clustering so your inference stays valid. For repeated measurements of the same subject over time, see [repeated measures ANOVA](/tutorials/repeated-measures-anova). For the formula syntax used here, see [Statsmodels formula API regression](/tutorials/statsmodels-formula-api-regression).