Lag Plots

Before fitting a statistical model to a time series, you need to know whether observations are correlated with their own past values. A lag plot is the simplest visual check: plot each value `y(t)` against the value `lag` steps earlier, `y(t - lag)`. If the scatter falls along a straight line, values that far apart are correlated, so the series has autocorrelation at that lag. A ring or hollow ellipse indicates a repeating cycle. A shapeless cloud means the series is effectively random at that lag, with no memory of past values. Lag plots complement the numerical ACF by showing not just whether autocorrelation exists but what *kind* of relationship it is.
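To make the construction concrete, here is a minimal sketch of the pairing a lag plot is built from, using `Series.shift`. The helper name `lag_pairs` and the toy series are assumptions made for illustration; they are not part of pandas.

```python
import pandas as pd


def lag_pairs(series: pd.Series, lag: int = 1) -> pd.DataFrame:
    """Pair each value with the value `lag` steps later (hypothetical helper)."""
    pairs = pd.DataFrame({
        "y(t)": series,
        f"y(t+{lag})": series.shift(-lag),
    })
    # The last `lag` rows have no partner (NaN after the shift), so drop them.
    return pairs.dropna().reset_index(drop=True)


toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
pairs = lag_pairs(toy, lag=2)
print(pairs)
```

Scatter-plotting the two columns of `pairs` gives the lag plot. Pairing `y(t)` with `y(t+lag)` yields the same set of points as pairing `y(t)` with `y(t - lag)`, only with the axes swapped.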

### Creating a lag-1 plot with pandas

`pandas.plotting.lag_plot` scatters a series against itself shifted by `lag` periods. The default, `lag=1`, is the most common starting point.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot

rng = np.random.default_rng(42)
n = 200

# Autocorrelated AR(1) series
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.85 * ar1[t - 1] + rng.normal()
ts_ar1 = pd.Series(ar1)

# Random white noise for comparison
ts_noise = pd.Series(rng.normal(0, 1, n))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
lag_plot(ts_ar1, ax=axes[0], alpha=0.5, c="steelblue")
axes[0].set_title("AR(1) series — strong autocorrelation")
axes[0].set_xlabel("y(t)")
axes[0].set_ylabel("y(t+1)")

lag_plot(ts_noise, ax=axes[1], alpha=0.5, c="tomato")
axes[1].set_title("White noise — no autocorrelation")
axes[1].set_xlabel("y(t)")
axes[1].set_ylabel("y(t+1)")

plt.tight_layout()
plt.show()
```

- `lag_plot(series)` plots each value against the next value in the series (lag 1). The x-axis is `y(t)` and the y-axis is `y(t+1)`.
- The AR(1) plot should show a clear positive linear trend: high values tend to be followed by high values, low by low. For a stationary series, the slope of a least-squares line through the cloud approximates the lag-1 autocorrelation, which is 0.85 for this process.
- The white-noise plot shows a roughly circular, filled cloud centred near the origin. With no structure, each value tells you nothing about the next one.
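The visual impression can be checked numerically. The sketch below regenerates the two kinds of series and compares `Series.autocorr(lag=1)` with the least-squares slope of `y(t+1)` on `y(t)`. The seed and the longer length `n = 2000` are assumptions chosen to steady the estimates, so the exact numbers will differ from the plotted series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
n = 2000                        # longer than the plotted series, to reduce sampling noise

# AR(1) with the same coefficient as the plotted example
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.85 * ar1[t - 1] + rng.normal()
ts_ar1 = pd.Series(ar1)

# White noise for comparison
ts_noise = pd.Series(rng.normal(0, 1, n))

# Lag-1 sample autocorrelation (Pearson correlation with the shifted series)
r_ar1 = ts_ar1.autocorr(lag=1)
r_noise = ts_noise.autocorr(lag=1)

# Least-squares slope of y(t+1) regressed on y(t)
slope = np.polyfit(ar1[:-1], ar1[1:], deg=1)[0]

print(f"AR(1) lag-1 autocorrelation: {r_ar1:.2f}")
print(f"AR(1) regression slope:      {slope:.2f}")
print(f"Noise lag-1 autocorrelation: {r_noise:.2f}")
```

The AR(1) figures should land near the coefficient 0.85 and close to each other, while the white-noise figure should sit near zero.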

### Multi-lag grid

Plotting lags 1 through 8 together shows how autocorrelation decays with increasing lag distance.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot

rng = np.random.default_rng(42)
n = 300

# AR(2) series: y_t = 0.7*y_{t-1} + 0.2*y_{t-2} + noise
ar2 = np.zeros(n)
for t in range(2, n):
    ar2[t] = 0.7 * ar2[t - 1] + 0.2 * ar2[t - 2] + rng.normal()
ts = pd.Series(ar2)

# One panel per lag, 1 through 8, on a 2 x 4 grid
fig, axes = plt.subplots(2, 4, figsize=(14, 6))
for lag, ax in zip(range(1, 9), axes.ravel()):
    lag_plot(ts, lag=lag, ax=ax, alpha=0.4, c="steelblue", s=10)
    ax.set_title(f"Lag {lag}")
    ax.set_xlabel("y(t)")
    ax.set_ylabel(f"y(t+{lag})")

plt.suptitle("AR(2) lag plots — lags 1 through 8", y=1.02)
plt.tight_layout()
plt.show()
```

- `lag_plot(ts, lag=k)` plots `y(t)` against `y(t+k)`. For this AR(2), whose positive coefficients sum to 0.9, the linear structure is strongest at lags 1 and 2 and fades only gradually after that.
- As the lag increases, the ellipses become rounder and the linear trend weaker. At a large enough lag the scatter looks random, indicating the series has "forgotten" that far back.
- A grid of lag plots gives a quick visual read on which lags matter before you compute and tabulate correlation coefficients.
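The decay across the grid can be put into numbers. As a sketch, the theoretical autocorrelation of a stationary AR(2) follows the Yule-Walker recursion `rho[k] = phi1 * rho[k-1] + phi2 * rho[k-2]` with `rho[1] = phi1 / (1 - phi2)`, and it can be compared with `Series.autocorr`. The seed and the length `n = 5000` are assumptions chosen so the sample values settle near the theory.

```python
import numpy as np
import pandas as pd

phi1, phi2 = 0.7, 0.2  # same coefficients as the plotted AR(2)
max_lag = 8

# Theoretical autocorrelation from the Yule-Walker recursion
rho = np.empty(max_lag + 1)
rho[0] = 1.0
rho[1] = phi1 / (1 - phi2)
for k in range(2, max_lag + 1):
    rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]

# Sample autocorrelation from a simulated series (arbitrary seed, long series)
rng = np.random.default_rng(0)
n = 5000
ar2 = np.zeros(n)
for t in range(2, n):
    ar2[t] = phi1 * ar2[t - 1] + phi2 * ar2[t - 2] + rng.normal()
ts = pd.Series(ar2)
sample = [ts.autocorr(lag=k) for k in range(1, max_lag + 1)]

for k in range(1, max_lag + 1):
    print(f"lag {k}: theory {rho[k]:.3f}, sample {sample[k - 1]:.3f}")
```

The theoretical values start at 0.875 and are still close to 0.5 at lag 8, so for these coefficients the panels lose their slope slowly rather than collapsing into a round cloud within the grid.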

### Detecting seasonal patterns

A series with a weekly cycle shows a tight linear pattern at lag 7 and at multiples of 7, while other lags produce looser or differently oriented shapes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot

rng = np.random.default_rng(42)
n = 280  # 40 weeks

t = np.arange(n)
seasonal = 5 * np.sin(2 * np.pi * t / 7)  # weekly cycle
ts = pd.Series(seasonal + rng.normal(0, 0.8, n))

# Two off-period lags and two multiples of the period
lags_to_show = [1, 3, 7, 14]
fig, axes = plt.subplots(1, 4, figsize=(14, 3))
for lag, ax in zip(lags_to_show, axes):
    lag_plot(ts, lag=lag, ax=ax, alpha=0.4, c="steelblue", s=12)
    ax.set_title(f"Lag {lag}{'  ← seasonal' if lag in [7, 14] else ''}")
    ax.set_xlabel("y(t)")
    ax.set_ylabel(f"y(t+{lag})")

plt.suptitle("Weekly seasonal series — lag plots reveal period", y=1.05)
plt.tight_layout()
plt.show()
```

- At lag 7 the plot shows a tight positive linear band — `y(t)` and `y(t+7)` are almost the same value because they correspond to the same weekday in consecutive weeks.
- At lag 14 a similar pattern appears (two weeks apart) — any multiple of the seasonal period will show structure.
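- Lags 1 and 3 lack that tight diagonal band, but they are not structureless. At lag 1 the points spread around an open, tilted ellipse, because one day is only a seventh of a cycle away (a noiseless sine has correlation cos(2π/7) ≈ 0.62 at that lag). At lag 3 the points are nearly half a cycle apart, so they fall along a band with negative slope (cos(6π/7) ≈ -0.90).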
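To put numbers on the four panels, the sketch below computes the sample autocorrelation at the same lags and sets it beside `cos(2π · lag / 7)`, the correlation a noiseless sine of period 7 has at that lag. The series is rebuilt with the same parameters as above; the `acf` dictionary is just a convenience for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 280       # 40 weeks
period = 7    # weekly cycle

t = np.arange(n)
ts = pd.Series(5 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.8, n))

lags = [1, 3, 7, 14]
# Sample autocorrelation at each lag shown in the grid
acf = {lag: ts.autocorr(lag=lag) for lag in lags}

for lag in lags:
    noiseless = np.cos(2 * np.pi * lag / period)
    print(f"lag {lag:2d}: sample {acf[lag]:+.2f}, noiseless sine {noiseless:+.2f}")
```

Lags 7 and 14 should come out close to +1, lag 3 strongly negative, and lag 1 moderately positive. The noise pulls every sample value slightly toward zero relative to the noiseless figure.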

### Conclusion

Lag plots are a fast visual complement to the numerical ACF: instead of a bar chart of correlation coefficients, you see the raw scatter at each lag and can spot non-linear relationships (curved patterns), outliers (isolated points far from the cloud), or seasonal structure (tight patterns at multiples of the period) that a correlation number alone would miss. Always check several lags, not just lag 1.
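As an illustration of a relationship that a correlation number misses, the sketch below uses the logistic map `y(t+1) = 4 · y(t) · (1 - y(t))`. This series is not part of the examples above; it is an assumption chosen because each value is fully determined by the previous one, yet the lag-1 autocorrelation is theoretically zero.

```python
import numpy as np
import pandas as pd

n = 2000
x = np.empty(n)
x[0] = 0.2  # arbitrary starting value in (0, 1)
for t in range(1, n):
    # Deterministic, non-linear dependence on the previous value
    x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])
ts = pd.Series(x)

# Linear measure: lag-1 sample autocorrelation
r1 = ts.autocorr(lag=1)
print(f"lag-1 autocorrelation: {r1:+.3f}")

# The lag-1 pairs lie on a parabola, which a lag plot shows directly
pairs = pd.DataFrame({"y(t)": x[:-1], "y(t+1)": x[1:]})
print(pairs.head())
```

Passing `ts` to `lag_plot` draws the parabola, even though the correlation coefficient alone suggests the series has no memory.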

For numerical autocorrelation statistics and ARIMA order identification, see [autocorrelation and PACF plots](/tutorials/autocorrelation-and-pacf-plots). For computing rolling summaries of a time series, see [moving averages with pandas](/tutorials/moving-averages-pandas).