A time series observation is often correlated with its own past values — yesterday's temperature predicts today's, last quarter's sales predict this quarter's. The autocorrelation function (ACF) measures the correlation between a series and itself at different lags. The partial autocorrelation function (PACF) refines this: it measures the direct correlation at each lag after removing the influence of all intermediate lags. Together, ACF and PACF are the primary diagnostic tools for ARIMA model selection — specific patterns (exponential decay vs sharp cut-off) tell you whether the series is better described by an autoregressive (AR) or moving average (MA) process.

### Simulating AR and MA processes

An AR(2) process and an MA(2) process each produce a distinct ACF/PACF signature that the diagnostic plots should reveal.
```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# AR(2): y_t = 0.6*y_{t-1} + 0.3*y_{t-2} + noise
ar2 = np.zeros(n)
for t in range(2, n):
    ar2[t] = 0.6 * ar2[t-1] + 0.3 * ar2[t-2] + rng.normal()

# MA(2): y_t = noise_t + 0.7*noise_{t-1} + 0.4*noise_{t-2}
eps = rng.normal(0, 1, n + 2)
ma2 = np.array([eps[t] + 0.7 * eps[t-1] + 0.4 * eps[t-2] for t in range(2, n + 2)])

print(f"AR(2) — mean: {ar2.mean():.2f}, std: {ar2.std():.2f}")
print(f"MA(2) — mean: {ma2.mean():.2f}, std: {ma2.std():.2f}")
```

- An AR(2) series has memory: each value depends on the last two, creating correlations that decay gradually across many lags.
- An MA(2) series has short memory: it is a weighted sum of only two past noise terms, so correlations cut off sharply after lag 2.
- Simulating both processes lets you compare their ACF/PACF shapes side by side — the key skill in ARIMA identification.

### ACF and PACF plots for an AR(2) process

For an AR(p) process the ACF decays slowly (exponentially or with oscillation) while the PACF cuts off sharply after lag p.
```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(42)
n = 300
ar2 = np.zeros(n)
for t in range(2, n):
    ar2[t] = 0.6 * ar2[t-1] + 0.3 * ar2[t-2] + rng.normal()

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(ar2, lags=30, ax=axes[0], title="AR(2) — ACF")
plot_pacf(ar2, lags=30, ax=axes[1], title="AR(2) — PACF", method="ywm")
plt.tight_layout()
plt.show()
```

- `plot_acf` computes the correlation of the series with lagged versions of itself for lags 1, 2, …, 30. The shaded band is the 95% confidence interval under the null hypothesis of zero autocorrelation.
- For an AR(2) the ACF should decay gradually; the PACF should have two significant spikes at lags 1 and 2 and then drop inside the band — indicating two autoregressive terms.
- `method="ywm"` selects the Yule-Walker estimator without the small-sample adjustment. Bars outside the shaded band are statistically significant at the 5% level.

### ACF and PACF plots for an MA(2) process

For an MA(q) process the roles are reversed: the ACF cuts off after lag q, while the PACF decays slowly.
```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(42)
n = 300
eps = rng.normal(0, 1, n + 2)
ma2 = np.array([eps[t] + 0.7 * eps[t-1] + 0.4 * eps[t-2] for t in range(2, n + 2)])

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(ma2, lags=20, ax=axes[0], title="MA(2) — ACF")
plot_pacf(ma2, lags=20, ax=axes[1], title="MA(2) — PACF", method="ywm")
plt.tight_layout()
plt.show()
```
- The MA(2) ACF should show two significant bars at lags 1 and 2 and then fall inside the band — the clean cut-off that identifies the moving average order.
- The PACF decays slowly or oscillates because the MA process creates indirect correlations at all lags through the noise terms.
- The identification rule: **AR(p)** → ACF decays, PACF cuts off at p. **MA(q)** → ACF cuts off at q, PACF decays. **ARMA(p,q)** → both decay gradually.

### Conclusion

ACF and PACF plots are the starting point for ARIMA model selection — they let you identify the order of the autoregressive and moving average components before fitting. Always check that the series is stationary (constant mean and variance) before interpreting the plots; a trending or seasonal series will show persistent high correlations at all lags that obscure the true structure. Apply differencing or seasonal decomposition first if needed.

For full ARIMA model fitting after identifying the order, see [ARIMA models with statsmodels](/tutorials/statsmodels-arima-models). For decomposing trend and seasonality before checking autocorrelation, see [time series decomposition](/tutorials/time-series-decomposition-statsmodels).