# Time Series Decomposition

A retail chain's monthly sales go up every December (seasonality), grow year-on-year (trend), and fluctuate around those patterns for unexplained reasons (residual). Decomposition splits the series into these three components so you can study each in isolation: smooth the trend, quantify how large the seasonal swings are, and examine whether the residuals show any remaining structure. Statsmodels provides two approaches: classical `seasonal_decompose` (fast, assumes a fixed seasonal shape), and STL (Seasonal-Trend decomposition using Loess), which is more robust to outliers and allows the seasonal pattern to evolve over time.

### Simulating a seasonal time series

Four years of monthly data with a linear trend, a repeating seasonal cycle, and random noise provides a realistic test case.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 48  # 4 years of monthly data
t = np.arange(n)

trend    = 50 + 0.8 * t
seasonal = 10 * np.sin(2 * np.pi * t / 12)
noise    = rng.normal(0, 3, n)
values   = trend + seasonal + noise

dates = pd.date_range("2020-01", periods=n, freq="MS")
ts = pd.Series(values, index=dates)

print(ts.describe().round(2))
print(f"\nFirst 3 values: {ts.iloc[:3].round(1).tolist()}")
```

```
count    48.00
mean     69.04
std      12.60
min      44.64
25%      59.04
50%      68.68
75%      80.18
max      93.15
dtype: float64

First 3 values: [50.9, 52.7, 62.5]
```
- `trend = 50 + 0.8 * t` adds a steady upward drift of 0.8 units per month.
- `10 * np.sin(2 * np.pi * t / 12)` creates a 12-month seasonal cycle with amplitude ±10 — the monthly pattern repeats identically each year.
- `freq="MS"` creates a monthly DatetimeIndex (month start). Statsmodels uses the index frequency to infer the seasonal period when `period` is not specified.

### Classical seasonal decomposition

`seasonal_decompose` uses a moving average to estimate trend, then averages residuals by season to estimate the seasonal component.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(42)
n = 48
t = np.arange(n)
ts = pd.Series(50 + 0.8 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, n),
               index=pd.date_range("2020-01", periods=n, freq="MS"))

result = seasonal_decompose(ts, model="additive", period=12)

fig, axes = plt.subplots(4, 1, figsize=(10, 8), sharex=True)
ts.plot(ax=axes[0], color="steelblue"); axes[0].set_ylabel("Observed")
result.trend.plot(ax=axes[1], color="tomato"); axes[1].set_ylabel("Trend")
result.seasonal.plot(ax=axes[2], color="green"); axes[2].set_ylabel("Seasonal")
result.resid.plot(ax=axes[3], color="gray"); axes[3].set_ylabel("Residual")
axes[0].set_title("Classical additive decomposition")
plt.tight_layout()
plt.show()
```
- `model="additive"` assumes observed = trend + seasonal + residual. Use `model="multiplicative"` when seasonal swings grow proportionally with the trend (common in sales data).
- The trend panel shows NaN at both ends — the moving average needs half a period's worth of data on each side, so the first and last 6 months are missing.
- The residual panel should look like white noise if the decomposition has captured all systematic structure. Any pattern in the residuals suggests the model is incomplete.

### STL decomposition

STL uses locally weighted regression (Loess) instead of a fixed moving average, making it robust to outliers and able to handle changing seasonal patterns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(42)
n = 48
t = np.arange(n)
ts = pd.Series(50 + 0.8 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, n),
               index=pd.date_range("2020-01", periods=n, freq="MS"))

stl = STL(ts, period=12, robust=True)
result = stl.fit()

fig, axes = plt.subplots(4, 1, figsize=(10, 8), sharex=True)
ts.plot(ax=axes[0], color="steelblue"); axes[0].set_ylabel("Observed")
result.trend.plot(ax=axes[1], color="tomato"); axes[1].set_ylabel("Trend")
result.seasonal.plot(ax=axes[2], color="green"); axes[2].set_ylabel("Seasonal")
result.resid.plot(ax=axes[3], color="gray"); axes[3].set_ylabel("Residual")
axes[0].set_title("STL decomposition")
plt.tight_layout()
plt.show()

# Seasonal and trend strength
var_r = result.resid.var()
f_s = max(0, 1 - var_r / (result.seasonal + result.resid).var())
f_t = max(0, 1 - var_r / (result.trend   + result.resid).var())
print(f"Seasonal strength: {f_s:.3f}")
print(f"Trend strength:    {f_t:.3f}")
```

```
Seasonal strength: 0.924
Trend strength:    0.968
```
- `robust=True` uses iteratively reweighted Loess to downweight outliers — standard classical decomposition would absorb outliers into the residuals and distort the trend.
- STL produces a complete trend (no NaN at the edges) because Loess can extrapolate at the boundaries.
- Seasonal strength = `1 − Var(R) / Var(S + R)` measures how much of the non-trend variation is explained by the seasonal component. Values near 1 indicate dominant seasonality.

### Conclusion

Decomposition is the first step in understanding a time series: quantify the trend slope, measure the seasonal amplitude, and check whether the residuals are random. Classical decomposition is fast and interpretable; STL is preferred when the data contains outliers or when the seasonal pattern may shift over time. Always inspect the residuals — any systematic pattern means the decomposition hasn't captured everything.

For modelling the autocorrelation structure of residuals after decomposition, see [autocorrelation and PACF plots](/tutorials/autocorrelation-and-pacf-plots). For forecasting future values with trend and seasonality, see [exponential smoothing with statsmodels](/tutorials/exponential-smoothing-statsmodels).