SARIMAX Time Series Forecasting with Statsmodels

Forecasting is useful when you need to plan inventory, staffing, or budgets ahead of demand changes. SARIMA models capture the four key features of a time series in one framework: trend (a long-run upward or downward drift), autoregression (today's value depends on recent past values), moving-average errors (today's value depends on recent forecast errors), and seasonality (a repeating pattern at a fixed period like 12 months or 7 days). The `SARIMAX` class in statsmodels handles all of this — "S" adds seasonal terms on top of the base [ARIMA model](/tutorials/statsmodels-arima-models), and "X" allows optional exogenous (external) predictors. For monthly data like airline passenger counts, seasonality is the dominant feature and SARIMAX captures it directly.

## Preparing time series data

The classic airline passengers dataset has monthly counts from 1949 to 1960 — a clear upward trend with strong annual seasonality. Setting the frequency explicitly tells statsmodels how to interpret forecast steps.

import requests
import pandas as pd

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()

with open("airline-passengers.csv", "w", encoding="utf-8") as f:
    f.write(response.text)

df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

print(series.head())
print(series.index.freq)

Month
1949-01-01    112
1949-02-01    118
1949-03-01    132
1949-04-01    129
1949-05-01    121
Freq: MS, Name: Passengers, dtype: int64
<MonthBegin>

- `pd.to_datetime(df["Month"])` converts the string dates to proper datetime objects so they can be used as a time index.
- `.asfreq("MS")` sets the frequency to "Month Start" — without an explicit frequency, statsmodels can't determine the spacing between observations and may warn or fail during forecasting.
- Saving locally once means all later blocks avoid repeated network calls.

## Defining SARIMAX parameters

A SARIMAX model is defined by two tuples: the non-seasonal order `(p, d, q)` and the seasonal order `(P, D, Q, s)`. Understanding what each component does helps you choose sensible values.

import pandas as pd

df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

order = (1, 1, 1)
seasonal_order = (1, 1, 1, 12)

print("order:", order)
print("seasonal_order:", seasonal_order)

order: (1, 1, 1)
seasonal_order: (1, 1, 1, 12)

- `order=(p, d, q)`: `p` is the number of autoregressive lags (how many past values to use), `d` is the number of differences to apply (1 removes a linear trend), `q` is the number of moving-average lags.
- `seasonal_order=(P, D, Q, s)`: the same three parameters applied at the seasonal level, with `s=12` meaning the pattern repeats every 12 months.
- `(1, 1, 1, 12)` is a common starting point for monthly seasonal data — one seasonal difference removes the growing seasonal amplitude, and seasonal AR/MA terms capture the residual pattern.

## Fitting the model

Fitting SARIMAX requires passing both order tuples and a couple of stability flags that prevent numerical issues with certain parameter combinations.

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

model = SARIMAX(
    series,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)
print(results.summary())

                                     SARIMAX Results                                      
==========================================================================================
Dep. Variable:                         Passengers   No. Observations:                  144
Model:             SARIMAX(1, 1, 1)x(1, 1, 1, 12)   Log Likelihood                -456.103
Date:                            Fri, 10 Apr 2026   AIC                            922.205
Time:                                    12:30:08   BIC                            936.016
Sample:                                01-01-1949   HQIC                           927.812
                                     - 12-01-1960                                         
Covariance Type:                              opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.2298      0.401     -0.573      0.567      -1.016       0.557
ma.L1         -0.0987      0.374     -0.264      0.792      -0.832       0.634
ar.S.L12      -0.5460      0.299     -1.825      0.068      -1.133       0.041
ma.S.L12       0.3959      0.352      1.125      0.261      -0.294       1.086
sigma2       140.2945     17.997      7.795      0.000     105.020     175.569
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 5.42
Prob(Q):                              0.95   Prob(JB):                         0.07
Heteroskedasticity (H):               2.51   Skew:                             0.12
Prob(H) (two-sided):                  0.01   Kurtosis:                         4.03
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

- `enforce_stationarity=False` and `enforce_invertibility=False` allow the optimizer to explore the full parameter space without numerical constraints — useful when starting parameters are uncertain.
- `disp=False` suppresses the optimizer's iteration output, leaving only the final summary.
- The summary shows AIC and BIC (lower is better for model comparison), log-likelihood, and coefficient estimates with p-values for each AR/MA term.

## Forecasting and visualization

A train-test split lets you evaluate forecast quality on held-out data before using the model for real future forecasts.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

train = series.iloc[:-12]
test = series.iloc[-12:]

model = SARIMAX(
    train,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)

forecast = results.get_forecast(steps=12)
forecast_df = forecast.summary_frame(alpha=0.05)
forecast_df.index = test.index

plt.figure(figsize=(10, 5))
plt.plot(train.index, train.values, label="Train")
plt.plot(test.index, test.values, label="Actual", color="black")
plt.plot(forecast_df.index, forecast_df["mean"], label="Forecast", color="blue")
plt.fill_between(
    forecast_df.index,
    forecast_df["mean_ci_lower"],
    forecast_df["mean_ci_upper"],
    color="blue",
    alpha=0.2,
    label="95% CI",
)
plt.title("SARIMAX Forecast vs Actual")
plt.xlabel("Month")
plt.ylabel("Passengers")
plt.legend()
plt.tight_layout()
plt.show()

print(forecast_df[["mean", "mean_ci_lower", "mean_ci_upper"]])

Passengers        mean  mean_ci_lower  mean_ci_upper
Month                                               
1960-01-01  423.220775     402.813162     443.628389
1960-02-01  406.433566     380.708655     432.158478
1960-03-01  467.547433     436.021002     499.073865
1960-04-01  457.478940     421.736394     493.221486
1960-05-01  480.937601     441.101984     520.773218
1960-06-01  534.599304     491.215120     577.983488
1960-07-01  609.414970     562.670581     656.159359
1960-08-01  621.009678     571.171892     670.847464
1960-09-01  523.363515     470.592559     576.134470
1960-10-01  468.643694     413.104946     524.182443
1960-11-01  423.339480     365.158953     481.520006
1960-12-01  465.073605     404.369009     525.778201

- `series.iloc[:-12]` and `series.iloc[-12:]` split the last 12 months as a holdout — fitting on `train` only means the model never sees the test period.
- `get_forecast(steps=12)` forecasts 12 steps ahead; `.summary_frame(alpha=0.05)` returns the point forecast and 95% confidence interval bounds in a DataFrame.
- `fill_between` shades the confidence interval — widening bands toward the end of the forecast period is expected and reflects compounding uncertainty.
- Visually comparing the blue forecast line to the black actual line shows whether the model captures the seasonal pattern and level correctly.

### Conclusion

SARIMAX is the standard tool for monthly or quarterly data with both trend and seasonality. Fitting on a training set and comparing to a holdout is the most reliable way to evaluate whether the chosen order actually generalizes.

For non-seasonal or daily time series, see [ARIMA models with Statsmodels](/tutorials/statsmodels-arima-models). For simpler linear relationships over time, [linear regression with Statsmodels](/tutorials/statsmodels-linear-regression) with a time variable as a predictor can also work.