SARIMAX Time Series Forecasting with Statsmodels

Forecasting is useful when you need to plan inventory, staffing, or budgets ahead of demand changes. SARIMA and SARIMAX models are popular because they capture trend, autoregression, moving-average effects, and seasonality in one model family.

`SARIMA` models univariate seasonal time series, while `SARIMAX` extends this framework and can also include exogenous predictors (`X`) when needed.

## Preparing time series data

import requests
import pandas as pd

# Download once and persist locally for the rest of the tutorial
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()

with open("airline-passengers.csv", "w", encoding="utf-8") as f:
    f.write(response.text)

# Parse dates and set a monthly index with explicit frequency
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

print(series.head())
print(series.index.freq)
Month
1949-01-01    112
1949-02-01    118
1949-03-01    132
1949-04-01    129
1949-05-01    121
Freq: MS, Name: Passengers, dtype: int64
<MonthBegin>
This block downloads and saves `airline-passengers.csv` once, then prepares the monthly time series index used in later examples. Setting an explicit monthly frequency is important so statsmodels handles forecast steps on the right calendar spacing.

## Defining SARIMAX parameters

import pandas as pd

# Reload prepared series and define non-seasonal + seasonal orders
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

order = (1, 1, 1)
seasonal_order = (1, 1, 1, 12)

print("order:", order)
print("seasonal_order:", seasonal_order)
order: (1, 1, 1)
seasonal_order: (1, 1, 1, 12)
`order=(p,d,q)` controls non-seasonal ARIMA terms, and `seasonal_order=(P,D,Q,s)` controls seasonal terms with seasonal period `s=12` for monthly data. Stating these values explicitly makes model structure clear before fitting.

## Fitting the model

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Build and fit SARIMAX on the full historical series
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

model = SARIMAX(
    series,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)
print(results.summary())
                                     SARIMAX Results                                      
==========================================================================================
Dep. Variable:                         Passengers   No. Observations:                  144
Model:             SARIMAX(1, 1, 1)x(1, 1, 1, 12)   Log Likelihood                -456.103
Date:                            Wed, 11 Mar 2026   AIC                            922.205
Time:                                    16:47:56   BIC                            936.016
Sample:                                01-01-1949   HQIC                           927.812
                                     - 12-01-1960                                         
Covariance Type:                              opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.2298      0.401     -0.573      0.567      -1.016       0.557
ma.L1         -0.0987      0.374     -0.264      0.792      -0.832       0.634
ar.S.L12      -0.5460      0.299     -1.825      0.068      -1.133       0.041
ma.S.L12       0.3959      0.352      1.125      0.261      -0.294       1.086
sigma2       140.2945     17.997      7.795      0.000     105.020     175.569
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 5.42
Prob(Q):                              0.95   Prob(JB):                         0.07
Heteroskedasticity (H):               2.51   Skew:                             0.12
Prob(H) (two-sided):                  0.01   Kurtosis:                         4.03
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
This fits the SARIMAX model and prints coefficient estimates and fit diagnostics for interpretation. Reviewing this summary helps you validate whether the chosen order looks reasonable before using forecasts operationally.

## Forecasting and visualization

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Split into train/test so forecast quality can be checked on unseen data
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

train = series.iloc[:-12]
test = series.iloc[-12:]

model = SARIMAX(
    train,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)

# Forecast the holdout window with 95% confidence intervals
forecast = results.get_forecast(steps=12)
forecast_df = forecast.summary_frame(alpha=0.05)
forecast_df.index = test.index

plt.figure(figsize=(10, 5))
plt.plot(train.index, train.values, label="Train")
plt.plot(test.index, test.values, label="Actual", color="black")
plt.plot(forecast_df.index, forecast_df["mean"], label="Forecast", color="blue")
plt.fill_between(
    forecast_df.index,
    forecast_df["mean_ci_lower"],
    forecast_df["mean_ci_upper"],
    color="blue",
    alpha=0.2,
    label="95% CI",
)
plt.title("SARIMAX Forecast vs Actual")
plt.xlabel("Month")
plt.ylabel("Passengers")
plt.legend()
plt.tight_layout()
plt.show()

print(forecast_df[["mean", "mean_ci_lower", "mean_ci_upper"]])
Passengers        mean  mean_ci_lower  mean_ci_upper
Month                                               
1960-01-01  423.220775     402.813162     443.628389
1960-02-01  406.433566     380.708655     432.158478
1960-03-01  467.547433     436.021002     499.073865
1960-04-01  457.478940     421.736394     493.221486
1960-05-01  480.937601     441.101984     520.773218
1960-06-01  534.599304     491.215120     577.983488
1960-07-01  609.414970     562.670581     656.159359
1960-08-01  621.009678     571.171892     670.847464
1960-09-01  523.363515     470.592559     576.134470
1960-10-01  468.643694     413.104946     524.182443
1960-11-01  423.339480     365.158953     481.520006
1960-12-01  465.073605     404.369009     525.778201
This block performs a train-test forecast, plots predictions against actual values, and includes confidence intervals for uncertainty-aware planning. Comparing forecast and actual curves is a practical check of whether the model is good enough for decision-making.