Forecasting is useful when you need to plan inventory, staffing, or budgets ahead of demand changes. SARIMA and SARIMAX models are popular because they capture trend, autoregression, moving-average effects, and seasonality in one model family. `SARIMA` models univariate seasonal time series, while `SARIMAX` extends this framework and can also include exogenous predictors (`X`) when needed.

## Preparing time series data

```python
import requests
import pandas as pd

# Download once and persist locally for the rest of the tutorial
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()
with open("airline-passengers.csv", "w", encoding="utf-8") as f:
    f.write(response.text)

# Parse dates and set a monthly index with explicit frequency
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

print(series.head())
print(series.index.freq)
```

```
Month
1949-01-01    112
1949-02-01    118
1949-03-01    132
1949-04-01    129
1949-05-01    121
Freq: MS, Name: Passengers, dtype: int64
<MonthBegin>
```

This block downloads and saves `airline-passengers.csv` once, then builds the monthly series used in later examples. Setting an explicit monthly frequency (`"MS"`, month start) matters because statsmodels uses the index frequency to assign dates to each forecast step.

## Defining SARIMAX parameters
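
The `d` and `D` entries in the orders used in this section are differencing counts. As a minimal sketch of what they do, the example below applies one regular difference and one seasonal difference (period 12) to a synthetic monthly series. The series is made up for illustration and is not the airline data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (an assumption for illustration only):
# a linear trend plus a pattern that repeats exactly every 12 months.
index = pd.date_range("2000-01-01", periods=48, freq="MS")
trend = 2.0 * np.arange(48)
seasonal = np.tile([0, 1, 3, 2, 4, 8, 12, 11, 6, 3, 0, 2], 4)
series = pd.Series(100 + trend + seasonal, index=index)

# d=1: one regular difference turns the linear trend into a constant
diff_regular = series.diff(1)

# D=1 with s=12: one seasonal difference cancels the repeating yearly pattern
diff_both = diff_regular.diff(12).dropna()

print("original length:", len(series))
print("length after differencing:", len(diff_both))
print("largest remaining value:", diff_both.abs().max())
```

Because this toy series is exactly trend plus a fixed seasonal pattern, nothing is left after both differences. Real data will leave residual structure, which is what the AR and MA terms are meant to model. Note also that the two differences consume the first 13 observations.
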

```python
import pandas as pd

# Reload prepared series and define non-seasonal + seasonal orders
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

order = (1, 1, 1)
seasonal_order = (1, 1, 1, 12)

print("order:", order)
print("seasonal_order:", seasonal_order)
```

```
order: (1, 1, 1)
seasonal_order: (1, 1, 1, 12)
```

`order=(p, d, q)` sets the non-seasonal autoregressive order, differencing order, and moving-average order. `seasonal_order=(P, D, Q, s)` sets the seasonal counterparts, where `s=12` is the seasonal period for monthly data with a yearly cycle. Stating these values explicitly makes the model structure clear before fitting.

## Fitting the model

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Build and fit SARIMAX on the full historical series
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")

model = SARIMAX(
    series,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)

print(results.summary())
```

```
                                     SARIMAX Results
==========================================================================================
Dep. Variable:                         Passengers   No. Observations:                  144
Model:             SARIMAX(1, 1, 1)x(1, 1, 1, 12)   Log Likelihood                -456.103
Date:                            Wed, 11 Mar 2026   AIC                            922.205
Time:                                    16:47:56   BIC                            936.016
Sample:                                01-01-1949   HQIC                           927.812
                                     - 12-01-1960
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.2298      0.401     -0.573      0.567      -1.016       0.557
ma.L1         -0.0987      0.374     -0.264      0.792      -0.832       0.634
ar.S.L12      -0.5460      0.299     -1.825      0.068      -1.133       0.041
ma.S.L12       0.3959      0.352      1.125      0.261      -0.294       1.086
sigma2       140.2945     17.997      7.795      0.000     105.020     175.569
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):                 5.42
Prob(Q):                              0.95   Prob(JB):                         0.07
Heteroskedasticity (H):               2.51   Skew:                             0.12
Prob(H) (two-sided):                  0.01   Kurtosis:                         4.03
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
```

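This fits the SARIMAX model and prints coefficient estimates and fit diagnostics. Reviewing the summary helps you judge whether the chosen order is reasonable before using forecasts operationally. In this output, none of the AR or MA coefficients is significant at the 5% level (every `P>|z|` exceeds 0.05), and `Prob(H) = 0.01` points to non-constant residual variance, while `Prob(Q) = 0.95` shows no sign of leftover autocorrelation at lag 1. The specification is therefore best treated as a starting point rather than a tuned model.

## Forecasting and visualization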
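
Before relying on model forecasts operationally, it helps to have a simple baseline to compare against. One common choice is the seasonal-naive forecast, which repeats the last observed season. The sketch below is one way to write it; the function name `seasonal_naive_forecast` and the toy data are hypothetical, not part of statsmodels or the tutorial's dataset.

```python
import numpy as np
import pandas as pd


def seasonal_naive_forecast(train: pd.Series, steps: int, season_length: int = 12) -> pd.Series:
    """Forecast by repeating the last `season_length` observations of `train`."""
    if train.index.freq is None:
        raise ValueError("train needs a DatetimeIndex with an explicit frequency")
    if len(train) < season_length:
        raise ValueError("train must contain at least one full season")

    last_season = train.iloc[-season_length:].to_numpy()
    repeats = int(np.ceil(steps / season_length))
    values = np.tile(last_season, repeats)[:steps]

    # Build the future dates by continuing the training index at the same frequency
    future_index = pd.date_range(
        train.index[-1], periods=steps + 1, freq=train.index.freq
    )[1:]
    return pd.Series(values, index=future_index, name="seasonal_naive")


# Toy data: 24 months with values 1..24, so the last season is 13..24
toy_index = pd.date_range("2000-01-01", periods=24, freq="MS")
toy_train = pd.Series(np.arange(1, 25), index=toy_index)

baseline = seasonal_naive_forecast(toy_train, steps=3)
print(baseline)
```

If a fitted SARIMAX model cannot beat this baseline on held-out data, that is a signal to revisit the orders or the data preparation.
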

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Split into train/test so forecast quality can be checked on unseen data
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")
train = series.iloc[:-12]
test = series.iloc[-12:]

model = SARIMAX(
    train,
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12),
    enforce_stationarity=False,
    enforce_invertibility=False,
)
results = model.fit(disp=False)

# Forecast the holdout window with 95% confidence intervals
forecast = results.get_forecast(steps=12)
forecast_df = forecast.summary_frame(alpha=0.05)
forecast_df.index = test.index

plt.figure(figsize=(10, 5))
plt.plot(train.index, train.values, label="Train")
plt.plot(test.index, test.values, label="Actual", color="black")
plt.plot(forecast_df.index, forecast_df["mean"], label="Forecast", color="blue")
plt.fill_between(
    forecast_df.index,
    forecast_df["mean_ci_lower"],
    forecast_df["mean_ci_upper"],
    color="blue",
    alpha=0.2,
    label="95% CI",
)
plt.title("SARIMAX Forecast vs Actual")
plt.xlabel("Month")
plt.ylabel("Passengers")
plt.legend()
plt.tight_layout()
plt.show()

print(forecast_df[["mean", "mean_ci_lower", "mean_ci_upper"]])
```

```
Passengers        mean  mean_ci_lower  mean_ci_upper
Month
1960-01-01  423.220775     402.813162     443.628389
1960-02-01  406.433566     380.708655     432.158478
1960-03-01  467.547433     436.021002     499.073865
1960-04-01  457.478940     421.736394     493.221486
1960-05-01  480.937601     441.101984     520.773218
1960-06-01  534.599304     491.215120     577.983488
1960-07-01  609.414970     562.670581     656.159359
1960-08-01  621.009678     571.171892     670.847464
1960-09-01  523.363515     470.592559     576.134470
1960-10-01  468.643694     413.104946     524.182443
1960-11-01  423.339480     365.158953     481.520006
1960-12-01  465.073605     404.369009     525.778201
```

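This block holds out the final 12 months, fits the model on the remaining data, and plots the forecast against the actual values together with a 95% interval band. The band widens with the forecast horizon (roughly ±20 passengers in January 1960 versus ±60 in December 1960), which is worth factoring into any plan built on the later months. Comparing the forecast and actual curves is a practical check of whether the model is good enough for decision-making.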
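
A visual comparison can be complemented with numeric error metrics. The sketch below computes MAE, RMSE, and MAPE for a forecast against actual values. The helper name `forecast_errors` and the sample numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")

    residual = actual - predicted
    mae = np.mean(np.abs(residual))
    rmse = np.sqrt(np.mean(residual ** 2))
    # MAPE divides by the actual values, so it is undefined when any actual is zero
    mape = np.mean(np.abs(residual / actual)) * 100
    return {"mae": float(mae), "rmse": float(rmse), "mape": float(mape)}


# Made-up values for illustration, not results from the airline data
metrics = forecast_errors(actual=[100, 200, 400], predicted=[110, 190, 380])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With the objects from the forecasting block above, the same helper could be called as `forecast_errors(test, forecast_df["mean"])`, since both share the holdout index.
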