Forecasting is useful when you need to plan inventory, staffing, or budgets ahead of demand changes. SARIMA models capture the four key features of a time series in one framework: trend (a long-run upward or downward drift), autoregression (today's value depends on recent past values), moving-average errors (today's value depends on recent forecast errors), and seasonality (a repeating pattern at a fixed period like 12 months or 7 days). The `SARIMAX` class in statsmodels handles all of this — "S" adds seasonal terms on top of the base [ARIMA model](/tutorials/statsmodels-arima-models), and "X" allows optional exogenous (external) predictors. For monthly data like airline passenger counts, seasonality is the dominant feature and SARIMAX captures it directly. ## Preparing time series data The classic airline passengers dataset has monthly counts from 1949 to 1960 — a clear upward trend with strong annual seasonality. Setting the frequency explicitly tells statsmodels how to interpret forecast steps.
import requests
import pandas as pd
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()
with open("airline-passengers.csv", "w", encoding="utf-8") as f:
f.write(response.text)
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")
print(series.head())
print(series.index.freq)
- `pd.to_datetime(df["Month"])` converts the string dates to proper datetime objects so they can be used as a time index.
- `.asfreq("MS")` sets the frequency to "Month Start" — without an explicit frequency, statsmodels can't determine the spacing between observations and may warn or fail during forecasting.
- Saving locally once means all later blocks avoid repeated network calls.
## Defining SARIMAX parameters
A SARIMAX model is defined by two tuples: the non-seasonal order `(p, d, q)` and the seasonal order `(P, D, Q, s)`. Understanding what each component does helps you choose sensible values.
import pandas as pd
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")
order = (1, 1, 1)
seasonal_order = (1, 1, 1, 12)
print("order:", order)
print("seasonal_order:", seasonal_order)- `order=(p, d, q)`: `p` is the number of autoregressive lags (how many past values to use), `d` is the number of differences to apply (1 removes a linear trend), `q` is the number of moving-average lags. - `seasonal_order=(P, D, Q, s)`: the same three parameters applied at the seasonal level, with `s=12` meaning the pattern repeats every 12 months. - `(1, 1, 1, 12)` is a common starting point for monthly seasonal data — one seasonal difference removes the growing seasonal amplitude, and seasonal AR/MA terms capture the residual pattern. ## Fitting the model Fitting SARIMAX requires passing both order tuples and a couple of stability flags that prevent numerical issues with certain parameter combinations.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")
model = SARIMAX(
series,
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 12),
enforce_stationarity=False,
enforce_invertibility=False,
)
results = model.fit(disp=False)
print(results.summary())- `enforce_stationarity=False` and `enforce_invertibility=False` allow the optimizer to explore the full parameter space without numerical constraints — useful when starting parameters are uncertain. - `disp=False` suppresses the optimizer's iteration output, leaving only the final summary. - The summary shows AIC and BIC (lower is better for model comparison), log-likelihood, and coefficient estimates with p-values for each AR/MA term. ## Forecasting and visualization A train-test split lets you evaluate forecast quality on held-out data before using the model for real future forecasts.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
df = pd.read_csv("airline-passengers.csv")
df["Month"] = pd.to_datetime(df["Month"])
df = df.set_index("Month")
series = df["Passengers"].asfreq("MS")
train = series.iloc[:-12]
test = series.iloc[-12:]
model = SARIMAX(
train,
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 12),
enforce_stationarity=False,
enforce_invertibility=False,
)
results = model.fit(disp=False)
forecast = results.get_forecast(steps=12)
forecast_df = forecast.summary_frame(alpha=0.05)
forecast_df.index = test.index
plt.figure(figsize=(10, 5))
plt.plot(train.index, train.values, label="Train")
plt.plot(test.index, test.values, label="Actual", color="black")
plt.plot(forecast_df.index, forecast_df["mean"], label="Forecast", color="blue")
plt.fill_between(
forecast_df.index,
forecast_df["mean_ci_lower"],
forecast_df["mean_ci_upper"],
color="blue",
alpha=0.2,
label="95% CI",
)
plt.title("SARIMAX Forecast vs Actual")
plt.xlabel("Month")
plt.ylabel("Passengers")
plt.legend()
plt.tight_layout()
plt.show()
print(forecast_df[["mean", "mean_ci_lower", "mean_ci_upper"]])- `series.iloc[:-12]` and `series.iloc[-12:]` split the last 12 months as a holdout — fitting on `train` only means the model never sees the test period. - `get_forecast(steps=12)` forecasts 12 steps ahead; `.summary_frame(alpha=0.05)` returns the point forecast and 95% confidence interval bounds in a DataFrame. - `fill_between` shades the confidence interval — widening bands toward the end of the forecast period is expected and reflects compounding uncertainty. - Visually comparing the blue forecast line to the black actual line shows whether the model captures the seasonal pattern and level correctly. ### Conclusion SARIMAX is the standard tool for monthly or quarterly data with both trend and seasonality. Fitting on a training set and comparing to a holdout is the most reliable way to evaluate whether the chosen order actually generalizes. For non-seasonal or daily time series, see [ARIMA models with Statsmodels](/tutorials/statsmodels-arima-models). For simpler linear relationships over time, [linear regression with Statsmodels](/tutorials/statsmodels-linear-regression) with a time variable as a predictor can also work.