Most real outcomes depend on several factors at once. **Multiple linear regression** helps you estimate how each predictor relates to the target while controlling for the other predictors. In this tutorial, you will build a model with multiple predictors and interpret their partial effects.

## Preparing multiple predictors
```python
import requests
import pandas as pd

# Download once so every later example can reuse the same data file
url = "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()

# Save a local CSV copy for the rest of this tutorial
with open("mtcars.csv", "w", encoding="utf-8") as f:
    f.write(response.text)

df = pd.read_csv("mtcars.csv")
features = ["hp", "wt", "qsec"]
# X contains multiple predictors; y is the target to predict
X = df[features]
y = df["mpg"]
print(X.head())
print(y.head())
```

```
    hp     wt   qsec
0  110  2.620  16.46
1  110  2.875  17.02
2   93  2.320  18.61
3  110  3.215  19.44
4  175  3.440  17.02
0    21.0
1    21.0
2    22.8
3    21.4
4    18.7
Name: mpg, dtype: float64
```
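Once predictors and target are separated, a row-level holdout split is a simple guard against leaking target information into later modeling steps. Below is a minimal, pandas-only sketch; it inlines the five mtcars rows shown above so it runs standalone, and the sampling fraction and seed are illustrative choices, not part of this tutorial's pipeline.

```python
import pandas as pd

# The first five mtcars rows, inlined so this sketch runs on its own
df = pd.DataFrame({
    "hp": [110, 110, 93, 110, 175],
    "wt": [2.620, 2.875, 2.320, 3.215, 3.440],
    "qsec": [16.46, 17.02, 18.61, 19.44, 17.02],
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7],
})

# Split rows, not columns: held-out rows never touch model fitting, so
# target information cannot leak in through preprocessing decisions
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)

X_train, y_train = train[["hp", "wt", "qsec"]], train["mpg"]
X_test, y_test = test[["hp", "wt", "qsec"]], test["mpg"]
print(len(X_train), len(X_test))  # 4 1
```

Sampling `train` first and dropping its index to form `test` keeps the two row sets disjoint without any extra dependency.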
This code downloads and saves `mtcars.csv` once, then creates a feature matrix (`X`) with multiple columns and a target vector (`y`). Splitting predictors and target clearly at the start makes later model code easier to reason about and helps prevent accidental leakage of target information into the features.

## Building the feature matrix and adding an intercept
```python
import pandas as pd
import statsmodels.api as sm

# Load source data and build the model matrix with an intercept
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["hp", "wt", "qsec"]])
y = df["mpg"]
print(X.head())
```

```
   const   hp     wt   qsec
0    1.0  110  2.620  16.46
1    1.0  110  2.875  17.02
2    1.0   93  2.320  18.61
3    1.0  110  3.215  19.44
4    1.0  175  3.440  17.02
```
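statsmodels also offers a formula interface that inserts the intercept for you, which is handy if you would otherwise forget `add_constant`. A sketch, with the five head rows inlined purely so the block is self-contained:

```python
import pandas as pd
import statsmodels.formula.api as smf

# First five mtcars rows, inlined for self-containment
df = pd.DataFrame({
    "hp": [110, 110, 93, 110, 175],
    "wt": [2.620, 2.875, 2.320, 3.215, 3.440],
    "qsec": [16.46, 17.02, 18.61, 19.44, 17.02],
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7],
})

# The formula "mpg ~ hp + wt + qsec" adds the intercept automatically,
# so no explicit add_constant call is needed
model = smf.ols("mpg ~ hp + wt + qsec", data=df).fit()
print(list(model.params.index))  # ['Intercept', 'hp', 'wt', 'qsec']
```

Both routes produce the same design matrix; the formula API just names the intercept column `Intercept` instead of `const`.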
The added constant gives the model an intercept term, while each feature column contributes its own slope to the fitted hyperplane. This setup is required for OLS to estimate each coefficient as a partial effect that accounts for the other predictors.

## Fitting the regression model
```python
import pandas as pd
import statsmodels.api as sm

# Fit OLS with three predictors at once
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["hp", "wt", "qsec"]])
y = df["mpg"]
model = sm.OLS(y, X).fit()
print(model.summary())
```

```
                            OLS Regression Results
==============================================================================
Dep. Variable:                    mpg   R-squared:                       0.835
Model:                            OLS   Adj. R-squared:                  0.817
Method:                 Least Squares   F-statistic:                     47.15
Date:                Wed, 11 Mar 2026   Prob (F-statistic):           4.51e-11
Time:                        16:48:01   Log-Likelihood:                -73.571
No. Observations:                  32   AIC:                             155.1
Df Residuals:                      28   BIC:                             161.0
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         27.6105      8.420      3.279      0.003      10.363      44.858
hp            -0.0178      0.015     -1.190      0.244      -0.049       0.013
wt            -4.3588      0.753     -5.791      0.000      -5.901      -2.817
qsec           0.5108      0.439      1.163      0.255      -0.389       1.411
==============================================================================
Omnibus:                        4.495   Durbin-Watson:                   1.422
Prob(Omnibus):                  0.106   Jarque-Bera (JB):                3.368
Skew:                           0.786   Prob(JB):                        0.186
Kurtosis:                       3.230   Cond. No.                     3.00e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
```
This fits the full multiple linear regression model and prints inference statistics for each predictor. Looking at the full summary lets you evaluate both overall model quality and whether individual variables contribute useful signal.

## Interpreting coefficients and significance
```python
import pandas as pd
import statsmodels.api as sm

# Refit the model and extract key inference fields into one table
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["hp", "wt", "qsec"]])
y = df["mpg"]
model = sm.OLS(y, X).fit()
summary_table = pd.DataFrame(
    {
        "coefficient": model.params,
        "p_value": model.pvalues,
        "ci_lower": model.conf_int()[0],
        "ci_upper": model.conf_int()[1],
    }
)
print(summary_table)
```

```
       coefficient   p_value   ci_lower   ci_upper
const    27.610527  0.002785  10.363085  44.857968
hp       -0.017822  0.244176  -0.048510   0.012865
wt       -4.358797  0.000003  -5.900634  -2.816960
qsec      0.510834  0.254628  -0.388871   1.410538
```
Each coefficient is a **partial effect**: it represents the expected change in `mpg` from a one-unit increase in that feature while holding the other predictors fixed. This interpretation is the main reason to use multiple regression, since it separates the contributions of correlated real-world factors.

## Checking multicollinearity with VIF

When predictors are strongly correlated with each other, coefficient estimates can become unstable. The variance inflation factor (VIF) helps detect this.
```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Compute VIF on the same design matrix used for modeling
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["hp", "wt", "qsec"]])
vif = pd.DataFrame(
    {
        "feature": X.columns,
        "VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    }
)
print(vif)
```

```
  feature         VIF
0   const  341.407799
1      hp    4.921956
2      wt    2.530443
3    qsec    2.873804
```
This block calculates VIF values for each predictor so you can identify potential multicollinearity before interpreting coefficients. Checking VIF first helps you avoid overconfident conclusions from unstable coefficient estimates when predictors overlap heavily.