Logistic Regression in Python with Statsmodels

Many practical modeling tasks are binary: convert or not, churn or stay, default or repay. Logistic regression is built for these outcomes and keeps interpretation straightforward.

This tutorial walks through **logistic regression** in Python using the **statsmodels `Logit`** class: preparing a binary target, fitting the model, interpreting coefficients and odds ratios, and turning predicted probabilities into class decisions.

## Logistic regression vs linear regression

Linear regression predicts unbounded numeric values. Logistic regression predicts probabilities between `0` and `1` using a logistic curve, then maps those probabilities to classes using a threshold.
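The curve doing that mapping is the logistic (sigmoid) function, `1 / (1 + exp(-z))`, applied to the model's linear score. A minimal sketch of the two-step process (the score values here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """Map an unbounded linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores, standing in for X @ beta
scores = np.array([-3.0, 0.0, 3.0])
probs = sigmoid(scores)
classes = (probs >= 0.5).astype(int)

print(probs)    # roughly [0.047, 0.5, 0.953]
print(classes)  # [0 1 1]
```

A score of `0` sits exactly at probability `0.5`, which is why `0.5` is the natural default threshold.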

## Preparing binary target data

```python
import requests
import pandas as pd

# Download once and persist locally for later blocks
url = "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/mtcars.csv"
response = requests.get(url, timeout=30)
response.raise_for_status()

with open("mtcars.csv", "w", encoding="utf-8") as f:
    f.write(response.text)

# Binary target and feature matrix
df = pd.read_csv("mtcars.csv")
y = df["am"]
X = df[["mpg", "hp", "wt"]]

print(X.head())
print(y.head())
```

```
    mpg   hp     wt
0  21.0  110  2.620
1  21.0  110  2.875
2  22.8   93  2.320
3  21.4  110  3.215
4  18.7  175  3.440
0    1
1    1
2    1
3    0
4    0
Name: am, dtype: int64
```
This block downloads and saves `mtcars.csv` once, then prepares the binary target and predictor matrix for classification. Defining `y` and `X` explicitly up front makes the modeling flow clearer and ensures the same dataset is reused consistently in later blocks.

## Adding intercept

```python
import pandas as pd
import statsmodels.api as sm

# Add intercept for baseline log-odds
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["mpg", "hp", "wt"]])
y = df["am"]
print(X.head())
```

```
   const   mpg   hp     wt
0    1.0  21.0  110  2.620
1    1.0  21.0  110  2.875
2    1.0  22.8   93  2.320
3    1.0  21.4  110  3.215
4    1.0  18.7  175  3.440
```
`add_constant()` adds the intercept column that `Logit` needs; unlike R's formula interface, the statsmodels array interface does not include an intercept automatically. Including an intercept lets the model represent the baseline probability level instead of forcing the predictors to explain it.

## Fitting the Logit model

```python
import pandas as pd
import statsmodels.api as sm

# Fit logistic regression model
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["mpg", "hp", "wt"]])
y = df["am"]

logit_model = sm.Logit(y, X).fit(disp=False)
print(logit_model.summary())
```

```
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                     am   No. Observations:                   32
Model:                          Logit   Df Residuals:                       28
Method:                           MLE   Df Model:                            3
Date:                Wed, 11 Mar 2026   Pseudo R-squ.:                  0.7972
Time:                        16:48:03   Log-Likelihood:                -4.3831
converged:                       True   LL-Null:                       -21.615
Covariance Type:            nonrobust   LLR p-value:                 1.581e-07
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        -15.7214     40.003     -0.393      0.694     -94.125      62.683
mpg            1.2293      1.581      0.778      0.437      -1.870       4.328
hp             0.0839      0.082      1.020      0.308      -0.077       0.245
wt            -6.9549      3.353     -2.074      0.038     -13.527      -0.383
==============================================================================

Possibly complete quasi-separation: A fraction 0.25 of observations can be
perfectly predicted. This might indicate that there is complete
quasi-separation. In this case some parameters will not be identified.
```
This fits `Logit` by maximum likelihood and prints coefficient estimates, significance tests, confidence intervals, and fit statistics. The summary helps you decide which predictors carry useful classification signal before you move to prediction. Also note the quasi-separation warning: with only 32 observations, some predictor combinations classify part of the sample perfectly, which inflates standard errors (see the very wide interval on `const`), so individual coefficients here should be read with caution.

## Interpreting coefficients and odds ratios

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Train model then convert log-odds coefficients to odds ratios
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["mpg", "hp", "wt"]])
y = df["am"]
logit_model = sm.Logit(y, X).fit(disp=False)

coef_df = pd.DataFrame(
    {
        "coef": logit_model.params,
        "odds_ratio": np.exp(logit_model.params),
        "p_value": logit_model.pvalues,
    }
)
print(coef_df)
```

```
            coef    odds_ratio   p_value
const -15.721371  1.486947e-07  0.694315
mpg     1.229302  3.418843e+00  0.436861
hp      0.083893  1.087513e+00  0.307900
wt     -6.954924  9.539266e-04  0.038056
```
Logit coefficients are in log-odds space; exponentiating them gives odds ratios, which are easier to interpret operationally. Odds ratios let you explain model behavior in practical terms, such as how much odds change when a feature increases.
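For example, the `wt` odds ratio above (`~0.00095`) is per full unit of `wt`, which in mtcars is 1,000 lb; for a more readable step size you can scale the coefficient before exponentiating. A sketch of that arithmetic, using the `wt` coefficient from the summary above:

```python
import numpy as np

wt_coef = -6.954924  # log-odds change per 1 unit (1,000 lb) of wt

# Odds ratio for a full unit increase vs a 0.1-unit (100 lb) increase
or_full = np.exp(wt_coef)
or_small = np.exp(wt_coef * 0.1)

print(or_full)   # ~0.00095: odds of a manual transmission collapse per 1,000 lb
print(or_small)  # ~0.50: each extra 100 lb roughly halves the odds
```

Scaling inside the exponent works because log-odds are linear in the feature, so a fractional step just multiplies the coefficient.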

## Predicting probabilities and classification thresholds

```python
import pandas as pd
import statsmodels.api as sm

# Fit model on training rows
df = pd.read_csv("mtcars.csv")
X = sm.add_constant(df[["mpg", "hp", "wt"]])
y = df["am"]
logit_model = sm.Logit(y, X).fit(disp=False)

new_cars = pd.DataFrame(
    {
        "mpg": [18.0, 28.0],
        "hp": [150, 95],
        "wt": [3.4, 2.1],
    }
)
new_cars = sm.add_constant(new_cars, has_constant="add")

# Predict probabilities, then map to classes at two thresholds
prob = logit_model.predict(new_cars)
class_at_05 = (prob >= 0.5).astype(int)
class_at_07 = (prob >= 0.7).astype(int)

print("Probabilities:")
print(prob)
print("Classes at threshold 0.5:")
print(class_at_05)
print("Classes at threshold 0.7:")
print(class_at_07)
```

```
Probabilities:
0    0.009409
1    0.999994
dtype: float64
Classes at threshold 0.5:
0    0
1    1
dtype: int64
Classes at threshold 0.7:
0    0
1    1
dtype: int64
```
This computes probabilities first and then shows how changing thresholds changes class assignments. Comparing thresholds demonstrates the precision/recall tradeoff you make when turning probabilities into hard yes/no decisions.
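To make that tradeoff concrete on data where the true labels are known, you can sweep thresholds and count confusion-matrix entries. A minimal sketch with hypothetical probabilities and labels (not the mtcars model's output):

```python
import numpy as np

# Hypothetical predicted probabilities and true labels
probs = np.array([0.15, 0.40, 0.55, 0.65, 0.80, 0.95])
truth = np.array([0,    0,    1,    0,    1,    1])

for threshold in (0.5, 0.7):
    pred = (probs >= threshold).astype(int)
    # Confusion-matrix counts for the positive class
    tp = int(((pred == 1) & (truth == 1)).sum())
    fp = int(((pred == 1) & (truth == 0)).sum())
    fn = int(((pred == 0) & (truth == 1)).sum())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.7 drops a false positive (precision rises) but also drops a true positive (recall falls), which is the tradeoff in miniature.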