Linear Regression with SciPy

Simple linear regression models the relationship between one predictor variable and one response variable. SciPy's `scipy.stats.linregress` estimates the slope and intercept of the fitted line, the correlation coefficient, and a p-value for the null hypothesis that the slope is zero.

### Basic Linear Regression

```python
import numpy as np
from scipy import stats

# Fix the seed so the simulated noise is reproducible
np.random.seed(42)

# Simulate a linear relationship (slope 3.2, intercept 4) with Gaussian noise
x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))

# Fit the least-squares line
result = stats.linregress(x, y)

print(f"Slope: {result.slope:.3f}")
print(f"Intercept: {result.intercept:.3f}")
print(f"R-squared: {result.rvalue**2:.3f}")
print(f"P-value: {result.pvalue:.6f}")
```

```
Slope: 3.219
Intercept: 3.658
R-squared: 0.961
P-value: 0.000000
```

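
To see what `linregress` estimates, the slope, intercept, and correlation coefficient can be reproduced from the textbook least-squares formulas. The sketch below is only a cross-check on the same simulated data; the names `slope_manual`, `intercept_manual`, and `r_manual` are illustrative and not part of SciPy.

```python
import numpy as np
from scipy import stats

np.random.seed(42)

x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))

# Closed-form least-squares estimates:
# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
x_mean, y_mean = x.mean(), y.mean()
slope_manual = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# The fitted line passes through the point (x_mean, y_mean)
intercept_manual = y_mean - slope_manual * x_mean

# Pearson correlation coefficient between x and y
r_manual = np.corrcoef(x, y)[0, 1]

result = stats.linregress(x, y)

# Each comparison should agree up to floating-point error
print(np.isclose(slope_manual, result.slope))
print(np.isclose(intercept_manual, result.intercept))
print(np.isclose(r_manual, result.rvalue))
```

In simple linear regression, R-squared is the square of this correlation coefficient, which is why the tutorial reports `result.rvalue**2`.
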
### Visualizing the Fitted Line

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)

x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))

result = stats.linregress(x, y)

# Scatter the observations and overlay the fitted line
plt.figure(figsize=(9, 5))
plt.scatter(x, y, alpha=0.6, label="Observed data")
plt.plot(x, result.slope * x + result.intercept, color="crimson", label="Fitted line")
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Simple Linear Regression")
plt.legend()
plt.grid(alpha=0.3)
plt.show()
```

### Making Predictions

```python
import numpy as np
from scipy import stats

np.random.seed(42)

x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))

result = stats.linregress(x, y)

# Evaluate the fitted line at a new x value
x_new = 7.5
y_pred = result.slope * x_new + result.intercept

print(f"Predicted y at x = {x_new}: {y_pred:.2f}")
```

```
Predicted y at x = 7.5: 27.80
```

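
A point prediction says nothing about how far a new observation might fall from the line. One common way to quantify that is a prediction interval built from the residual standard error. The sketch below uses the standard textbook formula and assumes independent, normally distributed errors with constant variance; `linregress` does not return this interval, so the names `s`, `sxx`, and `se_pred` are illustrative.

```python
import numpy as np
from scipy import stats

np.random.seed(42)

x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))

result = stats.linregress(x, y)

n = len(x)
x_new = 7.5
y_pred = result.slope * x_new + result.intercept

# Residual standard error with n - 2 degrees of freedom
residuals = y - (result.slope * x + result.intercept)
s = np.sqrt(np.sum(residuals**2) / (n - 2))

# Standard error for a single new observation at x_new
sxx = np.sum((x - x.mean()) ** 2)
se_pred = s * np.sqrt(1 + 1 / n + (x_new - x.mean()) ** 2 / sxx)

# Two-sided 95% interval from the t distribution
t_crit = stats.t.ppf(0.975, n - 2)
lower = y_pred - t_crit * se_pred
upper = y_pred + t_crit * se_pred

print(f"Predicted y at x = {x_new}: {y_pred:.2f}")
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f})")
```

The interval widens as `x_new` moves away from the mean of `x`, so predictions outside the observed range deserve extra caution.
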
### Confidence Interval for the Slope

```python
import numpy as np
from scipy import stats

np.random.seed(42)

x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))

result = stats.linregress(x, y)

# Two parameters (slope and intercept) are estimated, leaving n - 2 degrees of freedom
df = len(x) - 2

# Critical t value for a two-sided 95% interval
t_crit = stats.t.ppf(0.975, df)

# result.stderr is the standard error of the slope
ci = (
    result.slope - t_crit * result.stderr,
    result.slope + t_crit * result.stderr,
)

print(f"Slope estimate: {result.slope:.3f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

```
Slope estimate: 3.219
95% CI for slope: (3.072, 3.366)
```

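
The same construction should work for the intercept. In recent SciPy versions the result object also exposes `intercept_stderr`, available only as an attribute and not through tuple unpacking. The sketch below assumes a SciPy release that provides it; `ci_intercept` is an illustrative name.

```python
import numpy as np
from scipy import stats

np.random.seed(42)

x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))

result = stats.linregress(x, y)

# Same degrees of freedom and critical value as for the slope
df = len(x) - 2
t_crit = stats.t.ppf(0.975, df)

# intercept_stderr is the standard error of the intercept
# (assumes a SciPy version that provides this attribute)
ci_intercept = (
    result.intercept - t_crit * result.intercept_stderr,
    result.intercept + t_crit * result.intercept_stderr,
)

print(f"Intercept estimate: {result.intercept:.3f}")
print(f"95% CI for intercept: ({ci_intercept[0]:.3f}, {ci_intercept[1]:.3f})")
```
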
### Practical Example: Advertising Spend and Sales

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(8)

# Simulated data: sales rise by about 1.8 units per unit of ad spend, plus noise
ad_spend = np.linspace(5, 50, 60)
sales = 1.8 * ad_spend + 20 + np.random.normal(0, 6, size=len(ad_spend))

result = stats.linregress(ad_spend, sales)

print(f"Slope: {result.slope:.3f}")
print(f"Intercept: {result.intercept:.3f}")
print(f"R-squared: {result.rvalue**2:.3f}")
print(f"P-value: {result.pvalue:.6f}")

# Plot the observations with the fitted line
plt.figure(figsize=(9, 5))
plt.scatter(ad_spend, sales, alpha=0.65)
plt.plot(ad_spend, result.slope * ad_spend + result.intercept, color="darkorange", linewidth=2)
plt.xlabel("Advertising spend")
plt.ylabel("Sales")
plt.title("Advertising Spend vs Sales")
plt.grid(alpha=0.3)
plt.show()
```

```
Slope: 1.747
Intercept: 21.253
R-squared: 0.918
P-value: 0.000000
```

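
A quick look at the residuals helps judge whether the fitted line is a reasonable summary of the advertising data. The sketch below is one possible check, not part of `linregress` itself: it computes the residuals, their root-mean-square error, and R-squared from the residual sum of squares. The names `fitted`, `residuals`, `rmse`, and `r_squared` are illustrative.

```python
import numpy as np
from scipy import stats

np.random.seed(8)

ad_spend = np.linspace(5, 50, 60)
sales = 1.8 * ad_spend + 20 + np.random.normal(0, 6, size=len(ad_spend))

result = stats.linregress(ad_spend, sales)

# Residuals: observed sales minus sales predicted by the fitted line
fitted = result.slope * ad_spend + result.intercept
residuals = sales - fitted

# Typical size of a prediction error, in the units of sales
rmse = np.sqrt(np.mean(residuals**2))

# R-squared as one minus the ratio of residual to total sum of squares
r_squared = 1 - np.sum(residuals**2) / np.sum((sales - sales.mean()) ** 2)

print(f"RMSE: {rmse:.3f}")
print(f"R-squared from residuals: {r_squared:.3f}")
```

For a least-squares fit with an intercept, the residuals average to zero and this R-squared matches `result.rvalue**2`.
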
### Conclusion

SciPy's `linregress` gives you a fast way to estimate and interpret a simple linear relationship. The fitted line, R-squared, and p-value together provide a clear summary of the association.