Linear regression answers the question: for every one-unit increase in X, how much does Y tend to change? The slope gives you that rate of change, the intercept gives you the predicted Y when X is zero, R-squared tells you what fraction of Y's variation is explained by X, and the p-value tests whether the slope is significantly different from zero. SciPy's `linregress` is the fastest way to get these for a simple two-variable relationship. For more complex models with multiple predictors, see [statsmodels linear regression](/tutorials/statsmodels-linear-regression).

### Basic Linear Regression

`linregress` fits a least-squares line — it finds the slope and intercept that minimize the sum of squared vertical distances from each point to the line.

```python
import numpy as np
from scipy import stats
np.random.seed(42)
x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))
result = stats.linregress(x, y)
print(f"Slope: {result.slope:.3f}")
print(f"Intercept: {result.intercept:.3f}")
print(f"R-squared: {result.rvalue**2:.3f}")
print(f"P-value: {result.pvalue:.6f}")- `result.slope` is the estimated change in Y per unit increase in X — here it should be close to the true value of 3.2. - `result.rvalue**2` is R-squared: a value of 1.0 means X perfectly predicts Y; 0.0 means X explains nothing. - `result.pvalue` tests the null hypothesis that the true slope is zero — a very small value means X and Y have a real linear relationship. ### Visualizing the Fitted Line Plotting the scatter of observations alongside the regression line shows whether the linear model is a reasonable description of the data or whether a curve would fit better.
### Visualizing the Fitted Line

Plotting the scatter of observations alongside the regression line shows whether the linear model is a reasonable description of the data or whether a curve would fit better.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
np.random.seed(42)
x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))
result = stats.linregress(x, y)
plt.figure(figsize=(9, 5))
plt.scatter(x, y, alpha=0.6, label="Observed data")
plt.plot(x, result.slope * x + result.intercept, color="crimson", label="Fitted line")
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Simple Linear Regression")
plt.legend()
plt.grid(alpha=0.3)
plt.show()
```

- `result.slope * x + result.intercept` computes predicted Y values across the full range of X — this is the regression line equation.
- If points fan out as X increases (heteroscedasticity), or curve systematically away from the line, the linear model may not be the best choice. The residual plot sketched below makes these patterns easier to spot.
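A quick way to check for these patterns is a residual plot, which is easy to build from the `linregress` output even though the function does not produce one itself. A minimal sketch reusing the same simulated data: a healthy linear fit shows a structureless band of residuals around zero.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)
x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))
result = stats.linregress(x, y)

# Residual = observed - predicted; look for fanning or curvature
residuals = y - (result.slope * x + result.intercept)
plt.figure(figsize=(9, 4))
plt.scatter(x, residuals, alpha=0.6)
plt.axhline(0, color="crimson", linewidth=1)
plt.xlabel("X")
plt.ylabel("Residual")
plt.title("Residuals vs X")
plt.grid(alpha=0.3)
plt.show()
```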
### Making Predictions

Once you have the slope and intercept, you can predict the Y value for any new X — even values not in the original dataset.

```python
import numpy as np
from scipy import stats
np.random.seed(42)
x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))
result = stats.linregress(x, y)
x_new = 7.5
y_pred = result.slope * x_new + result.intercept
print(f"Predicted y at x = {x_new}: {y_pred:.2f}")- This is called extrapolation when `x_new` is outside the range of the training data — predictions become less reliable the further you extrapolate. - The predicted value is a point estimate. For a range that accounts for uncertainty, use a confidence interval for the slope (shown in the next section). ### Confidence Interval for the Slope The slope estimate will differ slightly each time you collect new data. A 95% confidence interval gives you a range of plausible values for the true slope based on this sample.
### Confidence Interval for the Slope

The slope estimate will differ slightly each time you collect new data. A 95% confidence interval gives you a range of plausible values for the true slope based on this sample.

```python
import numpy as np
from scipy import stats
np.random.seed(42)
x = np.linspace(0, 10, 80)
y = 3.2 * x + 4 + np.random.normal(0, 2, size=len(x))
result = stats.linregress(x, y)
df = len(x) - 2
t_crit = stats.t.ppf(0.975, df)
ci = (
result.slope - t_crit * result.stderr,
result.slope + t_crit * result.stderr,
)
print(f"Slope estimate: {result.slope:.3f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")- `df = len(x) - 2` is the degrees of freedom for simple linear regression — two parameters (slope and intercept) are estimated from the data. - `stats.t.ppf(0.975, df)` gives the t critical value for a two-sided 95% interval — `0.975` because 2.5% is in each tail. - `result.stderr` is the standard error of the slope estimate; a narrower interval means the slope is estimated more precisely. ### Practical Example: Advertising Spend and Sales This example fits a regression to simulated ad spend and sales data, then interprets the slope in business terms.
### Practical Example: Advertising Spend and Sales

This example fits a regression to simulated ad spend and sales data, then interprets the slope in business terms.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
np.random.seed(8)
ad_spend = np.linspace(5, 50, 60)
sales = 1.8 * ad_spend + 20 + np.random.normal(0, 6, size=len(ad_spend))
result = stats.linregress(ad_spend, sales)
print(f"Slope: {result.slope:.3f}")
print(f"Intercept: {result.intercept:.3f}")
print(f"R-squared: {result.rvalue**2:.3f}")
print(f"P-value: {result.pvalue:.6f}")
plt.figure(figsize=(9, 5))
plt.scatter(ad_spend, sales, alpha=0.65)
plt.plot(ad_spend, result.slope * ad_spend + result.intercept, color="darkorange", linewidth=2)
plt.xlabel("Advertising spend")
plt.ylabel("Sales")
plt.title("Advertising Spend vs Sales")
plt.grid(alpha=0.3)
plt.show()
```

- The slope here estimates how many additional sales are associated with each extra unit of ad spend — the real-world interpretation is always specific to the units of X and Y.
- R-squared near 0.9 means advertising spend accounts for ~90% of the variation in sales in this simulated dataset — real marketing data typically shows much weaker relationships.
- The p-value near zero confirms the slope is significantly different from zero, meaning ad spend is a meaningful predictor here.

### Conclusion

SciPy's `linregress` gives you the essentials of simple linear regression in one call: slope, intercept, R-squared, and significance. It's best suited to two-variable relationships — if you have multiple predictors or need diagnostic tools, see [statsmodels linear regression](/tutorials/statsmodels-linear-regression). To measure the strength of a linear relationship without fitting a model, see [correlation analysis with SciPy](/tutorials/correlation-analysis-with-scipy).