Tutorials

Bootstrap Confidence Intervals with SciPy

A confidence interval tells you the range of plausible values for a population parameter given your sample. The classical approach (like a t-interval for the mean) uses a formula that assumes normally distributed data. Bootstrap confidence intervals take a different approach: they resample the data with replacement thousands of times, compute the statistic on each resample, and use the spread of those resampled values to estimate uncertainty. This works for *any* statistic — medians, ratios, correlation coefficients, model parameters — without needing a closed-form formula or normality assumption. It's especially valuable for small or non-normal samples.

### Bootstrapping the Mean

`scipy.stats.bootstrap` handles the resampling loop for you. You pass the data, the statistic function, and the number of resamples.

```python
import numpy as np
from scipy import stats

np.random.seed(22)

data = np.random.normal(loc=50, scale=8, size=80)

bootstrap_result = stats.bootstrap(
    (data,), np.mean, confidence_level=0.95, n_resamples=5000, random_state=22
)

print(f"Sample mean: {data.mean():.3f}")
print(f"95% bootstrap CI: ({bootstrap_result.confidence_interval.low:.3f}, {bootstrap_result.confidence_interval.high:.3f})")
```

```
Sample mean: 49.519
95% bootstrap CI: (47.856, 51.332)
```
- `(data,)` is passed as a tuple because `bootstrap` accepts multiple datasets for statistics that take more than one sample (like correlation).
- `n_resamples=5000` controls how many resamples are drawn — more resamples give a more stable interval at the cost of computation time.
- `bootstrap_result.confidence_interval.low` and `.high` are the lower and upper bounds of the interval.
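The multi-sample point in the first bullet is worth seeing in action. As a sketch with synthetic data (the seed and sample sizes here are illustrative), a bootstrap CI for a Pearson correlation passes two arrays and sets `paired=True` so each resample keeps the `(x, y)` pairs together:

```python
import numpy as np
from scipy import stats

np.random.seed(22)

# Two correlated variables: y depends on x plus noise.
x = np.random.normal(size=100)
y = 0.6 * x + np.random.normal(scale=0.8, size=100)

def pearson_r(x, y):
    # Plain correlation coefficient; works on one resample at a time,
    # so we set vectorized=False below.
    return np.corrcoef(x, y)[0, 1]

res = stats.bootstrap(
    (x, y),
    pearson_r,
    paired=True,        # resample (x_i, y_i) pairs together
    vectorized=False,   # pearson_r does not accept an axis argument
    n_resamples=2000,
    random_state=22,
)

print(f"Sample r: {pearson_r(x, y):.3f}")
print(f"95% CI: ({res.confidence_interval.low:.3f}, {res.confidence_interval.high:.3f})")
```

Without `paired=True`, the two arrays would be resampled independently and the correlation structure would be destroyed.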

### Comparing Bootstrap and t-Based Intervals

For normally distributed data with a reasonable sample size, bootstrap and t-based intervals should be very close. The value of bootstrap shows up more clearly with small or non-normal samples.

```python
import numpy as np
from scipy import stats

np.random.seed(22)

data = np.random.normal(loc=50, scale=8, size=80)

bootstrap_result = stats.bootstrap(
    (data,), np.mean, confidence_level=0.95, n_resamples=5000, random_state=22
)
t_interval = stats.t.interval(
    confidence=0.95,
    df=len(data) - 1,
    loc=data.mean(),
    scale=stats.sem(data),
)

print(f"Bootstrap CI: ({bootstrap_result.confidence_interval.low:.3f}, {bootstrap_result.confidence_interval.high:.3f})")
print(f"t-based CI:   ({t_interval[0]:.3f}, {t_interval[1]:.3f})")
```

```
Bootstrap CI: (47.856, 51.332)
t-based CI:   (47.734, 51.304)
```
- `stats.sem(data)` computes the standard error of the mean, which the t-interval formula needs.
- Close agreement between the two intervals here is expected — the data are normally distributed and the sample is large enough for the t-interval formula to work well.
- If you tried this with skewed data (e.g., `np.random.exponential`), the two intervals would diverge more, and the bootstrap interval would generally be the more trustworthy one.
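To see the divergence the last bullet describes, swap in skewed data. A quick sketch with a synthetic exponential sample (seed and sizes are illustrative; exact numbers will vary):

```python
import numpy as np
from scipy import stats

np.random.seed(7)

# Strongly right-skewed sample: exponential with mean 5.
skewed = np.random.exponential(scale=5, size=40)

boot = stats.bootstrap(
    (skewed,), np.mean, confidence_level=0.95, n_resamples=5000, random_state=7
)
t_ci = stats.t.interval(
    confidence=0.95,
    df=len(skewed) - 1,
    loc=skewed.mean(),
    scale=stats.sem(skewed),
)

print(f"Bootstrap CI: ({boot.confidence_interval.low:.3f}, {boot.confidence_interval.high:.3f})")
print(f"t-based CI:   ({t_ci[0]:.3f}, {t_ci[1]:.3f})")
```

The t-interval is symmetric about the sample mean by construction, while the bootstrap interval (SciPy's default method is BCa) typically shifts toward the long right tail, reflecting the skew in the data.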

### Visualizing the Bootstrap Distribution

Plotting the bootstrap distribution of the statistic shows the full shape of the uncertainty — not just the interval endpoints — and helps you see whether the distribution is symmetric or skewed.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(22)

data = np.random.normal(loc=50, scale=8, size=80)
boot_means = []
for _ in range(4000):
    sample = np.random.choice(data, size=len(data), replace=True)
    boot_means.append(sample.mean())
boot_means = np.array(boot_means)

ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

plt.figure(figsize=(9, 5))
plt.hist(boot_means, bins=35, alpha=0.75)
plt.axvline(data.mean(), color="red", linestyle="--", label="Sample mean")
plt.axvline(ci_low, color="green", linestyle="--", label="95% CI")
plt.axvline(ci_high, color="green", linestyle="--")
plt.title("Bootstrap Distribution of the Mean")
plt.xlabel("Mean")
plt.ylabel("Count")
plt.legend()
plt.show()
```
- `np.random.choice(data, size=len(data), replace=True)` draws a resample the same size as the original, with replacement — some observations will appear multiple times, others not at all.
- `np.percentile(boot_means, [2.5, 97.5])` cuts off the bottom and top 2.5% of resampled means to form the 95% percentile interval — this is the simplest bootstrap interval method.
- A symmetric, bell-shaped bootstrap distribution (as expected here) confirms the t-interval formula would also work well.
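SciPy can produce this percentile-style interval directly: `stats.bootstrap` accepts `method="percentile"`, while its default is `"BCa"`, a bias-corrected and accelerated refinement. A sketch comparing the two on the same data as above:

```python
import numpy as np
from scipy import stats

np.random.seed(22)
data = np.random.normal(loc=50, scale=8, size=80)

percentile = stats.bootstrap(
    (data,), np.mean, n_resamples=5000, method="percentile", random_state=22
)
bca = stats.bootstrap(
    (data,), np.mean, n_resamples=5000, method="BCa", random_state=22
)

print(f"Percentile CI: ({percentile.confidence_interval.low:.3f}, {percentile.confidence_interval.high:.3f})")
print(f"BCa CI:        ({bca.confidence_interval.low:.3f}, {bca.confidence_interval.high:.3f})")
```

For symmetric data like this the two methods agree closely; BCa earns its keep when the bootstrap distribution is skewed or biased.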

### Practical Example: Average Delivery Time

Bootstrap intervals are useful whenever you want to report uncertainty without making strong distributional assumptions — for example, estimating average delivery time from operational data.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(37)

delivery_times = np.random.normal(loc=32, scale=4.5, size=60)
bootstrap_result = stats.bootstrap(
    (delivery_times,), np.mean, confidence_level=0.95, n_resamples=4000, random_state=37
)

print(f"Average delivery time: {delivery_times.mean():.2f} minutes")
print(
    "95% bootstrap CI: "
    f"({bootstrap_result.confidence_interval.low:.2f}, {bootstrap_result.confidence_interval.high:.2f}) minutes"
)

plt.figure(figsize=(8, 5))
plt.hist(delivery_times, bins=15, alpha=0.7)
plt.axvline(delivery_times.mean(), color="crimson", linestyle="--", linewidth=2)
plt.title("Observed Delivery Times")
plt.xlabel("Minutes")
plt.ylabel("Count")
plt.show()
```

```
Average delivery time: 32.58 minutes
95% bootstrap CI: (31.38, 33.74) minutes
```
- The CI here says: given this sample, the true average delivery time plausibly lies between about 31.4 and 33.7 minutes — more precisely, the procedure that produced this interval would capture the true mean in roughly 95% of repeated samples.
- The histogram helps communicate the spread of raw delivery times, which is different from the uncertainty in the *mean* — the CI answers "how precisely do we know the average?", not "how spread out are the deliveries?".
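Because the bootstrap needs no closed-form formula, the same call works for statistics an operations team may care about more than the mean. As a hypothetical extension of the example above, here is a sketch bootstrapping the 90th-percentile delivery time (a tail statistic with no simple textbook interval):

```python
import numpy as np
from scipy import stats

np.random.seed(37)
delivery_times = np.random.normal(loc=32, scale=4.5, size=60)

def p90(sample, axis=-1):
    # 90th percentile; accepting `axis` lets SciPy vectorize over resamples.
    return np.percentile(sample, 90, axis=axis)

res = stats.bootstrap(
    (delivery_times,), p90, confidence_level=0.95, n_resamples=4000, random_state=37
)

print(f"90th-percentile delivery time: {p90(delivery_times):.2f} minutes")
print(
    "95% bootstrap CI: "
    f"({res.confidence_interval.low:.2f}, {res.confidence_interval.high:.2f}) minutes"
)
```

Quantile intervals are wider than mean intervals on the same data — tail statistics are estimated from fewer effective observations, and the bootstrap makes that extra uncertainty visible.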

### Conclusion

Bootstrap confidence intervals are one of the most broadly applicable tools in statistics — they require minimal assumptions and work for statistics that have no closed-form interval. SciPy's `bootstrap` function makes them easy to apply to any statistic.

For a related approach to measuring uncertainty in regression slope estimates, see [linear regression with SciPy](/tutorials/linear-regression-with-scipy). For testing whether two groups differ without normality assumptions, see the [Mann-Whitney U test](/tutorials/mann-whitney-u-test-with-scipy).