# Change Point Detection

A factory sensor reading a steady 50°C suddenly jumps to 65°C — something changed. A website's daily page views plateau for months, then abruptly drop. Change point detection finds these moments automatically, without requiring you to specify when to look. Unlike anomaly detection (which flags individual outlier points), change point detection finds *persistent* shifts where the underlying process has changed. The CUSUM (cumulative sum) control chart is one of the oldest and most reliable approaches: it accumulates deviations from a reference value and signals when the cumulative sum exceeds a threshold, indicating a sustained shift rather than a transient spike.
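
To make the mechanism concrete, here is a minimal one-sided sketch of that accumulate-and-threshold idea on the sensor example above — the reference value, slack, and threshold are hand-picked illustrative numbers, not tuned values:

```python
import numpy as np

def one_sided_cusum(x, reference, slack, threshold):
    """Accumulate positive deviations from `reference`; report the first
    index at which the running sum exceeds `threshold`."""
    s = 0.0
    for i, value in enumerate(x):
        # Deviations smaller than `slack` are absorbed; the sum never goes negative
        s = max(0.0, s + (value - reference) - slack)
        if s > threshold:
            return i
    return None

# Steady around 50°C, then a sustained jump to ~65°C starting at index 5
readings = np.array([50.1, 49.8, 50.3, 49.9, 50.2, 65.1, 64.8, 65.3, 65.0])
print(one_sided_cusum(readings, reference=50.0, slack=1.0, threshold=20.0))  # → 6
```

A single 65°C spike followed by a return to 50°C would add only about 14 to the sum — below the threshold of 20 — so it would not trigger; only the sustained shift does.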

### Simulating a signal with known change points

A 300-point signal with two known mean shifts — one upward at position 100 and one downward at position 200 — provides a ground truth to validate the detection algorithm.

```python
import numpy as np

rng = np.random.default_rng(42)

signal = np.concatenate([
    rng.normal(10, 1, 100),   # segment 1: mean = 10
    rng.normal(14, 1, 100),   # segment 2: mean = 14 (shift up by 4)
    rng.normal(11, 1, 100),   # segment 3: mean = 11 (shift down by 3)
])

print(f"Segment 1 mean: {signal[:100].mean():.2f}")
print(f"Segment 2 mean: {signal[100:200].mean():.2f}")
print(f"Segment 3 mean: {signal[200:].mean():.2f}")
print("True change points: 100, 200")
```
```
Segment 1 mean: 9.95
Segment 2 mean: 13.99
Segment 3 mean: 10.94
True change points: 100, 200
```
- Three segments with different means (10, 14, 11) create two change points at positions 100 and 200.
- The noise standard deviation (1.0) is much smaller than the shift sizes (4 and 3), making the changes clearly detectable — real-world data often has larger noise relative to the shift.
- `np.concatenate` joins the three segments seamlessly, producing a single 300-point array with two abrupt transitions.
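
The claim about noise versus shift size can be checked directly: the quantity that governs detectability is the shift expressed in units of the noise standard deviation (segments regenerated here with the same seed as above):

```python
import numpy as np

rng = np.random.default_rng(42)
seg1 = rng.normal(10, 1, 100)   # pre-shift segment
seg2 = rng.normal(14, 1, 100)   # post-shift segment

# Shift size in units of the noise std — here roughly 4 sigma, an easy case
shift_sigma = (seg2.mean() - seg1.mean()) / seg1.std()
print(f"Standardised shift: {shift_sigma:.1f} sigma")
```

A 4σ shift is trivially detectable; shifts below about 1σ are where the choice of k and h starts to matter.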

### CUSUM algorithm for mean-shift detection

CUSUM accumulates positive and negative deviations from a baseline. When either accumulator exceeds a threshold, a change point is declared and the accumulator resets.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
signal = np.concatenate([
    rng.normal(10, 1, 100),
    rng.normal(14, 1, 100),
    rng.normal(11, 1, 100),
])

def cusum_detect(x, k=0.5, h=4.0, init_window=30):
    """
    k: slack (half the minimum detectable shift, in std units)
    h: decision threshold (in std units)
    After each detection, re-estimates mu/sigma from the next init_window
    observations (the new regime) before resuming accumulation.
    """
    mu    = x[:init_window].mean()
    sigma = max(x[:init_window].std(), 1e-6)

    s_pos, s_neg = np.zeros(len(x)), np.zeros(len(x))
    change_points = []
    s_p = s_n = 0.0           # running accumulators (separate from the stored arrays)
    resume_at = init_window   # don't accumulate during initialisation window

    for i in range(init_window, len(x)):
        if i < resume_at:
            continue                          # inside re-estimation window — skip
        z = (x[i] - mu) / sigma
        s_p = max(0, s_p + z - k)
        s_n = max(0, s_n - z - k)
        s_pos[i], s_neg[i] = s_p, s_n
        if s_p > h or s_n > h:
            change_points.append(i)
            # Re-baseline: estimate new regime from the next init_window points
            end = min(i + init_window, len(x))
            mu    = x[i:end].mean()
            sigma = max(x[i:end].std(), 1e-6)
            s_p = s_n = 0.0
            resume_at = end               # skip past the re-estimation window

    return s_pos, s_neg, change_points

s_pos, s_neg, cps = cusum_detect(signal, k=0.5, h=4.0)

fig, axes = plt.subplots(2, 1, figsize=(11, 6), sharex=True)
axes[0].plot(signal, color="steelblue", linewidth=0.8)
for cp in cps:
    axes[0].axvline(cp, color="tomato", linewidth=1.5, linestyle="--")
axes[0].set_ylabel("Signal value")
axes[0].set_title("Signal with detected change points (dashed red)")

axes[1].plot(s_pos, color="green", label="CUSUM+")
axes[1].plot(-s_neg, color="tomato", label="CUSUM−")
axes[1].axhline(4.0,  color="green", linestyle=":", linewidth=1)
axes[1].axhline(-4.0, color="tomato", linestyle=":", linewidth=1)
axes[1].set_ylabel("CUSUM statistic")
axes[1].set_xlabel("Time")
axes[1].legend()

plt.tight_layout()
plt.show()
print(f"Detected change points: {cps}")
```
```
Detected change points: [96, 205, 243]
```
- `k=0.5` is the slack parameter (half the minimum shift you want to detect, in std units). Too small catches noise; too large misses small shifts.
- `h=4.0` is the decision threshold — the CUSUM accumulates until it reaches this height, then declares a change point and resets.
- After detection, `mu` and `sigma` are re-estimated from the `init_window` observations that follow the detection — the start of the new regime — so the detector is calibrated to it. Without this re-baselining, every sample in the shifted segment would keep triggering, because they are all far from the original reference.
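
How h trades detection speed against caution can be sketched with a stripped-down one-sided detector fed data that has *already* shifted (baseline mean 10, shift to 12, i.e. 2σ — all made-up values), so the alarm index is the pure detection delay:

```python
import numpy as np

def first_alarm(x, mu, sigma, k, h):
    """Index of the first CUSUM+ alarm on x, or None if it never fires."""
    s = 0.0
    for i, v in enumerate(x):
        s = max(0.0, s + (v - mu) / sigma - k)
        if s > h:
            return i
    return None

rng = np.random.default_rng(0)
shifted = rng.normal(12, 1, 200)   # the regime has already moved from 10 to 12

for h in (2.0, 4.0, 8.0):
    delay = first_alarm(shifted, mu=10.0, sigma=1.0, k=0.5, h=h)
    print(f"h={h}: first alarm after {delay} samples")
```

On the same data a larger h always gives an equal or later alarm (the CUSUM path is identical; only the crossing level moves), in exchange for fewer false alarms on noise.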

### Detecting variance shifts

Not all changes are shifts in the mean — sometimes the variability itself changes. Applying CUSUM to squared deviations detects increases or decreases in spread.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Variance doubles at position 150
signal = np.concatenate([
    rng.normal(10, 1.0, 150),   # low noise
    rng.normal(10, 2.5, 150),   # high noise (same mean, larger std)
])

# Causal trailing rolling mean — no future samples used
window = 20
rolling_mean = pd.Series(signal).rolling(window=window, min_periods=1).mean().values
sq_dev = (signal - rolling_mean) ** 2

def cusum_detect(x, k=0.3, h=5.0, init_window=50):
    mu    = x[:init_window].mean()
    sigma = max(x[:init_window].std(), 1e-9)
    s_pos = np.zeros(len(x))
    change_points = []
    s_p = 0.0
    resume_at = init_window
    for i in range(init_window, len(x)):
        if i < resume_at:
            continue
        z = (x[i] - mu) / sigma
        s_p = max(0, s_p + z - k)
        s_pos[i] = s_p
        if s_p > h:
            change_points.append(i)
            end = min(i + init_window, len(x))
            mu    = x[i:end].mean()
            sigma = max(x[i:end].std(), 1e-9)
            s_p = 0.0
            resume_at = end
    return s_pos, change_points

_, cps = cusum_detect(sq_dev, k=0.3, h=25.0, init_window=60)

fig, axes = plt.subplots(2, 1, figsize=(11, 5), sharex=True)
axes[0].plot(signal, color="steelblue", linewidth=0.6)
for cp in cps:
    axes[0].axvline(cp, color="tomato", linewidth=1.5, linestyle="--")
axes[0].axvline(150, color="black", linewidth=1.5, linestyle="-", label="True change point")
axes[0].set_ylabel("Signal"); axes[0].legend(fontsize=9)
axes[0].set_title("Variance shift detection — signal (noise doubles at t=150)")

axes[1].plot(sq_dev, color="gray", linewidth=0.6)
axes[1].set_ylabel("Squared deviation")
axes[1].set_xlabel("Time")

plt.tight_layout()
plt.show()
print(f"Detected near true change point (150): {cps}")
```
```
Detected near true change point (150): [151]
```
- Applying CUSUM to squared deviations rather than raw values converts a variance-change problem into a mean-change problem.
- `pd.Series.rolling(window, min_periods=1).mean()` is a causal trailing window — each value uses only past and current samples, so the detector cannot signal before the change happens. `np.convolve(..., mode='same')` would mix in future samples and must be avoided here.
- Variance detection is inherently noisier than mean detection — the detected point will typically be offset from the true change by a few positions.
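
The causal-window point is easy to demonstrate: on a clean step, a centered window starts rising *before* the change because it averages in future samples, while the trailing window only reacts at the change (step position and window size are arbitrary here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x = np.concatenate([np.zeros(50), np.ones(50)]) + rng.normal(0, 0.01, 100)  # step at t=50

trailing = pd.Series(x).rolling(window=20, min_periods=1).mean()
centered = pd.Series(x).rolling(window=20, min_periods=1, center=True).mean()

# Five samples before the change: trailing is still ~0, but the centered
# window already covers post-change samples and has started to rise.
print(f"t=45  trailing: {trailing[45]:.2f}   centered: {centered[45]:.2f}")
```

An online detector built on the centered version would appear to anticipate the change — an artifact, not foresight.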

### Conclusion

CUSUM is reliable, has interpretable parameters, and requires no external libraries. The two key parameters — slack k and threshold h — trade off sensitivity (detecting small shifts quickly) against specificity (avoiding false alarms on noisy data). A good starting point is k = 0.5 (targeting shifts of at least 1 std) and h = 4–5; then adjust based on your application's cost of missed detections versus false alarms.
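
A rough way to see that trade-off is to count how often CUSUM+ fires on pure noise as k varies (seed, threshold, and sample count below are arbitrary choices):

```python
import numpy as np

def alarms_on_noise(k, h, n=5000, seed=7):
    """Count CUSUM+ alarms on n samples of pure N(0, 1) noise,
    resetting the accumulator after each alarm."""
    rng = np.random.default_rng(seed)
    s, count = 0.0, 0
    for z in rng.normal(0, 1, n):
        s = max(0.0, s + z - k)
        if s > h:
            count += 1
            s = 0.0
    return count

for k in (0.1, 0.5, 1.0):
    print(f"k={k}: {alarms_on_noise(k, h=4.0)} false alarms on 5000 noise samples")
```

Every alarm here is a false alarm — the noise has no change point — so a smaller k buys sensitivity at a direct cost in specificity.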

For detecting individual spike anomalies rather than sustained shifts, see [finding peaks with scipy](/tutorials/find-peaks-with-scipy). For characterising the autocorrelation structure of segments between change points, see [autocorrelation and PACF plots](/tutorials/autocorrelation-and-pacf-plots).