Savitzky-Golay Filtering

A simple moving average smooths noise but also reduces peak heights — every sharp feature gets flattened proportionally to the window size. In spectroscopy, chromatography, or any field where peak amplitude is a quantitative measurement, this is unacceptable. The Savitzky-Golay filter solves this by fitting a polynomial (usually degree 2 or 4) to each window and using the polynomial's central value as the output. Because a polynomial can follow local curvature, it preserves peaks much better than a flat average while still smoothing the noise between features. The result is a smoother-looking signal whose peaks retain their original heights and positions.

### Generating a multi-peak noisy signal

Three Gaussian peaks with different heights and widths, buried in noise, represent a typical spectrometry scan.

import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 500)

def gaussian(x, amp, center, sigma):
    return amp * np.exp(-(x - center)**2 / (2 * sigma**2))

# Three peaks
signal_clean = (gaussian(x, 3.0, 2.0, 0.3) +
                gaussian(x, 1.5, 5.5, 0.5) +
                gaussian(x, 2.0, 8.0, 0.4))

noise = rng.normal(0, 0.15, len(x))
signal_noisy = signal_clean + noise

print(f"Peak heights (true): {signal_clean.max():.2f}")
print(f"Noise std: {noise.std():.3f}")
print(f"SNR: {signal_clean.max() / noise.std():.1f}x")

Peak heights (true): 3.00
Noise std: 0.144
SNR: 20.9x

- Three peaks with amplitudes 3.0, 1.5, and 2.0 test whether the filter preserves different heights proportionally.
- A noise standard deviation of 0.15 gives a signal-to-noise ratio of about 20:1 — noisy enough to need filtering, clean enough that the peaks are still visible.
- The peaks have different widths (σ = 0.3, 0.5, 0.4), testing whether narrow peaks are more distorted than wide ones by a given filter.

### Applying the Savitzky-Golay filter

`savgol_filter` requires an odd `window_length` and a `polyorder` smaller than the window length.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 500)
signal_clean = (3.0 * np.exp(-(x-2.0)**2/(2*0.3**2)) +
                1.5 * np.exp(-(x-5.5)**2/(2*0.5**2)) +
                2.0 * np.exp(-(x-8.0)**2/(2*0.4**2)))
signal_noisy = signal_clean + rng.normal(0, 0.15, len(x))

sg_filtered  = savgol_filter(signal_noisy, window_length=21, polyorder=3)
ma_filtered  = np.convolve(signal_noisy, np.ones(21)/21, mode="same")

fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(x, signal_noisy,  color="lightgray",  linewidth=0.7, label="Noisy signal")
ax.plot(x, signal_clean,  color="black",      linewidth=1.0, linestyle="--", label="True signal")
ax.plot(x, ma_filtered,   color="steelblue",  linewidth=1.5, label="Moving average (w=21)")
ax.plot(x, sg_filtered,   color="tomato",     linewidth=1.5, label="Savitzky-Golay (w=21, p=3)")
ax.set_xlabel("x"); ax.set_ylabel("Intensity")
ax.set_title("Savitzky-Golay vs moving average — peak preservation")
ax.legend()
plt.tight_layout()
plt.show()

peak_true = signal_clean.max()
peak_sg   = sg_filtered.max()
peak_ma   = ma_filtered.max()
print(f"True peak:  {peak_true:.3f}")
print(f"SG peak:    {peak_sg:.3f}  (error: {abs(peak_sg - peak_true)/peak_true:.1%})")
print(f"MA peak:    {peak_ma:.3f}  (error: {abs(peak_ma - peak_true)/peak_true:.1%})")

True peak:  3.000
SG peak:    2.927  (error: 2.4%)
MA peak:    2.718  (error: 9.4%)

- `window_length=21` spans about 4% of the signal — wide enough to smooth the noise but narrow enough not to distort the peaks significantly.
- `polyorder=3` (cubic) is the standard choice. Higher order follows more curvature but overfits the noise; lower order is smoother but flattens peaks more.
- The peak height comparison shows that SG preserves the peak amplitude much more accurately than the moving average, which systematically underestimates it.

### Effect of polynomial order

Higher polynomial orders better preserve sharp features at the cost of slightly less noise suppression. Lower orders are smoother but lose more peak detail.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 500)
signal_clean = (3.0 * np.exp(-(x-2.0)**2/(2*0.3**2)) +
                1.5 * np.exp(-(x-5.5)**2/(2*0.5**2)) +
                2.0 * np.exp(-(x-8.0)**2/(2*0.4**2)))
signal_noisy = signal_clean + rng.normal(0, 0.15, len(x))

fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(x, signal_noisy, color="lightgray", linewidth=0.7, label="Noisy")
ax.plot(x, signal_clean, color="black",     linewidth=1.0, linestyle="--", label="True")
for order, color in [(1, "steelblue"), (2, "orange"), (4, "tomato")]:
    sg = savgol_filter(signal_noisy, window_length=31, polyorder=order)
    rmse = np.sqrt(np.mean((sg - signal_clean)**2))
    ax.plot(x, sg, color=color, linewidth=1.5, label=f"polyorder={order} (RMSE={rmse:.4f})")

ax.set_xlabel("x"); ax.set_ylabel("Intensity")
ax.set_title("Savitzky-Golay — effect of polynomial order (window=31)")
ax.legend()
plt.tight_layout()
plt.show()

- Polyorder 1 (linear) is essentially a weighted moving average — it suppresses noise but significantly flattens peaks.
- Polyorder 4 follows the curvature at the peak tops closely, but may also follow noise fluctuations more aggressively between peaks.
- RMSE relative to the clean signal is the objective measure: typically polyorder 2 or 3 gives the best overall trade-off.

### Conclusion

Savitzky-Golay is the preferred smoothing filter whenever peak height, position, or width are quantitative measurements — spectrometry, chromatography, EEG, and any sensor measurement where feature shape matters. Use `window_length` to control how much noise is removed and `polyorder` to control how closely the filter follows sharp features. A good default is `window_length = 5–10% of the signal length, polyorder = 3`.

For impulse noise (spike outlier) removal rather than Gaussian noise smoothing, see [moving median filters](/tutorials/moving-median-filters-scipy). For the underlying signal smoothing via moving average, see [moving averages with pandas](/tutorials/moving-averages-pandas).