A single corrupted sensor reading, such as a spike caused by electrical interference, drags a moving average off course for every window that contains it. The moving median is far more robust: it takes the *middle value* of the sorted window, so a single outlier lands at one end of the sort order and barely moves the result. This makes median filters indispensable for signal data contaminated with impulse noise (single-sample spikes), salt-and-pepper noise in images, or any dataset where isolated measurement errors should not contaminate neighbouring estimates. SciPy provides a ready-made implementation in `scipy.signal.medfilt`, used here on 1D signals.

### Generating a signal with impulse noise

The test signal is a smooth sinusoid with a handful of random spike outliers, exactly the scenario where median filtering outperforms mean filtering.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
t = np.linspace(0, 4 * np.pi, n)

# Clean signal
signal_clean = 5 * np.sin(t)

# Add Gaussian background noise
signal_noisy = signal_clean + rng.normal(0, 0.3, n)

# Inject 10 random impulse spikes
spike_idx = rng.choice(n, size=10, replace=False)
signal_noisy[spike_idx] += rng.choice([-1, 1], size=10) * rng.uniform(8, 15, 10)

print(f"Clean range: [{signal_clean.min():.1f}, {signal_clean.max():.1f}]")
print(f"Noisy range: [{signal_noisy.min():.1f}, {signal_noisy.max():.1f}]")
# .tolist() prints plain Python ints rather than NumPy scalar reprs
print(f"Spike positions: {np.sort(spike_idx)[:5].tolist()} ...")
```

- The clean signal has range [−5, 5]. Each spike displaces its sample by 8 to 15 units, more than 25 times the background noise level, so most spikes are easy to see. A spike that opposes the sign of the signal can still land inside the clean range.
- `rng.choice([-1, 1], ...)` randomises spike direction, so positive and negative spikes are equally likely.
- Background Gaussian noise (std=0.3) is much smaller than the spikes. This mimics measurement electronics that add continuous low-level noise plus occasional large errors.

### Applying the median filter

`scipy.signal.medfilt` slides an odd-sized window along the signal and replaces each point with the window's median.
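
To make the sliding-window step concrete, here is a minimal sketch that rebuilds the filter by hand on a tiny array and compares it with `medfilt`. The array `x` and the names `padded`, `windows` and `manual` are made up for illustration. The sketch assumes NumPy 1.20 or later for `sliding_window_view`, and it is meant as a teaching aid, not as a description of how SciPy computes the result internally.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy.signal import medfilt

# Illustrative data: a slow ramp with one spike at index 2
x = np.array([1.0, 2.0, 100.0, 3.0, 4.0, 5.0, 6.0])
k = 3  # odd window length

# medfilt zero-pads the input, so add k // 2 zeros on each side
padded = np.pad(x, k // 2, mode="constant", constant_values=0.0)

# Each row of `windows` holds the k samples centred on one input sample
windows = sliding_window_view(padded, k)

# The filter output is the median of every window
manual = np.median(windows, axis=1)

print("manual :", manual)
print("medfilt:", medfilt(x, kernel_size=k))
```

The spike at index 2 never reaches the output because it is always the largest value in its window, not the middle one. The zero padding also matters: the last output is 5 rather than 6, because the final window contains a padded zero. The full comparison on the noisy sinusoid follows.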

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import medfilt

# Rebuild the noisy test signal from the previous section
rng = np.random.default_rng(42)
n = 200
t = np.linspace(0, 4 * np.pi, n)
signal_clean = 5 * np.sin(t)
signal_noisy = signal_clean + rng.normal(0, 0.3, n)
spike_idx = rng.choice(n, size=10, replace=False)
signal_noisy[spike_idx] += rng.choice([-1, 1], size=10) * rng.uniform(8, 15, 10)

# Median filter and a moving mean with the same 7-sample window
filtered_med = medfilt(signal_noisy, kernel_size=7)
filtered_mean = np.convolve(signal_noisy, np.ones(7) / 7, mode="same")

fig, axes = plt.subplots(3, 1, figsize=(11, 8), sharex=True)

axes[0].plot(signal_noisy, color="gray", linewidth=0.7, label="Noisy (with spikes)")
axes[0].plot(signal_clean, color="black", linewidth=1.0, linestyle="--", label="True signal")
axes[0].legend(fontsize=9)
axes[0].set_ylabel("Amplitude")
axes[0].set_title("Original signal")

axes[1].plot(filtered_mean, color="steelblue", linewidth=1.5, label="Moving mean (k=7)")
axes[1].plot(signal_clean, color="black", linewidth=1.0, linestyle="--")
axes[1].legend(fontsize=9)
axes[1].set_ylabel("Amplitude")
axes[1].set_title("Moving mean: spikes smeared into neighbours")

axes[2].plot(filtered_med, color="tomato", linewidth=1.5, label="Median filter (k=7)")
axes[2].plot(signal_clean, color="black", linewidth=1.0, linestyle="--")
axes[2].legend(fontsize=9)
axes[2].set_ylabel("Amplitude")
axes[2].set_xlabel("Sample")
axes[2].set_title("Median filter: spikes removed cleanly")

plt.tight_layout()
plt.show()
```

- `kernel_size=7` means each output value is the median of 7 consecutive input values. The kernel must be odd so there is a true middle value.
- The moving mean spreads each spike across its entire 7-point window, creating a bump of about one seventh of the spike height that lasts seven samples. The median filter simply discards the spike without affecting neighbours.
- Both filters smooth background Gaussian noise similarly; the key difference appears at spike locations.

### Choosing the kernel size

Larger kernels remove wider spikes but increasingly blur rapid signal transitions. Comparing several kernel sizes shows this trade-off.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import medfilt

# Rebuild the noisy test signal from the first section
rng = np.random.default_rng(42)
n = 200
t = np.linspace(0, 4 * np.pi, n)
signal_clean = 5 * np.sin(t)
signal_noisy = signal_clean + rng.normal(0, 0.3, n)
spike_idx = rng.choice(n, size=10, replace=False)
signal_noisy[spike_idx] += rng.choice([-1, 1], size=10) * rng.uniform(8, 15, 10)

kernels = [3, 7, 15]
colors = ["steelblue", "tomato", "green"]

fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(signal_noisy, color="lightgray", linewidth=0.7, label="Noisy signal")
ax.plot(signal_clean, color="black", linewidth=1.0, linestyle="--", label="True signal")

# Filter with each kernel size and report the error against the clean signal
for k, color in zip(kernels, colors):
    filtered = medfilt(signal_noisy, kernel_size=k)
    rmse = np.sqrt(np.mean((filtered - signal_clean) ** 2))
    ax.plot(filtered, color=color, linewidth=1.5, label=f"k={k} (RMSE={rmse:.3f})")

ax.set_title("Median filter: effect of kernel size")
ax.set_xlabel("Sample")
ax.set_ylabel("Amplitude")
ax.legend()
plt.tight_layout()
plt.show()
```

- A kernel of 3 removes isolated single-point spikes but leaves wider noise structures in place. A kernel of 15 removes wider spikes but starts to flatten the signal peaks.
- RMSE relative to the clean signal is shown in the legend. The best kernel is the one that minimises this error for the specific noise characteristics.
- As a rule of thumb, the kernel should be just large enough to outvote the widest spike you expect, and no larger: a run of w corrupted samples needs a kernel of at least 2w + 1.

### Conclusion

The median filter is the right choice when your data contains impulse spikes you want to remove without affecting the surrounding signal shape. It preserves edges and step transitions that a moving average would smear. For signals with only Gaussian noise (no spikes), a moving average achieves similar or slightly better smoothing because it averages all values in the window instead of picking a single one.

For smoothing with better preservation of signal shape than both moving mean and median, see [Savitzky-Golay filtering](/tutorials/savitzky-golay-filtering). For detecting rather than removing spikes, see [finding peaks with scipy](/tutorials/find-peaks-with-scipy).
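
The edge-preservation claim can be checked on a small step signal. The sketch below is illustrative only: the step from 0 to 1, the spike at index 5 and the 3-sample window are arbitrary choices, not values taken from the examples above.

```python
import numpy as np
from scipy.signal import medfilt

# Hypothetical test signal: a step from 0 to 1 with one impulse spike
step = np.concatenate([np.zeros(10), np.ones(10)])
spiky = step.copy()
spiky[5] += 10.0

k = 3
med = medfilt(spiky, kernel_size=k)
mean = np.convolve(spiky, np.ones(k) / k, mode="same")

# Largest deviation from the clean step for each filter
print("median max error:", np.abs(med - step).max())
print("mean   max error:", np.abs(mean - step).max())
```

With these choices the median filter returns the clean step exactly, because the spike is a single sample and each side of the step is wider than the window. The moving mean spreads the spike over three samples and also rounds off the two samples on either side of the edge.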