Moving Averages with pandas

A daily stock price chart is noisy — each day's value is partly signal and partly random fluctuation. A moving average replaces each point with the mean of the surrounding window, smoothing out the noise and making the trend visible. Shorter windows track short-term changes closely but retain more noise; longer windows show the big picture but lag behind real changes. Pandas makes both simple rolling means and exponentially weighted means (which give more weight to recent values) trivially easy to compute on any Series or DataFrame.

### Computing a simple rolling mean

A trailing rolling mean replaces each value with the average of the most recent `n` values, the current one included. Pandas' `rolling` method computes it in a single line.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
t = np.arange(200)
# Underlying trend with noise
values = 50 + 0.2 * t + 8 * np.sin(2 * np.pi * t / 40) + rng.normal(0, 5, 200)
ts = pd.Series(values, name="price")

ma7  = ts.rolling(window=7,  min_periods=1).mean()
ma21 = ts.rolling(window=21, min_periods=1).mean()

print(ts.iloc[:5].round(2).tolist())
print(ma7.iloc[:5].round(2).tolist())
print(ma21.iloc[:5].round(2).tolist())
```

```
[51.52, 46.25, 56.62, 58.93, 45.75]
[51.52, 48.89, 51.47, 53.33, 51.82]
[51.52, 48.89, 51.47, 53.33, 51.82]
```

- `rolling(window=7)` creates a sliding window of 7 points. By default, the window is right-aligned (trailing): position `t` averages positions `t-6` through `t`.
- `min_periods=1` allows the average to be computed even at the start of the series where fewer than `window` points exist — without it, the first 6 values of a 7-point window would be NaN.
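- The first value of `ma7` equals `ts.iloc[0]` (only one point is available); from index 6 (the seventh value) onward, every average uses the full 7-point window. The first five values of `ma7` and `ma21` are identical because neither window has filled yet.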
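
The alignment and `min_periods` behaviour described above can be checked directly. The following is a minimal sketch that reuses the same synthetic series; the variable names `strict` and `manual` are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
t = np.arange(200)
values = 50 + 0.2 * t + 8 * np.sin(2 * np.pi * t / 40) + rng.normal(0, 5, 200)
ts = pd.Series(values, name="price")

# For an integer window, min_periods defaults to the window size,
# so the first 6 results of a 7-point window are NaN.
strict = ts.rolling(window=7).mean()
print(strict.iloc[:7].isna().tolist())

# With min_periods=1, early averages use however many points exist.
ma7 = ts.rolling(window=7, min_periods=1).mean()

# Trailing alignment: index 10 averages indices 4 through 10 inclusive.
manual = ts.iloc[4:11].mean()
print(np.isclose(ma7.iloc[10], manual))

# Index 2 has only three points available, so it is a 3-point mean.
print(np.isclose(ma7.iloc[2], ts.iloc[:3].mean()))
```

Once the window is full, `strict` and `ma7` agree; they differ only in how the first six positions are handled.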

### Comparing window sizes

Shorter and longer windows smooth different amounts of noise. Plotting them together shows the trade-off between responsiveness and smoothness.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
t = np.arange(200)
values = 50 + 0.2 * t + 8 * np.sin(2 * np.pi * t / 40) + rng.normal(0, 5, 200)
ts = pd.Series(values, name="price")

fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(ts, color="lightgray", linewidth=0.8, label="Raw signal")
ax.plot(ts.rolling(5,  min_periods=1).mean(), color="steelblue",  linewidth=1.5, label="MA-5")
ax.plot(ts.rolling(21, min_periods=1).mean(), color="tomato",     linewidth=1.5, label="MA-21")
ax.plot(ts.rolling(50, min_periods=1).mean(), color="darkgreen",  linewidth=1.5, label="MA-50")
ax.set_xlabel("Time")
ax.set_ylabel("Value")
ax.set_title("Moving averages — effect of window size")
ax.legend()
plt.tight_layout()
plt.show()
```

- MA-5 closely tracks the raw signal and retains most of the cycle; MA-50 is very smooth but lags significantly at turning points.
- The trade-off is fundamental for trailing windows: a causal filter, which sees only past values, cannot smooth noise without lagging the signal. A trailing `n`-point mean lags a steady trend by `(n - 1) / 2` samples, so no trailing window size removes noise without adding delay.
- The underlying 40-point cycle should be clearly visible in MA-5, noticeably flattened in MA-21, and mostly suppressed in MA-50, illustrating how window length relates to the signal period you want to see.
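
To put rough numbers on how much of the 40-point cycle each window keeps, the sketch below smooths a noise-free sinusoid and compares the measured amplitude with the standard frequency response of an equal-weight moving average, `|sin(pi * n / P) / (n * sin(pi / P))|` for window `n` and period `P`. The helper `theoretical_gain` and the 2000-point test signal are assumptions made for illustration; they are not part of the example above.

```python
import numpy as np
import pandas as pd

period = 40
t = np.arange(2000)
cycle = pd.Series(np.sin(2 * np.pi * t / period))  # noise-free, unit amplitude


def theoretical_gain(window: int, period: int) -> float:
    """Amplitude gain of an equal-weight `window`-point mean for a sinusoid."""
    f = 1.0 / period
    return abs(np.sin(np.pi * window * f) / (window * np.sin(np.pi * f)))


gains = {}
for window in (5, 21, 50):
    # dropna() removes the warm-up region, so every remaining point
    # is the mean of a full window.
    smoothed = cycle.rolling(window).mean().dropna()
    measured = (smoothed.max() - smoothed.min()) / 2
    gains[window] = (measured, theoretical_gain(window, period))
    print(f"MA-{window}: measured {measured:.2f}, theory {gains[window][1]:.2f}")
```

Under these assumptions the cycle keeps roughly 98% of its amplitude with a 5-point window, about 61% with 21 points, and about 18% with 50 points. A window exactly as long as the period (40 here) cancels the cycle completely.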

### Centred rolling mean and exponential weighting

A centred window (`center=True`) aligns the average at the midpoint of the window, reducing lag at the cost of not being usable for real-time forecasting. Exponential weighting (EWM) avoids hard window edges entirely.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
t = np.arange(200)
values = 50 + 0.2 * t + 8 * np.sin(2 * np.pi * t / 40) + rng.normal(0, 5, 200)
ts = pd.Series(values)

trailing = ts.rolling(21, min_periods=1).mean()
centred  = ts.rolling(21, center=True, min_periods=1).mean()
ewm      = ts.ewm(span=21).mean()  # alpha = 2/(span+1) ≈ 0.09

fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(ts,        color="lightgray",  linewidth=0.8, label="Raw")
ax.plot(trailing,  color="steelblue",  linewidth=1.5, label="Trailing MA-21")
ax.plot(centred,   color="tomato",     linewidth=1.5, linestyle="--", label="Centred MA-21")
ax.plot(ewm,       color="darkgreen",  linewidth=1.5, label="EWM (span=21)")
ax.set_title("Trailing vs centred MA vs EWM")
ax.set_xlabel("Time")
ax.set_ylabel("Value")
ax.legend()
plt.tight_layout()
plt.show()
```

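- `center=True` shifts the window so that position `t` averages from `t - window//2` to `t + window//2` (for an odd window). This removes the `(window - 1) / 2`-sample lag of the trailing mean but requires future values, so it is only usable for retrospective analysis.
- `ewm(span=21)` computes an exponentially weighted mean with smoothing factor `alpha = 2/(span+1) ≈ 0.09`. Each step back in time multiplies a value's weight by `1 - alpha`, so older values fade but never fully drop out.
- EWM reacts faster to a new change than a trailing mean of the same span because it gives the newest point roughly twice the weight (`2/(span+1)` versus `1/span`). Its average lag on a steady trend is the same, `(span - 1) / 2` samples, and it takes longer to settle fully because old values are only down-weighted, never dropped.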
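
To see how the three smoothers differ in lag and responsiveness without noise getting in the way, the sketch below applies them to two synthetic test signals: a linear ramp and a unit step. Both signals, and the choice of a step at `t = 100`, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

window = 21
n = 200

# 1) Lag on a noise-free linear ramp x[t] = t.
ramp = pd.Series(np.arange(n, dtype=float))
trailing = ramp.rolling(window).mean()
centred = ramp.rolling(window, center=True).mean()
ewm = ramp.ewm(span=window).mean()

lag_trailing = (ramp - trailing).iloc[-1]    # (window - 1) / 2 samples
lag_centred = (ramp - centred).iloc[n // 2]  # measured away from the edges
lag_ewm = (ramp - ewm).iloc[-1]              # approaches (span - 1) / 2
print(lag_trailing, lag_centred, round(lag_ewm, 3))

# 2) Response to a unit step at t = 100.
step = pd.Series(np.where(np.arange(n) >= 100, 1.0, 0.0))
step_sma = step.rolling(window).mean()
step_ewm = step.ewm(span=window).mean()
for k in (1, 5, 21):  # k = number of post-step points seen so far
    i = 100 + k - 1
    print(k, round(step_sma.iloc[i], 3), round(step_ewm.iloc[i], 3))
```

On the ramp, the trailing mean and the EWM both sit about 10 samples behind the signal, while the centred mean shows no lag away from the edges. On the step, the EWM moves about 0.09 on the first new point against about 0.05 for the trailing mean, but after 21 points the trailing mean has reached 1.0 while the EWM is still near 0.86.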

### Conclusion

Moving averages are the simplest noise-reduction tool for time series: no parameters to estimate, instant to compute, and easy to explain. Use a trailing window for real-time monitoring, a centred window for retrospective trend extraction, and EWM when you want smooth adaptation without a hard window boundary. Choose the window length from the shortest cycle you want to preserve: a window as long as the cycle cancels it completely, and a longer one leaves only a small residue.

For smoothing with better frequency selectivity and no lag (at the cost of requiring future values), see [Savitzky-Golay filtering](/tutorials/savitzky-golay-filtering). For detecting individual spikes that moving averages might obscure, see [moving median filters](/tutorials/moving-median-filters-scipy).