Kernel density estimation, or KDE, is a nonparametric way to estimate a probability density from sample data. It provides a smooth alternative to a histogram and can make multimodal structure easier to see.

### Basic KDE
```python
import numpy as np
from scipy import stats

np.random.seed(90)

# Two Gaussian clusters produce a bimodal sample.
data = np.concatenate([
    np.random.normal(-2, 0.8, 200),
    np.random.normal(2, 0.8, 200),
])

# Fit a Gaussian KDE; the result is callable and returns density values.
kde = stats.gaussian_kde(data)
print("Density at x = 0:", round(float(kde([0])[0]), 4))
print("Density at x = 2:", round(float(kde([2])[0]), 4))
```

Output:

```
Density at x = 0: 0.064
Density at x = 2: 0.1906
```
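One quick sanity check, sketched here with `integrate_box_1d` (a `gaussian_kde` method that integrates the estimate between two limits): a proper density should integrate to approximately 1.

```python
import numpy as np
from scipy import stats

np.random.seed(90)
data = np.concatenate([
    np.random.normal(-2, 0.8, 200),
    np.random.normal(2, 0.8, 200),
])
kde = stats.gaussian_kde(data)

# Integrate the fitted density over the whole real line; the result
# should be very close to 1 for any valid density estimate.
total = kde.integrate_box_1d(-np.inf, np.inf)
print(round(total, 6))
```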
### KDE with a Histogram
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(90)
data = np.concatenate([
    np.random.normal(-2, 0.8, 200),
    np.random.normal(2, 0.8, 200),
])

kde = stats.gaussian_kde(data)

# Evaluate the KDE on a grid that extends slightly past the data range.
x_eval = np.linspace(data.min() - 1, data.max() + 1, 200)

plt.figure(figsize=(9, 5))
plt.hist(data, bins=30, density=True, alpha=0.6, label="Histogram")
plt.plot(x_eval, kde(x_eval), color="crimson", linewidth=2, label="KDE")
plt.xlabel("Value")
plt.ylabel("Density")
plt.title("Kernel Density Estimate")
plt.legend()
plt.show()
```

### Bandwidth Effects

The bandwidth controls how much the estimate is smoothed: too narrow and sampling noise shows up as spurious bumps, too wide and real modes blur together. `bw_method` accepts a scalar factor, the strings `"scott"` or `"silverman"`, or a callable.
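By default `gaussian_kde` chooses its bandwidth with Scott's rule, which for one-dimensional data reduces to `n ** (-1/5)`. A small sketch inspecting the chosen factor via the `factor` attribute:

```python
import numpy as np
from scipy import stats

np.random.seed(90)
data = np.concatenate([
    np.random.normal(-2, 0.8, 200),
    np.random.normal(2, 0.8, 200),
])

kde = stats.gaussian_kde(data)  # default bandwidth: Scott's rule

# For 1-D data, Scott's rule gives n ** (-1 / (d + 4)) with d = 1.
n = data.size
print(round(kde.factor, 4))
print(round(n ** (-1 / 5), 4))
```

Passing a scalar to `bw_method`, as in the plots below, overrides this factor directly.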
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(90)
data = np.concatenate([
    np.random.normal(-2, 0.8, 200),
    np.random.normal(2, 0.8, 200),
])
x_eval = np.linspace(data.min() - 1, data.max() + 1, 200)

# Smaller bw_method values track the data more closely; larger values smooth more.
kde_default = stats.gaussian_kde(data)
kde_narrow = stats.gaussian_kde(data, bw_method=0.2)
kde_wide = stats.gaussian_kde(data, bw_method=0.5)

plt.figure(figsize=(9, 5))
plt.plot(x_eval, kde_default(x_eval), label="Default bandwidth")
plt.plot(x_eval, kde_narrow(x_eval), label="Narrow bandwidth")
plt.plot(x_eval, kde_wide(x_eval), label="Wide bandwidth")
plt.title("Effect of KDE Bandwidth")
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
plt.show()
```

### Practical Example: Exam Score Distribution
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(101)

# Simulated scores: a larger group centered near 65 and a smaller group near 82.
scores = np.concatenate([
    np.random.normal(65, 6, 120),
    np.random.normal(82, 4, 80),
])

kde = stats.gaussian_kde(scores)
x_eval = np.linspace(scores.min() - 5, scores.max() + 5, 250)

plt.figure(figsize=(9, 5))
plt.hist(scores, bins=20, density=True, alpha=0.55, label="Scores")
plt.plot(x_eval, kde(x_eval), color="darkgreen", linewidth=2.5, label="KDE")
plt.xlabel("Score")
plt.ylabel("Density")
plt.title("Exam Score Density Estimate")
plt.legend()
plt.show()
```

### Conclusion

KDE is a useful complement to histograms when you want a smooth view of the underlying distribution. SciPy's `gaussian_kde` makes it easy to estimate and visualize that density.
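One closing sketch: beyond evaluating densities, `gaussian_kde` can draw synthetic samples from the fitted estimate with `resample`, which is handy for simulation. This assumes a reasonably recent SciPy, where `resample` accepts a `seed` argument.

```python
import numpy as np
from scipy import stats

np.random.seed(101)
scores = np.concatenate([
    np.random.normal(65, 6, 120),
    np.random.normal(82, 4, 80),
])
kde = stats.gaussian_kde(scores)

# Draw 1000 synthetic scores from the fitted density; resample returns an
# array of shape (n_dimensions, n_samples), so (1, 1000) here.
synthetic = kde.resample(1000, seed=42)
print(synthetic.shape)
print(round(float(synthetic.mean()), 1))
```

The synthetic sample's mean should land near the original data's mean, since the KDE is centered on the observed scores.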