The Kolmogorov-Smirnov (KS) test compares distributions by measuring the largest vertical distance between their cumulative distribution functions. It can compare a sample to a reference distribution (one-sample test) or compare two samples directly (two-sample test).

### Two-Sample KS Test
```python
import numpy as np
from scipy import stats

np.random.seed(52)

# Two normal samples with different means and spreads
sample_a = np.random.normal(loc=0, scale=1, size=120)
sample_b = np.random.normal(loc=0.6, scale=1.1, size=120)

result = stats.ks_2samp(sample_a, sample_b)

print(f"KS statistic: {result.statistic:.3f}")
print(f"P-value: {result.pvalue:.6f}")
```

```
KS statistic: 0.200
P-value: 0.016260
```
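The statistic reported by `ks_2samp` is the largest gap between the two empirical cumulative distribution functions (ECDFs). The following is a minimal sketch of that calculation, not SciPy's internal code; the helper `ecdf_at` and the variable names are illustrative assumptions. It should agree with the value SciPy returns.

```python
import numpy as np
from scipy import stats

np.random.seed(52)
sample_a = np.random.normal(loc=0, scale=1, size=120)
sample_b = np.random.normal(loc=0.6, scale=1.1, size=120)


def ecdf_at(sample, points):
    """Fraction of `sample` that is <= each value in `points` (hypothetical helper)."""
    sorted_sample = np.sort(sample)
    return np.searchsorted(sorted_sample, points, side="right") / len(sample)


# Both ECDFs are step functions that only change at observed values,
# so it is enough to evaluate them on the pooled data.
pooled = np.concatenate([sample_a, sample_b])
gaps = np.abs(ecdf_at(sample_a, pooled) - ecdf_at(sample_b, pooled))

manual_statistic = gaps.max()
gap_location = pooled[gaps.argmax()]
scipy_statistic = stats.ks_2samp(sample_a, sample_b).statistic

print(f"Manual KS statistic: {manual_statistic:.3f} (largest gap at x = {gap_location:.3f})")
print(f"SciPy KS statistic:  {scipy_statistic:.3f}")
```

Knowing where the largest gap occurs can help locate the region in which the two samples disagree most.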
### Interpreting the Result
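The null hypothesis of the two-sample test is that both samples come from the same distribution. A p-value below the chosen significance level (0.05 here) is evidence against it.

```python
import numpy as np
from scipy import stats

np.random.seed(52)
sample_a = np.random.normal(loc=0, scale=1, size=120)
sample_b = np.random.normal(loc=0.6, scale=1.1, size=120)

result = stats.ks_2samp(sample_a, sample_b)

# Compare the p-value against a 5% significance level
if result.pvalue < 0.05:
    print("Reject the null hypothesis: the samples come from different distributions.")
else:
    print("Fail to reject the null hypothesis: the samples could come from the same distribution.")
```

```
Reject the null hypothesis: the samples come from different distributions.
```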
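One way to see what the 0.05 threshold means is to run the test repeatedly on pairs of samples that really do come from the same distribution and count how often it rejects. The sketch below is an illustrative simulation; the seed, the number of trials, and the variable names are assumptions chosen for the example. The rejection rate should come out near or below the significance level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
alpha = 0.05
n_trials = 300

rejections = 0
for _ in range(n_trials):
    # Both samples are drawn from the same standard normal distribution,
    # so every rejection here is a false positive.
    x = rng.normal(loc=0, scale=1, size=120)
    y = rng.normal(loc=0, scale=1, size=120)
    if stats.ks_2samp(x, y).pvalue < alpha:
        rejections += 1

false_positive_rate = rejections / n_trials
print(f"Rejected {rejections} of {n_trials} true nulls ({false_positive_rate:.1%})")
```

A single rejection is therefore not proof that the distributions differ: at this threshold, roughly one comparison in twenty between identical distributions can still be flagged.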
### Visualizing the Samples
```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(52)
sample_a = np.random.normal(loc=0, scale=1, size=120)
sample_b = np.random.normal(loc=0.6, scale=1.1, size=120)

# Overlaid density histograms of the two samples
plt.figure(figsize=(9, 5))
plt.hist(sample_a, bins=18, alpha=0.6, density=True, label="Sample A")
plt.hist(sample_b, bins=18, alpha=0.6, density=True, label="Sample B")
plt.title("Two Samples Compared with KS Test")
plt.xlabel("Value")
plt.ylabel("Density")
plt.legend()
plt.show()
```

### One-Sample KS Test Against a Normal Distribution
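Calling `stats.kstest(sample, "norm")` with no further arguments compares the sample against the standard normal distribution (mean 0, standard deviation 1).

```python
import numpy as np
from scipy import stats

np.random.seed(70)
sample = np.random.normal(loc=0, scale=1, size=100)

# Test the sample against the standard normal distribution
result = stats.kstest(sample, "norm")

print(f"One-sample KS statistic: {result.statistic:.3f}")
print(f"P-value: {result.pvalue:.6f}")
```

```
One-sample KS statistic: 0.084
P-value: 0.453159
```

The p-value of 0.453 is well above 0.05, so there is no evidence that the sample departs from a standard normal distribution.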
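Because `"norm"` without parameters means the standard normal, a sample that is normal but has a different mean or spread will be rejected. The sketch below illustrates this under the assumption that the reference parameters (mean 5, standard deviation 2, chosen for the example) are known in advance; they are passed through the `args` parameter of `kstest`. If the parameters were instead estimated from the same sample, the standard KS p-value would no longer be valid, and a Lilliefors-type correction is the usual remedy.

```python
import numpy as np
from scipy import stats

np.random.seed(70)

# Normal data that is NOT standard normal (illustrative parameters)
sample = np.random.normal(loc=5, scale=2, size=100)

# Reference: standard normal N(0, 1), the default for "norm"
against_standard = stats.kstest(sample, "norm")

# Reference: N(5, 2), with loc and scale forwarded to the normal CDF
against_known = stats.kstest(sample, "norm", args=(5, 2))

print(f"vs N(0, 1): statistic = {against_standard.statistic:.3f}, "
      f"p = {against_standard.pvalue:.6f}")
print(f"vs N(5, 2): statistic = {against_known.statistic:.3f}, "
      f"p = {against_known.pvalue:.6f}")
```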
### Practical Example: Comparing Load Time Distributions
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(83)

# Simulated load times before and after a change (log-normal, right-skewed)
before = np.random.lognormal(mean=1.9, sigma=0.25, size=100)
after = np.random.lognormal(mean=1.7, sigma=0.20, size=100)

result = stats.ks_2samp(before, after)

print(f"KS statistic: {result.statistic:.3f}")
print(f"P-value: {result.pvalue:.6f}")
if result.pvalue < 0.05:
    print("Conclusion: load-time distributions differ significantly.")
else:
    print("Conclusion: no significant distribution difference detected.")

plt.figure(figsize=(9, 5))
plt.hist(before, bins=14, alpha=0.6, density=True, label="Before")
plt.hist(after, bins=14, alpha=0.6, density=True, label="After")
plt.xlabel("Load time")
plt.ylabel("Density")
plt.title("Before vs After Load Time Distributions")
plt.legend()
plt.show()
```

```
KS statistic: 0.380
P-value: 0.000001
Conclusion: load-time distributions differ significantly.
```
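A significant KS result says that the two distributions differ, but not in which direction or by how much. One possible follow-up, sketched here as an assumption rather than a required step, is to compare a few percentiles of the two samples. The percentile levels (50, 90, 95) are illustrative choices.

```python
import numpy as np

np.random.seed(83)
before = np.random.lognormal(mean=1.9, sigma=0.25, size=100)
after = np.random.lognormal(mean=1.7, sigma=0.20, size=100)

# Illustrative percentile levels: the median plus two tail percentiles
percentiles = [50, 90, 95]
before_q = np.percentile(before, percentiles)
after_q = np.percentile(after, percentiles)

for p, b, a in zip(percentiles, before_q, after_q):
    print(f"p{p}: before = {b:.2f}, after = {a:.2f}, change = {a - b:+.2f}")
```

Reading the percentile changes next to the KS result shows whether the shift affects typical load times, the slow tail, or both.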
### Conclusion

The KS test is useful when you want to compare entire distributions rather than only their means. Pair it with a plot of the sample distributions, because the test indicates that two distributions differ but not how they differ.