The Mann-Whitney U test is a nonparametric alternative to the [independent samples t-test](/tutorials/independent-samples-t-test-with-scipy). Instead of comparing means, it asks: if you picked one observation from each group at random, which group's value would more often be larger? It works by ranking all observations from both groups combined, then checking whether the ranks are evenly distributed between the two groups. This makes it robust to outliers, skewed distributions, and data that fails the [normality assumption](/tutorials/normality-tests-with-scipy) — making it a good default for response times, income, biological measurements, and other data that commonly skew right.

### Basic Mann-Whitney U Test

Exponential data is skewed, making it a realistic case where the t-test's normality assumption is questionable and the Mann-Whitney test is the safer choice.

```python
import numpy as np
from scipy import stats
np.random.seed(44)
group_a = np.random.exponential(scale=1.0, size=40)
group_b = np.random.exponential(scale=1.9, size=40)
result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Median of group A: {np.median(group_a):.3f}")
print(f"Median of group B: {np.median(group_b):.3f}")
print(f"U statistic: {result.statistic:.3f}")
print(f"P-value: {result.pvalue:.6f}")
```

- `alternative="two-sided"` tests whether either group tends to have larger values — use `"less"` or `"greater"` for one-sided tests when you have a directional hypothesis.
- The U statistic counts the number of times a value from group A exceeds a value from group B across all possible pairs — it ranges from 0 to `n_a * n_b`.
- Printing medians instead of means is intentional: the Mann-Whitney test is about the rank distribution, and the median is a more appropriate summary for skewed data.
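The U-to-pairs relationship above also yields a quick effect size: dividing U by `n_a * n_b` estimates the probability that a random value from group A exceeds one from group B (the common-language effect size), and rescaling that to [-1, 1] gives the rank-biserial correlation. A minimal sketch reusing the same data — both names are standard effect-size measures, but neither is computed by `mannwhitneyu` itself:

```python
import numpy as np
from scipy import stats

np.random.seed(44)
group_a = np.random.exponential(scale=1.0, size=40)
group_b = np.random.exponential(scale=1.9, size=40)

result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Common-language effect size: estimated P(random A value > random B value),
# computed as U / (n_a * n_b).
n_a, n_b = len(group_a), len(group_b)
cles = result.statistic / (n_a * n_b)

# Rank-biserial correlation: the same quantity rescaled to [-1, 1].
rank_biserial = 2 * cles - 1

print(f"Common-language effect size: {cles:.3f}")
print(f"Rank-biserial correlation: {rank_biserial:.3f}")
```

Values of the common-language effect size near 0.5 mean the groups are hard to tell apart; values near 0 or 1 mean one group dominates the other in the rankings.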
### Interpreting the Result

A significant result means the two groups have different distributions or central tendencies — but because the test is rank-based, it is technically testing the full distribution, not just the median.

```python
import numpy as np
from scipy import stats
np.random.seed(44)
group_a = np.random.exponential(scale=1.0, size=40)
group_b = np.random.exponential(scale=1.9, size=40)
result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
if result.pvalue < 0.05:
    print("Reject the null hypothesis: the groups differ in distribution or central tendency.")
else:
    print("Fail to reject the null hypothesis: no strong difference was detected.")
```

- The null hypothesis is that a randomly selected value from group A is equally likely to be greater than or less than a randomly selected value from group B.
- Unlike the t-test, the Mann-Whitney test is not sensitive to extreme outliers because it only uses the ranks of observations, not their actual values.
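The outlier claim is easy to check empirically. The sketch below is an illustrative experiment, not part of the standard workflow: it appends one absurdly large value to group A and compares how much each test's p-value moves.

```python
import numpy as np
from scipy import stats

np.random.seed(44)
group_a = np.random.exponential(scale=1.0, size=40)
group_b = np.random.exponential(scale=1.9, size=40)

# Contaminate group A with a single extreme outlier.
group_a_outlier = np.append(group_a, 500.0)

u_before = stats.mannwhitneyu(group_a, group_b, alternative="two-sided").pvalue
u_after = stats.mannwhitneyu(group_a_outlier, group_b, alternative="two-sided").pvalue
t_before = stats.ttest_ind(group_a, group_b).pvalue
t_after = stats.ttest_ind(group_a_outlier, group_b).pvalue

print(f"Mann-Whitney p-value: {u_before:.6f} -> {u_after:.6f}")
print(f"t-test p-value:       {t_before:.6f} -> {t_after:.6f}")
```

The outlier occupies just one rank, so the Mann-Whitney p-value barely moves, while the t-test's mean and variance estimates are distorted by the single extreme value.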
### Visualizing the Two Groups

Box plots are particularly useful here because the test is about the central tendency and spread of ranks, not just the means.

```python
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(44)
group_a = np.random.exponential(scale=1.0, size=40)
group_b = np.random.exponential(scale=1.9, size=40)
plt.figure(figsize=(9, 5))
plt.boxplot([group_a, group_b], tick_labels=["Group A", "Group B"])
plt.ylabel("Value")
plt.title("Independent Groups for Mann-Whitney U Test")
plt.grid(axis="y", linestyle="--", alpha=0.4)
plt.show()
```

- The median line (center of each box) is more meaningful than the mean for skewed data like this — it corresponds more closely to what the Mann-Whitney test is measuring.
- The long upper whiskers on both boxes confirm the right skew that motivated choosing this test over the t-test.
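The mean-versus-median point can be verified directly on this data: for right-skewed samples the long upper tail typically pulls the mean above the median. A quick check reusing the same simulated groups:

```python
import numpy as np

np.random.seed(44)
group_a = np.random.exponential(scale=1.0, size=40)
group_b = np.random.exponential(scale=1.9, size=40)

# For right-skewed data the mean is typically larger than the median,
# because the tail values inflate the mean but not the middle rank.
for name, data in [("Group A", group_a), ("Group B", group_b)]:
    print(f"{name}: mean={np.mean(data):.3f}, median={np.median(data):.3f}")
```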
### Practical Example: Response Times for Two Interfaces

Response time data is a classic Mann-Whitney use case — it's typically right-skewed, with a few very slow responses pulling the mean up, making the median and rank-based tests more representative than mean-based ones.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
np.random.seed(61)
interface_a = np.random.gamma(shape=2.5, scale=180, size=50)
interface_b = np.random.gamma(shape=2.5, scale=300, size=50)
result = stats.mannwhitneyu(interface_a, interface_b, alternative="two-sided")
print(f"Median response time A: {np.median(interface_a):.1f} ms")
print(f"Median response time B: {np.median(interface_b):.1f} ms")
print(f"P-value: {result.pvalue:.6f}")
if result.pvalue < 0.05:
    print("Conclusion: the interfaces differ significantly.")
else:
    print("Conclusion: no significant difference detected.")
plt.figure(figsize=(9, 5))
plt.hist(interface_a, bins=12, alpha=0.6, label="Interface A")
plt.hist(interface_b, bins=12, alpha=0.6, label="Interface B")
plt.xlabel("Response time (ms)")
plt.ylabel("Count")
plt.title("Response Time Distributions")
plt.legend()
plt.show()
```

- Gamma-distributed response times are a realistic simulation — real response time data often follows a similar skewed shape.
- Overlapping histograms let you see the full shape of each distribution, which gives context for why the rank-based test detects the difference even without assuming normality.
- A significant result here would inform a decision to investigate why one interface is slower, or to A/B test further.

### Conclusion

The Mann-Whitney U test is the right choice when your data are skewed, have outliers, or clearly fail [normality tests](/tutorials/normality-tests-with-scipy). It loses some statistical power compared to the t-test when normality actually holds, but it's a safe default for continuous data in many real-world domains. For comparing more than two groups without normality, look into the Kruskal-Wallis test. To compare two full distributions (not just their ranks), see the [Kolmogorov-Smirnov test](/tutorials/kolmogorov-smirnov-test-with-scipy).
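As a pointer for the multi-group case, the Kruskal-Wallis test follows the same rank-based logic across three or more independent groups. A minimal sketch with simulated skewed data — the seed, group sizes, and scales here are illustrative only:

```python
import numpy as np
from scipy import stats

np.random.seed(7)
# Three independent right-skewed groups (illustrative scales).
g1 = np.random.exponential(scale=1.0, size=30)
g2 = np.random.exponential(scale=1.4, size=30)
g3 = np.random.exponential(scale=2.0, size=30)

# Kruskal-Wallis H test: ranks all observations across groups and asks
# whether the rank sums differ more than chance would allow.
stat, pvalue = stats.kruskal(g1, g2, g3)
print(f"H statistic: {stat:.3f}")
print(f"P-value: {pvalue:.6f}")
```

A significant Kruskal-Wallis result tells you at least one group differs; pairwise Mann-Whitney tests (with a multiple-comparison correction) can then identify which ones.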