Tutorials

Independent Samples t-Test with SciPy

When you have two groups that were measured separately (a control group and a treatment group, or users from two different regions) and you want to know whether their average outcomes differ, the independent samples t-test is the standard starting point. It produces a t-statistic (the standardized difference between the means) and a p-value (the probability of seeing a difference at least this large if the two groups were actually identical). The standard version assumes the two groups have equal variances; Welch's t-test relaxes that assumption and is generally the safer default when you're unsure.

### Basic Independent Samples t-Test

The standard t-test assumes both groups are roughly normally distributed and have similar spread. Here we simulate two groups with different means to see what a significant result looks like.

```python
import numpy as np
from scipy import stats

np.random.seed(42)

group_a = np.random.normal(loc=72, scale=6, size=40)
group_b = np.random.normal(loc=78, scale=6, size=40)

t_statistic, p_value = stats.ttest_ind(group_a, group_b)

print(f"Mean of group A: {group_a.mean():.2f}")
print(f"Mean of group B: {group_b.mean():.2f}")
print(f"t-statistic: {t_statistic:.3f}")
print(f"p-value: {p_value:.6f}")
```

```
Mean of group A: 70.69
Mean of group B: 77.83
t-statistic: -5.549
p-value: 0.000000
```
- `stats.ttest_ind(group_a, group_b)` runs the equal-variance t-test and returns the t-statistic and a two-sided p-value.
- A negative t-statistic means group A's mean is lower than group B's — the sign indicates direction, not significance.
- The p-value close to zero tells us the observed gap is very unlikely under the null hypothesis of equal means.
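To see what `ttest_ind` is computing, the equal-variance t-statistic can be reproduced by hand: divide the difference in means by the standard error derived from the pooled variance. This sketch uses only NumPy for the manual calculation and checks it against SciPy's answer:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
group_a = np.random.normal(loc=72, scale=6, size=40)
group_b = np.random.normal(loc=78, scale=6, size=40)

n_a, n_b = len(group_a), len(group_b)

# Pooled variance: a weighted average of the two sample variances (ddof=1)
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)

# Standard error of the difference between the two sample means
se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = (group_a.mean() - group_b.mean()) / se
t_scipy, _ = stats.ttest_ind(group_a, group_b)

print(f"Manual t-statistic: {t_manual:.3f}")
print(f"SciPy  t-statistic: {t_scipy:.3f}")
```

The two values should agree to floating-point precision, which confirms that the standard test is just a standardized mean difference with a pooled-variance denominator.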

### Interpreting the Result

A p-value alone doesn't tell you whether the difference matters practically; it only tells you how surprising the observed gap would be if the groups truly had the same mean. Compare the p-value to your significance threshold (commonly 0.05) to make a decision.

```python
import numpy as np
from scipy import stats

np.random.seed(42)

group_a = np.random.normal(loc=72, scale=6, size=40)
group_b = np.random.normal(loc=78, scale=6, size=40)

t_statistic, p_value = stats.ttest_ind(group_a, group_b)
alpha = 0.05

print(f"t-statistic: {t_statistic:.3f}")
print(f"p-value: {p_value:.6f}")

if p_value < alpha:
    print("Reject the null hypothesis: the group means differ significantly.")
else:
    print("Fail to reject the null hypothesis: the data do not show a significant difference.")
```

```
t-statistic: -5.549
p-value: 0.000000
Reject the null hypothesis: the group means differ significantly.
```
- Setting `alpha = 0.05` means you accept a 5% chance of incorrectly rejecting the null when it's actually true (a Type I error).
- Rejecting the null doesn't prove the groups are different — it means the data are inconsistent with the assumption that they're the same.
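A standardized effect size complements the p-value by addressing the practical-significance question directly. Cohen's d, sketched below, expresses the mean difference in units of the pooled standard deviation (the pooled-SD formula used here is one common convention, not the only one):

```python
import numpy as np

np.random.seed(42)
group_a = np.random.normal(loc=72, scale=6, size=40)
group_b = np.random.normal(loc=78, scale=6, size=40)

n_a, n_b = len(group_a), len(group_b)

# Pooled standard deviation from the two sample variances (ddof=1)
pooled_sd = np.sqrt(
    ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
)

# Cohen's d: mean difference in units of the pooled standard deviation
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd
print(f"Cohen's d: {cohens_d:.2f}")
```

By a widely used rule of thumb, d around 0.2 is small, 0.5 medium, and 0.8 or more large; unlike the p-value, d does not shrink toward "significant" simply because the sample is big.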

### Visualizing the Two Groups

A box plot shows the center and spread of each group, making it easier to understand what the t-test is comparing and whether the spread looks similar between groups.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

group_a = np.random.normal(loc=72, scale=6, size=40)
group_b = np.random.normal(loc=78, scale=6, size=40)

plt.figure(figsize=(9, 5))
# tick_labels requires Matplotlib 3.9+; on older versions use labels=
plt.boxplot([group_a, group_b], tick_labels=["Group A", "Group B"])
plt.ylabel("Score")
plt.title("Comparison of Two Independent Groups")
plt.grid(axis="y", linestyle="--", alpha=0.4)
plt.show()
```
- The box spans the interquartile range (25th to 75th percentile); the line inside it is the median.
- Overlapping boxes don't mean the difference is insignificant — they show spread, not statistical certainty.
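If eyeballing the spread feels too informal, the equal-variance assumption can be tested directly with Levene's test (`stats.levene` in SciPy), whose null hypothesis is that the groups have equal variances. A small p-value is a signal to prefer Welch's t-test; this is a sketch on the same simulated groups:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
group_a = np.random.normal(loc=72, scale=6, size=40)
group_b = np.random.normal(loc=78, scale=6, size=40)

# Levene's test: null hypothesis is that both groups have equal variances
stat, p = stats.levene(group_a, group_b)
print(f"Levene statistic: {stat:.3f}")
print(f"p-value: {p:.4f}")

if p < 0.05:
    print("Variances look unequal: prefer Welch's t-test.")
else:
    print("No strong evidence of unequal variances.")
```

Note that failing to reject equal variances is not proof they are equal, which is one more reason Welch's test is a reasonable default regardless of this result.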

### Using Welch's t-Test

When the two groups have noticeably different spreads, the equal-variance assumption is violated and the standard t-test can give inaccurate p-values. Welch's t-test adjusts the degrees of freedom to compensate, making it more reliable in those situations.

```python
import numpy as np
from scipy import stats

np.random.seed(7)

group_a = np.random.normal(loc=72, scale=4, size=35)
group_b = np.random.normal(loc=77, scale=9, size=35)

equal_var_test = stats.ttest_ind(group_a, group_b, equal_var=True)
welch_test = stats.ttest_ind(group_a, group_b, equal_var=False)

print("Standard t-test:")
print(f"  t-statistic: {equal_var_test.statistic:.3f}")
print(f"  p-value: {equal_var_test.pvalue:.6f}")

print("Welch's t-test:")
print(f"  t-statistic: {welch_test.statistic:.3f}")
print(f"  p-value: {welch_test.pvalue:.6f}")
```

```
Standard t-test:
  t-statistic: -2.673
  p-value: 0.009406
Welch's t-test:
  t-statistic: -2.673
  p-value: 0.010341
```
- `equal_var=False` activates Welch's correction; this is the default in many other statistical packages, making it a safe habit.
- With equal sample sizes, as here, the two t-statistics coincide and only the degrees of freedom differ, so the p-values diverge only slightly; the gap between the tests widens when the sample sizes are unequal as well.
- When variances are roughly equal, Welch's and the standard test give nearly identical results, so there's little cost to always using Welch's.
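To see the two tests diverge more sharply, combine unequal variances with unequal sample sizes. This sketch pits a small, noisy group against a large, tight one (the group sizes, means, and scales here are illustrative choices, not taken from the examples above):

```python
import numpy as np
from scipy import stats

np.random.seed(0)

# Small high-variance group vs. large low-variance group: the worst
# case for the pooled (equal-variance) test
small_noisy = np.random.normal(loc=75, scale=12, size=12)
large_tight = np.random.normal(loc=72, scale=3, size=120)

pooled = stats.ttest_ind(small_noisy, large_tight, equal_var=True)
welch = stats.ttest_ind(small_noisy, large_tight, equal_var=False)

print(f"Pooled t-statistic: {pooled.statistic:.3f}, p-value: {pooled.pvalue:.4f}")
print(f"Welch  t-statistic: {welch.statistic:.3f}, p-value: {welch.pvalue:.4f}")
```

With unequal group sizes the two t-statistics no longer match, because the pooled test lets the large group dominate the variance estimate while Welch's keeps the two variances separate.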

### Practical Example: Comparing Two Teaching Methods

Here, two classes received different teaching methods and we test whether the difference in average exam scores is statistically significant.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(123)

class_a = np.random.normal(loc=74, scale=5, size=30)
class_b = np.random.normal(loc=80, scale=5, size=30)

result = stats.ttest_ind(class_a, class_b, equal_var=False)

print(f"Average score in class A: {class_a.mean():.2f}")
print(f"Average score in class B: {class_b.mean():.2f}")
print(f"Welch's t-statistic: {result.statistic:.3f}")
print(f"Welch's p-value: {result.pvalue:.6f}")

if result.pvalue < 0.05:
    print("Conclusion: the difference in average scores is statistically significant.")
else:
    print("Conclusion: the score difference is not statistically significant.")

plt.figure(figsize=(9, 5))
plt.hist(class_a, bins=8, alpha=0.6, label="Class A")
plt.hist(class_b, bins=8, alpha=0.6, label="Class B")
plt.xlabel("Exam score")
plt.ylabel("Count")
plt.title("Score Distributions for Two Teaching Methods")
plt.legend()
plt.show()
```

```
Average score in class A: 74.22
Average score in class B: 80.71
Welch's t-statistic: -4.150
Welch's p-value: 0.000110
Conclusion: the difference in average scores is statistically significant.
```
- Using `equal_var=False` (Welch's) is the safer default for real data where you can't verify equal variances in advance.
- The overlapping histograms give visual context for why the test is needed — the distributions are close enough that eyeballing alone is unreliable.
- The p-value quantifies the evidence; the chart shows what that evidence looks like in the raw data.
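If you had predicted in advance which class would score higher, `ttest_ind` also accepts an `alternative` argument for a one-sided test. This sketch compares the two-sided default with `alternative="less"`, which tests whether class A's mean is lower than class B's:

```python
import numpy as np
from scipy import stats

np.random.seed(123)
class_a = np.random.normal(loc=74, scale=5, size=30)
class_b = np.random.normal(loc=80, scale=5, size=30)

two_sided = stats.ttest_ind(class_a, class_b, equal_var=False)
one_sided = stats.ttest_ind(class_a, class_b, equal_var=False, alternative="less")

print(f"Two-sided p-value: {two_sided.pvalue:.6f}")
print(f"One-sided p-value: {one_sided.pvalue:.6f}")
```

When the observed direction matches the prediction, the one-sided p-value is half the two-sided one. The direction must be chosen before looking at the data; picking it afterward inflates the false-positive rate.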

### Conclusion

Welch's t-test (`equal_var=False`) is the recommended default for comparing two independent groups — it handles unequal variances gracefully and behaves like the standard test when variances are similar. Always pair the result with a chart to check whether the data actually look like two distinct groups.

To compare three or more groups, see [one-way ANOVA with SciPy](/tutorials/one-way-anova-with-scipy). If the normality assumption is questionable, the [Mann-Whitney U test](/tutorials/mann-whitney-u-test-with-scipy) is a robust nonparametric alternative.