When you design a questionnaire with multiple items that are all supposed to measure the same thing — job satisfaction, anxiety, brand loyalty — you need to check that the items actually hang together. Cronbach's alpha quantifies this *internal consistency*: it is essentially the mean of all possible split-half reliability coefficients across the items. An alpha of 0.70–0.90 is typically considered acceptable for research; below 0.60 suggests the items are measuring different things; above 0.95 often signals that items are so similar they add no new information. Python has no dedicated function for alpha in the standard scientific stack, but the formula is a single expression using NumPy.

### Simulating a job satisfaction scale

We'll create a 6-item scale where five items genuinely measure job satisfaction and one is a rogue item unrelated to the construct — simulating the real situation where a poorly written questionnaire item slips through.
```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)  # latent construct

items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),  # JS1
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),  # JS2
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),  # JS3
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),  # JS4
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),  # JS5
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),                       # JS6 — bad item
])
labels = ["JS1", "JS2", "JS3", "JS4", "JS5", "JS6"]

print(f"{'Item':>5} {'Mean':>6} {'SD':>6}")
for name, col in zip(labels, items.T):
    print(f"{name:>5} {col.mean():>6.2f} {col.std(ddof=1):>6.2f}")
```

- Items JS1–JS5 are generated as `4 + loading * satisfaction + noise`, so they all move together with the latent construct. A higher satisfaction score shifts all five items upward simultaneously.
- JS6 has no `satisfaction` term — it is pure noise with a wider spread, simulating a question like "How far is your commute?" which happens to appear on a job satisfaction survey.
- `np.clip(..., 1, 7)` keeps all values on the intended 1–7 Likert scale, as would happen with real survey responses.

### Computing Cronbach's alpha

The formula relates the sum of individual item variances to the variance of the total score: if items are highly correlated, summing them inflates the total variance far more than the sum of individual variances.
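Written out, the statistic is

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{i}}{\sigma^2_{\text{total}}}\right)
$$

where k is the number of items, σ²ᵢ is the sample variance of item i, and σ²_total is the variance of the row-wise total score. When items are uncorrelated, the total variance equals the sum of item variances and alpha is 0; when items are perfectly correlated with equal variances, the ratio shrinks to 1/k and alpha reaches 1.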
```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)
items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),
])

def cronbach_alpha(X):
    X = np.asarray(X)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)      # per-item sample variances
    total_var = X.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha (all 6 items): {alpha:.3f}")

benchmarks = [(0.90, "Excellent"), (0.80, "Good"), (0.70, "Acceptable"),
              (0.60, "Questionable"), (0.50, "Poor")]
label = "Unacceptable"
for threshold, name in benchmarks:
    if alpha >= threshold:
        label = name
        break
print(f"Interpretation: {label}")
```

- `item_vars = X.var(axis=0, ddof=1)` computes the sample variance for each column (each item). The sum of these measures how much unique variance each item contributes.
- `X.sum(axis=1).var(ddof=1)` computes the variance of the *total score* — the row sum. When items are correlated, this is much larger than the sum of individual variances.
- The `(k / (k-1))` prefactor is a bias correction. Without it, alpha would be systematically underestimated for short scales.

### Corrected item-total correlations

A corrected item-total correlation (CITC) measures how well each item correlates with the sum of all *other* items. Items with low CITC are poor indicators of the construct.
```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)
items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),
])
labels = ["JS1", "JS2", "JS3", "JS4", "JS5", "JS6"]

print(f"{'Item':>5} {'CITC':>8} {'Note'}")
print("-" * 35)
for i, name in enumerate(labels):
    rest_sum = np.delete(items, i, axis=1).sum(axis=1)  # sum of the other items
    r, _ = pearsonr(items[:, i], rest_sum)
    flag = " ← weak" if r < 0.30 else ""
    print(f"{name:>5} {r:>8.3f}{flag}")
```

- `np.delete(items, i, axis=1)` removes column `i`, leaving the other items. Summing those gives the *rest score* for that item.
- `pearsonr(item_i, rest_score)` is the CITC. The "corrected" part means the item itself is excluded from the total — without this correction the correlation would be inflated.
- A threshold of 0.30 is commonly used: items below it are candidates for removal. JS6 should score far below this because it shares no variance with the construct.

### Alpha-if-item-deleted

Removing a poor item should raise alpha. Computing alpha for each possible subset of k−1 items identifies exactly which items are hurting and which are helping reliability.
```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)
items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),
])
labels = ["JS1", "JS2", "JS3", "JS4", "JS5", "JS6"]

def cronbach_alpha(X):
    X = np.asarray(X)
    k = X.shape[1]
    return (k / (k - 1)) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

overall = cronbach_alpha(items)
alphas_if_deleted = [cronbach_alpha(np.delete(items, i, axis=1)) for i in range(6)]

colors = ["tomato" if a > overall else "steelblue" for a in alphas_if_deleted]
fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.barh(labels, alphas_if_deleted, color=colors)
ax.axvline(overall, color="black", linestyle="--", linewidth=1.5,
           label=f"Overall α = {overall:.3f}")
ax.set_xlabel("Cronbach's alpha if item deleted")
ax.set_title("Alpha-if-item-deleted analysis")
ax.legend()
ax.set_xlim(0.5, 1.0)
for bar, val in zip(bars, alphas_if_deleted):
    ax.text(val + 0.005, bar.get_y() + bar.get_height() / 2,
            f"{val:.3f}", va="center", fontsize=9)
plt.tight_layout()
plt.show()
```

- `[cronbach_alpha(np.delete(items, i, axis=1)) for i in range(6)]` computes alpha six times, each time leaving one item out.
- Bars in tomato red indicate items whose deletion *increases* alpha — these are the problem items. Blue bars indicate items that contribute positively to reliability.
- JS6's bar should be the only red one, with an alpha-if-deleted clearly higher than the dashed overall line.

### Conclusion

Cronbach's alpha is a two-line computation once you have the formula, but its real value lies in the diagnostics — item-total correlations and alpha-if-deleted — that tell you which items to keep, revise, or drop. Aim for α ≥ 0.70 in research contexts, and always re-check alpha after removing any item. For the factor structure underlying a multi-item scale, see [factor analysis](/tutorials/factor-analysis). For testing whether two groups score differently on the final scale, see [independent samples t-test with SciPy](/tutorials/independent-samples-t-test-with-scipy).
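As a final sketch of that last piece of advice, here is the re-check on the simulated data from above: compute alpha once with all six items and once with JS6 dropped. Since JS6 is pure noise, the five-item alpha should come out higher.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)
items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),  # JS6: the rogue item, pure noise
])

def cronbach_alpha(X):
    X = np.asarray(X)
    k = X.shape[1]
    return (k / (k - 1)) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

alpha_all = cronbach_alpha(items)          # all six items
alpha_kept = cronbach_alpha(items[:, :5])  # JS1-JS5 only, JS6 dropped
print(f"Alpha with JS6:    {alpha_all:.3f}")
print(f"Alpha without JS6: {alpha_kept:.3f}")
```

Dropping an item always shrinks the scale, so a higher alpha on the shorter scale is strong evidence the item was hurting reliability rather than helping it.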