When you design a questionnaire with multiple items that are all supposed to measure the same thing — job satisfaction, anxiety, brand loyalty — you need to check that the items actually hang together. Cronbach's alpha quantifies this *internal consistency*: it is essentially the mean of all possible split-half reliability coefficients across the items. An alpha of 0.70–0.90 is typically considered acceptable for research; below 0.60 suggests the items are measuring different things; above 0.95 often signals that items are so similar they add no new information. Python has no dedicated function for alpha in the standard scientific stack, but the formula is a single expression using NumPy.

### Simulating a job satisfaction scale

We'll create a 6-item scale where five items genuinely measure job satisfaction and one is a rogue item unrelated to the construct — simulating the real situation where a poorly written questionnaire item slips through.
```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)  # latent construct

items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),  # JS1
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),  # JS2
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),  # JS3
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),  # JS4
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),  # JS5
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),                       # JS6 — bad item
])
labels = ["JS1", "JS2", "JS3", "JS4", "JS5", "JS6"]

print(f"{'Item':>5} {'Mean':>6} {'SD':>6}")
for name, col in zip(labels, items.T):
    print(f"{name:>5} {col.mean():>6.2f} {col.std(ddof=1):>6.2f}")
```

- Items JS1–JS5 are generated as `4 + loading * satisfaction + noise`, so they all move together with the latent construct. A higher satisfaction score shifts all five items upward simultaneously.
- JS6 has no `satisfaction` term — it is pure noise with a wider spread, simulating a question like "How far is your commute?" which happens to appear on a job satisfaction survey.
- `np.clip(..., 1, 7)` keeps all values on the intended 1–7 Likert scale, as would happen with real survey responses.

### Computing Cronbach's alpha

The formula relates the sum of individual item variances to the variance of the total score: if items are highly correlated, summing them inflates the total variance far more than the sum of individual variances.
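Written out, the statistic is

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{i}}{\sigma^2_{\text{total}}}\right)
$$

where k is the number of items, σ²ᵢ is the sample variance of item i, and σ²_total is the variance of the row-wise total score. When items are uncorrelated, the total variance equals the sum of item variances and alpha is 0; when items are perfectly correlated with equal variances, the ratio shrinks to 1/k and alpha reaches 1.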
```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)
items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),
])

def cronbach_alpha(X):
    X = np.asarray(X)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)      # per-item sample variances
    total_var = X.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha (all 6 items): {alpha:.3f}")

benchmarks = [(0.90, "Excellent"), (0.80, "Good"), (0.70, "Acceptable"),
              (0.60, "Questionable"), (0.50, "Poor")]
label = "Unacceptable"
for threshold, name in benchmarks:
    if alpha >= threshold:
        label = name
        break
print(f"Interpretation: {label}")
```

- `item_vars = X.var(axis=0, ddof=1)` computes the sample variance for each column (each item). The sum of these measures how much unique variance each item contributes.
- `X.sum(axis=1).var(ddof=1)` computes the variance of the *total score* — the row sum. When items are correlated, this is much larger than the sum of individual variances.
- The `(k / (k-1))` prefactor is a bias correction. Without it, alpha would be systematically underestimated for short scales.

### Corrected item-total correlations

A corrected item-total correlation (CITC) measures how well each item correlates with the sum of all *other* items. Items with low CITC are poor indicators of the construct.
```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)
items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),
])
labels = ["JS1", "JS2", "JS3", "JS4", "JS5", "JS6"]

print(f"{'Item':>5} {'CITC':>8} {'Note'}")
print("-" * 35)
for i, name in enumerate(labels):
    rest_sum = np.delete(items, i, axis=1).sum(axis=1)  # sum of the other items
    r, _ = pearsonr(items[:, i], rest_sum)
    flag = " ← weak" if r < 0.30 else ""
    print(f"{name:>5} {r:>8.3f}{flag}")
```

- `np.delete(items, i, axis=1)` removes column `i`, leaving the other items. Summing those gives the *rest score* for that item.
- `pearsonr(item_i, rest_score)` is the CITC. The "corrected" part means the item itself is excluded from the total — without this correction the correlation would be inflated.
- A threshold of 0.30 is commonly used: items below it are candidates for removal. JS6 should score far below this because it shares no variance with the construct.

### Alpha-if-item-deleted

Removing a poor item should raise alpha. Computing alpha for each possible subset of k−1 items identifies exactly which items are hurting and which are helping reliability.
```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)
items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),
])
labels = ["JS1", "JS2", "JS3", "JS4", "JS5", "JS6"]

def cronbach_alpha(X):
    X = np.asarray(X)
    k = X.shape[1]
    return (k / (k - 1)) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

overall = cronbach_alpha(items)
alphas_if_deleted = [cronbach_alpha(np.delete(items, i, axis=1)) for i in range(6)]

colors = ["tomato" if a > overall else "steelblue" for a in alphas_if_deleted]
fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.barh(labels, alphas_if_deleted, color=colors)
ax.axvline(overall, color="black", linestyle="--", linewidth=1.5,
           label=f"Overall α = {overall:.3f}")
ax.set_xlabel("Cronbach's alpha if item deleted")
ax.set_title("Alpha-if-item-deleted analysis")
ax.legend()
ax.set_xlim(0.5, 1.0)
for bar, val in zip(bars, alphas_if_deleted):
    ax.text(val + 0.005, bar.get_y() + bar.get_height() / 2,
            f"{val:.3f}", va="center", fontsize=9)
plt.tight_layout()
plt.show()
```

- `[cronbach_alpha(np.delete(items, i, axis=1)) for i in range(6)]` computes alpha six times, each time leaving one item out.
- Bars in tomato red indicate items whose deletion *increases* alpha — these are the problem items. Blue bars indicate items that contribute positively to reliability.
- JS6's bar should be the only red one, with an alpha-if-deleted clearly higher than the dashed overall line.

### Conclusion

Cronbach's alpha is a two-line computation once you have the formula, but its real value lies in the diagnostics — item-total correlations and alpha-if-deleted — that tell you which items to keep, revise, or drop. Aim for α ≥ 0.70 in research contexts, and always re-check alpha after removing any item. For the factor structure underlying a multi-item scale, see [factor analysis](/tutorials/factor-analysis). For testing whether two groups score differently on the final scale, see [independent samples t-test with SciPy](/tutorials/independent-samples-t-test-with-scipy).
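As a final sketch of that last piece of advice, here is the re-check on the simulated data from above: compute alpha once with all six items and once with JS6 dropped. Since JS6 is pure noise, the five-item alpha should come out higher.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
satisfaction = rng.normal(0, 1, n)
items = np.column_stack([
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.5 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.3 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.4 * satisfaction + rng.normal(0, 0.5, n), 1, 7),
    np.clip(4 + 1.2 * satisfaction + rng.normal(0, 0.6, n), 1, 7),
    np.clip(4 + rng.normal(0, 1.5, n), 1, 7),  # JS6: the rogue item, pure noise
])

def cronbach_alpha(X):
    X = np.asarray(X)
    k = X.shape[1]
    return (k / (k - 1)) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

alpha_all = cronbach_alpha(items)          # all six items
alpha_kept = cronbach_alpha(items[:, :5])  # JS1-JS5 only, JS6 dropped
print(f"Alpha with JS6:    {alpha_all:.3f}")
print(f"Alpha without JS6: {alpha_kept:.3f}")
```

Dropping an item always shrinks the scale, so a higher alpha on the shorter scale is strong evidence the item was hurting reliability rather than helping it.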