PageRank was the original algorithm behind Google's search engine: it scores web pages by the probability that a random surfer clicking links at random will land on that page. A page is important not just because many pages link to it, but because high-ranking pages link to it. The same idea applies to any directed graph: finding influential researchers in citation networks, identifying key nodes in social graphs, or ranking players in a tournament. `nx.pagerank` implements this in one line. ### Computing PageRank `nx.pagerank` runs the power iteration algorithm on a directed graph and returns a score for each node between 0 and 1 (scores sum to 1). Nodes with many incoming links from high-scoring nodes get the highest scores.
import networkx as nx
# Small web graph: 6 pages with links
G = nx.DiGraph()
G.add_edges_from([
(0, 1), (0, 2),
(1, 2),
(2, 0),
(3, 2),
(4, 3), (4, 2),
(5, 4),
])
pr = nx.pagerank(G, alpha=0.85)
print("PageRank scores:")
for node in sorted(pr, key=pr.get, reverse=True):
print(f" Node {node}: {pr[node]:.4f}")- `alpha=0.85` is the damping factor: with probability 0.85, the random surfer follows a link; with probability 0.15, they teleport to a random node. The default is 0.85. - Node 2 receives links from nodes 0, 1, 3, and 4 — it will have the highest PageRank. - Scores sum to 1.0 across all nodes; interpret them as the fraction of time a random surfer spends at each node. ### Visualizing PageRank Drawing nodes sized by their PageRank score makes the most important nodes immediately visible.
import networkx as nx
import matplotlib.pyplot as plt
G = nx.DiGraph()
G.add_edges_from([
(0, 1), (0, 2),
(1, 2),
(2, 0),
(3, 2),
(4, 3), (4, 2),
(5, 4),
])
pr = nx.pagerank(G, alpha=0.85)
pos = nx.spring_layout(G, seed=42)
sizes = [pr[n] * 15000 for n in G.nodes()]
colors = [pr[n] for n in G.nodes()]
plt.figure(figsize=(7, 6))
nodes = nx.draw_networkx_nodes(G, pos, node_size=sizes, node_color=colors,
cmap="YlOrRd", vmin=0)
nx.draw_networkx_labels(G, pos, font_size=10, font_color="black")
nx.draw_networkx_edges(G, pos, edge_color="gray", arrows=True,
arrowsize=20, width=1.5,
connectionstyle="arc3,rad=0.1")
plt.colorbar(nodes, label="PageRank score")
plt.title("PageRank (node size and color ∝ score)")
plt.axis("off")
plt.tight_layout()
plt.show()- `connectionstyle="arc3,rad=0.1"` curves the edges slightly so that bidirectional edges (like 0→2 and 2→0) don't overlap. - `plt.colorbar` requires passing the return value of `draw_networkx_nodes` — it's a `ScalarMappable` that matplotlib can read the colormap from. - Node 2 appears largest and darkest, confirming its high PageRank despite having only one outgoing edge. ### The Damping Factor The damping factor `alpha` controls how much weight is given to the link structure vs uniform random jumps. Low alpha → scores are more uniform; high alpha → scores are more concentrated on well-linked nodes.
import networkx as nx
import matplotlib.pyplot as plt
G = nx.DiGraph()
G.add_edges_from([
(0, 1), (0, 2), (1, 2), (2, 0),
(3, 2), (4, 3), (4, 2), (5, 4),
])
alphas = [0.5, 0.85, 0.99]
fig, axes = plt.subplots(1, 3, figsize=(13, 4))
pos = nx.spring_layout(G, seed=42)
for ax, alpha in zip(axes, alphas):
pr = nx.pagerank(G, alpha=alpha)
sizes = [pr[n] * 15000 for n in G.nodes()]
colors = [pr[n] for n in G.nodes()]
nx.draw(G, pos, ax=ax, node_size=sizes, node_color=colors,
cmap="YlOrRd", with_labels=True, arrows=True,
edge_color="gray", connectionstyle="arc3,rad=0.1")
ax.set_title(f"alpha = {alpha}")
plt.suptitle("Effect of Damping Factor on PageRank", y=1.02)
plt.tight_layout()
plt.show()- At `alpha=0.5`, node 2 is still the largest but the difference between nodes is small — teleportation dominates. - At `alpha=0.99`, almost all weight flows through the link structure and node 2 becomes much more prominent relative to the others. - The default `alpha=0.85` was chosen empirically to balance these extremes for web-scale graphs. ### Personalized PageRank Personalized PageRank lets you bias the teleportation toward specific seed nodes, finding which nodes are most influential *from the perspective of a particular starting point*.
import networkx as nx
G = nx.DiGraph()
G.add_edges_from([
(0, 1), (0, 2), (1, 2), (2, 0),
(3, 2), (4, 3), (4, 2), (5, 4),
])
# Seed from node 5: which nodes are most reachable from 5?
personalization = {n: (1.0 if n == 5 else 0.0) for n in G.nodes()}
pr_personalized = nx.pagerank(G, alpha=0.85, personalization=personalization)
pr_standard = nx.pagerank(G, alpha=0.85)
print("Node Standard Personalized (seed=5)")
for n in sorted(G.nodes()):
print(f" {n} {pr_standard[n]:.4f} {pr_personalized[n]:.4f}")- `personalization` is a dictionary of node → probability for the teleportation distribution; it must sum to 1.0. - From node 5's perspective, node 4 (which 5 links to directly) and node 2 (downstream of 4) receive much higher scores than in the standard ranking. - Personalized PageRank powers collaborative filtering in recommendation systems: "given this user's history, what nodes are most relevant to them?" PageRank is one lens for identifying important nodes. For complementary node importance measures, see [centrality measures with NetworkX](/tutorials/centrality-measures-with-networkx).