Tutorials

PageRank with NetworkX

PageRank was the original algorithm behind Google's search engine: it scores web pages by the probability that a random surfer clicking links at random will land on that page. A page is important not just because many pages link to it, but because high-ranking pages link to it. The same idea applies to any directed graph: finding influential researchers in citation networks, identifying key nodes in social graphs, or ranking players in a tournament. `nx.pagerank` implements this in one line.

### Computing PageRank

`nx.pagerank` runs the power iteration algorithm on a directed graph and returns a score for each node between 0 and 1 (scores sum to 1). Nodes with many incoming links from high-scoring nodes get the highest scores.

import networkx as nx

# Small web graph: 6 pages with links
G = nx.DiGraph()
G.add_edges_from([
    (0, 1), (0, 2),
    (1, 2),
    (2, 0),
    (3, 2),
    (4, 3), (4, 2),
    (5, 4),
])

pr = nx.pagerank(G, alpha=0.85)
print("PageRank scores:")
for node in sorted(pr, key=pr.get, reverse=True):
    print(f"  Node {node}: {pr[node]:.4f}")
PageRank scores:
  Node 2: 0.3724
  Node 0: 0.3415
  Node 1: 0.1702
  Node 4: 0.0462
  Node 3: 0.0447
  Node 5: 0.0250
- `alpha=0.85` is the damping factor: with probability 0.85, the random surfer follows a link; with probability 0.15, they teleport to a random node. The default is 0.85.
- Node 2 receives links from nodes 0, 1, 3, and 4 — it will have the highest PageRank.
- Scores sum to 1.0 across all nodes; interpret them as the fraction of time a random surfer spends at each node.

### Visualizing PageRank

Drawing nodes sized by their PageRank score makes the most important nodes immediately visible.

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from([
    (0, 1), (0, 2),
    (1, 2),
    (2, 0),
    (3, 2),
    (4, 3), (4, 2),
    (5, 4),
])

pr = nx.pagerank(G, alpha=0.85)
pos = nx.spring_layout(G, seed=42)

sizes  = [pr[n] * 15000 for n in G.nodes()]
colors = [pr[n] for n in G.nodes()]

plt.figure(figsize=(7, 6))
nodes = nx.draw_networkx_nodes(G, pos, node_size=sizes, node_color=colors,
                                cmap="YlOrRd", vmin=0)
nx.draw_networkx_labels(G, pos, font_size=10, font_color="black")
nx.draw_networkx_edges(G, pos, edge_color="gray", arrows=True,
                       arrowsize=20, width=1.5,
                       connectionstyle="arc3,rad=0.1")
plt.colorbar(nodes, label="PageRank score")
plt.title("PageRank (node size and color ∝ score)")
plt.axis("off")
plt.tight_layout()
plt.show()
- `connectionstyle="arc3,rad=0.1"` curves the edges slightly so that bidirectional edges (like 0→2 and 2→0) don't overlap.
- `plt.colorbar` requires passing the return value of `draw_networkx_nodes` — it's a `ScalarMappable` that matplotlib can read the colormap from.
- Node 2 appears largest and darkest, confirming its high PageRank despite having only one outgoing edge.

### The Damping Factor

The damping factor `alpha` controls how much weight is given to the link structure vs uniform random jumps. Low alpha → scores are more uniform; high alpha → scores are more concentrated on well-linked nodes.

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from([
    (0, 1), (0, 2), (1, 2), (2, 0),
    (3, 2), (4, 3), (4, 2), (5, 4),
])

alphas = [0.5, 0.85, 0.99]
fig, axes = plt.subplots(1, 3, figsize=(13, 4))
pos = nx.spring_layout(G, seed=42)

for ax, alpha in zip(axes, alphas):
    pr = nx.pagerank(G, alpha=alpha)
    sizes  = [pr[n] * 15000 for n in G.nodes()]
    colors = [pr[n] for n in G.nodes()]
    nx.draw(G, pos, ax=ax, node_size=sizes, node_color=colors,
            cmap="YlOrRd", with_labels=True, arrows=True,
            edge_color="gray", connectionstyle="arc3,rad=0.1")
    ax.set_title(f"alpha = {alpha}")

plt.suptitle("Effect of Damping Factor on PageRank", y=1.02)
plt.tight_layout()
plt.show()
- At `alpha=0.5`, node 2 is still the largest but the difference between nodes is small — teleportation dominates.
- At `alpha=0.99`, almost all weight flows through the link structure and node 2 becomes much more prominent relative to the others.
- The default `alpha=0.85` was chosen empirically to balance these extremes for web-scale graphs.

### Personalized PageRank

Personalized PageRank lets you bias the teleportation toward specific seed nodes, finding which nodes are most influential *from the perspective of a particular starting point*.

import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    (0, 1), (0, 2), (1, 2), (2, 0),
    (3, 2), (4, 3), (4, 2), (5, 4),
])

# Seed from node 5: which nodes are most reachable from 5?
personalization = {n: (1.0 if n == 5 else 0.0) for n in G.nodes()}
pr_personalized = nx.pagerank(G, alpha=0.85, personalization=personalization)
pr_standard     = nx.pagerank(G, alpha=0.85)

print("Node  Standard   Personalized (seed=5)")
for n in sorted(G.nodes()):
    print(f"  {n}     {pr_standard[n]:.4f}      {pr_personalized[n]:.4f}")
Node  Standard   Personalized (seed=5)
  0     0.3415      0.2569
  1     0.1702      0.1092
  2     0.3724      0.3022
  3     0.0447      0.0542
  4     0.0462      0.1275
  5     0.0250      0.1500
- `personalization` is a dictionary of node → probability for the teleportation distribution; it must sum to 1.0.
- From node 5's perspective, node 4 (which 5 links to directly) and node 2 (downstream of 4) receive much higher scores than in the standard ranking.
- Personalized PageRank powers collaborative filtering in recommendation systems: "given this user's history, what nodes are most relevant to them?"

PageRank is one lens for identifying important nodes. For complementary node importance measures, see [centrality measures with NetworkX](/tutorials/centrality-measures-with-networkx).