# Citation Network Analysis with NetworkX

Citation networks are directed graphs where nodes represent papers and edges represent citations. If Paper B cites Paper A, the graph contains a directed edge from B to A. This structure makes it possible to study which papers are influential, which ones connect areas of research, and how citation patterns develop across a set of publications.

This tutorial shows how to build a citation network in NetworkX, inspect in-degree and out-degree, compute PageRank, and visualize the results.

### Building a Citation Network

Let's start with a small directed graph representing a set of papers and their citation relationships.

```python
import networkx as nx

G = nx.DiGraph()

papers = [
    "Paper A",
    "Paper B",
    "Paper C",
    "Paper D",
    "Paper E",
]

G.add_nodes_from(papers)
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
    ]
)

print("Nodes:", list(G.nodes()))
print("Edges:", list(G.edges()))
```

```text
Nodes: ['Paper A', 'Paper B', 'Paper C', 'Paper D', 'Paper E']
Edges: [('Paper B', 'Paper A'), ('Paper C', 'Paper A'), ('Paper D', 'Paper B'), ('Paper D', 'Paper C'), ('Paper E', 'Paper C')]
```

- The graph is directed because citations have direction.
- An edge points from the citing paper to the cited paper.
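
To make the edge direction concrete, here is a minimal sketch that rebuilds the same small graph and asks two questions: who cites a paper, and what does a paper cite? It relies on `DiGraph.predecessors` and `DiGraph.successors`; the variable names `cited_by` and `references` are illustrative only.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
    ]
)

# Edges run from citing paper to cited paper, so the predecessors
# of a node are the papers that cite it.
cited_by = sorted(G.predecessors("Paper A"))

# The successors of a node are the papers it cites (its reference list).
references = sorted(G.successors("Paper D"))

print("Paper A is cited by:", cited_by)
print("Paper D cites:", references)
```

If the edges were stored the other way around (cited paper to citing paper), the roles of predecessors and successors would swap, so it is worth fixing the convention before computing any metric.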

### Measuring In-Degree and Out-Degree

In a citation network, a paper's in-degree is its citation count within the graph: the number of papers in the dataset that cite it. Its out-degree is the number of references it makes to other papers in the graph.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
    ]
)

print("Citation counts (in-degree):")
for paper, citations in sorted(G.in_degree(), key=lambda x: x[1], reverse=True):
    print(f"{paper}: {citations}")

print("\nReferences made (out-degree):")
for paper, refs in sorted(G.out_degree(), key=lambda x: x[1], reverse=True):
    print(f"{paper}: {refs}")
```

```text
Citation counts (in-degree):
Paper C: 3
Paper A: 2
Paper B: 1
Paper D: 0
Paper E: 0
Paper F: 0

References made (out-degree):
Paper D: 2
Paper B: 1
Paper C: 1
Paper E: 1
Paper F: 1
Paper A: 0
```

- **`G.in_degree()`** counts citations received.
- **`G.out_degree()`** counts citations made by each paper.
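
Degree views can also be filtered directly. The sketch below, using the same six-edge graph, picks out papers with no incoming citations, papers with no outgoing references, and the single most-cited paper. The names `uncited`, `no_references`, and `most_cited` are illustrative. Both counts only reflect edges present in this graph, so a paper listed as uncited here may well be cited by papers outside the dataset.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
    ]
)

# Papers that no other paper in this graph cites (in-degree 0).
uncited = sorted(paper for paper, count in G.in_degree() if count == 0)

# Papers that cite nothing in this graph (out-degree 0).
no_references = sorted(paper for paper, count in G.out_degree() if count == 0)

# The paper with the largest in-degree, together with its count.
most_cited, most_cited_count = max(G.in_degree(), key=lambda item: item[1])

print("Uncited:", uncited)
print("No references:", no_references)
print(f"Most cited: {most_cited} ({most_cited_count})")
```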

### Ranking Papers with PageRank

Citation count alone does not tell the whole story. A citation from an influential paper may matter more than a citation from an isolated one. PageRank helps capture that idea.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper G", "Paper A"),
        ("Paper G", "Paper C"),
    ]
)

pagerank = nx.pagerank(G)

print("PageRank scores:")
for paper, score in sorted(pagerank.items(), key=lambda x: x[1], reverse=True):
    print(f"{paper}: {score:.3f}")
```

```text
PageRank scores:
Paper A: 0.386
Paper C: 0.243
Paper B: 0.097
Paper D: 0.068
Paper E: 0.068
Paper F: 0.068
Paper G: 0.068
```

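- **`nx.pagerank(G)`** scores nodes based on link structure, not just raw counts.
- In citation networks, this can highlight influential papers that are cited by other important papers.
- In this example, Paper A has the highest score even though Paper C receives more citations (4 versus 3), because Paper C's only reference is Paper A and so passes its weight on to it.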
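
To show where these scores come from, here is a minimal power-iteration sketch of PageRank on the same graph. It is an illustration under stated assumptions, not NetworkX's actual implementation: it assumes a damping factor of 0.85 (the default `alpha` in `nx.pagerank`) and spreads the score of papers with no references evenly across all nodes. The function name `pagerank_power_iteration` and its parameters are hypothetical.

```python
import networkx as nx


def pagerank_power_iteration(G, alpha=0.85, tol=1e-10, max_iter=1000):
    """Illustrative PageRank by repeated redistribution of scores."""
    nodes = list(G.nodes())
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform score

    for _ in range(max_iter):
        # Score held by papers with no outgoing citations ("dangling" nodes)
        # is shared evenly among all papers.
        dangling = sum(rank[node] for node in nodes if G.out_degree(node) == 0)
        new_rank = {
            node: (1.0 - alpha) / n + alpha * dangling / n for node in nodes
        }

        # Each paper splits its damped score equally among the papers it cites.
        for node in nodes:
            out_degree = G.out_degree(node)
            for cited in G.successors(node):
                new_rank[cited] += alpha * rank[node] / out_degree

        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break

    return rank


G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper G", "Paper A"),
        ("Paper G", "Paper C"),
    ]
)

scores = pagerank_power_iteration(G)
for paper, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
    print(f"{paper}: {score:.3f}")
```

The scores sum to 1 and should agree with the `nx.pagerank(G)` output above to three decimal places. Lowering `alpha` pulls all scores toward the uniform value `1 / n`, which reduces the advantage of heavily cited papers.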

### Visualizing the Citation Network

A graph view helps show which papers are central and how citations are distributed across the network.

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper G", "Paper A"),
        ("Paper G", "Paper C"),
    ]
)

in_degree = dict(G.in_degree())
node_sizes = [500 + 600 * in_degree[node] for node in G.nodes()]

pos = nx.spring_layout(G, seed=42)

plt.figure(figsize=(10, 7))
nx.draw(
    G,
    pos,
    with_labels=True,
    node_size=node_sizes,
    node_color="lightblue",
    edge_color="gray",
    arrows=True,
    arrowsize=18,
    font_size=10,
)
plt.title("Citation Network with Node Size by Citation Count")
plt.axis("off")
plt.show()
```

- Larger nodes represent papers with more citations.
- Arrow direction makes the citation flow visible.
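
The size mapping is simple arithmetic, and printing it can help when tuning a plot. The sketch below reproduces the `500 + 600 * in_degree` rule from the example without drawing anything; the constants `BASE_SIZE` and `SIZE_PER_CITATION` are illustrative names for the two numbers in that formula. As far as I know, NetworkX hands `node_size` to Matplotlib's scatter plot, where it is a marker area in points squared, so doubling the value doubles a node's area rather than its diameter.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper G", "Paper A"),
        ("Paper G", "Paper C"),
    ]
)

BASE_SIZE = 500          # size of a paper with no citations in the graph
SIZE_PER_CITATION = 600  # extra size added for each citation received

# Map every paper to the size it would be drawn with.
node_sizes = {
    paper: BASE_SIZE + SIZE_PER_CITATION * count
    for paper, count in G.in_degree()
}

for paper, size in sorted(node_sizes.items(), key=lambda x: x[1], reverse=True):
    print(f"{paper}: {size}")
```

The non-zero base size keeps uncited papers visible; with a base of 0 they would collapse to invisible points.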

### Coloring by Influence

You can also color nodes by PageRank to highlight papers that are structurally important in the citation network.

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper G", "Paper A"),
        ("Paper G", "Paper C"),
        ("Paper H", "Paper G"),
        ("Paper I", "Paper G"),
    ]
)

pagerank = nx.pagerank(G)
node_colors = [pagerank[node] for node in G.nodes()]
node_sizes = [4000 * pagerank[node] + 400 for node in G.nodes()]

pos = nx.spring_layout(G, seed=12)

plt.figure(figsize=(10, 7))
nx.draw(
    G,
    pos,
    with_labels=True,
    node_color=node_colors,
    node_size=node_sizes,
    cmap=plt.cm.plasma,
    edge_color="#BBBBBB",
    arrows=True,
    arrowsize=18,
    font_size=10,
)
plt.title("Citation Network Colored by PageRank")
plt.axis("off")
plt.show()
```

- With the `plasma` colormap, brighter (yellow) colors and larger node sizes indicate higher PageRank.
- This helps distinguish papers that are simply cited often from papers that are structurally influential.
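
The same distinction can be checked numerically by putting the two metrics side by side. This is a small sketch on the graph from this section; the names `by_citations` and `by_pagerank` are illustrative.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper G", "Paper A"),
        ("Paper G", "Paper C"),
        ("Paper H", "Paper G"),
        ("Paper I", "Paper G"),
    ]
)

in_degree = dict(G.in_degree())
pagerank = nx.pagerank(G)

# Two orderings of the same papers, one per metric.
by_citations = sorted(G.nodes(), key=lambda p: in_degree[p], reverse=True)
by_pagerank = sorted(G.nodes(), key=lambda p: pagerank[p], reverse=True)

# Print both metrics in PageRank order so disagreements stand out.
print(f"{'Paper':<10}{'Citations':>10}{'PageRank':>10}")
for paper in by_pagerank:
    print(f"{paper:<10}{in_degree[paper]:>10}{pagerank[paper]:>10.3f}")
```

In this graph Paper C leads on citation count while Paper A leads on PageRank, because Paper C's only reference is Paper A.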

### Practical Example: Finding Influential Papers

Here is a more realistic example that combines publication year, citation count, and PageRank.

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()

paper_years = {
    "Paper A": 2018,
    "Paper B": 2019,
    "Paper C": 2019,
    "Paper D": 2020,
    "Paper E": 2020,
    "Paper F": 2021,
    "Paper G": 2021,
    "Paper H": 2022,
}

for paper, year in paper_years.items():
    G.add_node(paper, year=year)

G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper F", "Paper A"),
        ("Paper G", "Paper D"),
        ("Paper G", "Paper C"),
        ("Paper H", "Paper D"),
        ("Paper H", "Paper F"),
    ]
)

in_degree = dict(G.in_degree())
pagerank = nx.pagerank(G)

print("Top cited papers:")
for paper, citations in sorted(in_degree.items(), key=lambda x: x[1], reverse=True)[:3]:
    print(f"{paper}: {citations} citations")

print("\nTop papers by PageRank:")
for paper, score in sorted(pagerank.items(), key=lambda x: x[1], reverse=True)[:3]:
    print(f"{paper}: {score:.3f}")

pos = nx.spring_layout(G, seed=21)
node_sizes = [600 + 500 * in_degree[node] for node in G.nodes()]
node_colors = [G.nodes[node]["year"] for node in G.nodes()]

fig, ax = plt.subplots(figsize=(11, 8))
nx.draw(
    G,
    pos,
    with_labels=True,
    node_size=node_sizes,
    node_color=node_colors,
    cmap=plt.cm.viridis,
    edge_color="#AAAAAA",
    arrows=True,
    arrowsize=18,
    font_size=10,
    ax=ax,
)

sm = plt.cm.ScalarMappable(
    cmap=plt.cm.viridis,
    norm=plt.Normalize(vmin=min(paper_years.values()), vmax=max(paper_years.values())),
)
sm.set_array([])
fig.colorbar(sm, ax=ax, label="Publication Year")

ax.set_title("Citation Network with Size by Citations and Color by Year")
ax.axis("off")
plt.show()
```

```text
Top cited papers:
Paper C: 4 citations
Paper A: 3 citations
Paper D: 2 citations

Top papers by PageRank:
Paper A: 0.347
Paper C: 0.204
Paper D: 0.103
```

- Node size highlights papers with more citations.
- Node color adds temporal context, helping you see whether influential papers are older or more recent.
- Citation count and PageRank can disagree: here Paper C has the most citations, but Paper A has the highest PageRank. Looking at both gives a better picture of importance than either metric alone.
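
Once nodes carry a `year` attribute, it is also easy to sanity-check the data. The sketch below, on the same graph, flags any citation that points to a paper published later than the citing paper and confirms that the graph has no citation cycles. The name `suspicious_edges` is illustrative, and treating a forward-in-time citation as suspicious is an assumption: preprints and revised editions can legitimately produce such edges.

```python
import networkx as nx

paper_years = {
    "Paper A": 2018,
    "Paper B": 2019,
    "Paper C": 2019,
    "Paper D": 2020,
    "Paper E": 2020,
    "Paper F": 2021,
    "Paper G": 2021,
    "Paper H": 2022,
}

G = nx.DiGraph()
for paper, year in paper_years.items():
    G.add_node(paper, year=year)

G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper F", "Paper A"),
        ("Paper G", "Paper D"),
        ("Paper G", "Paper C"),
        ("Paper H", "Paper D"),
        ("Paper H", "Paper F"),
    ]
)

# A citation normally points to a paper from the same year or earlier.
suspicious_edges = [
    (citing, cited)
    for citing, cited in G.edges()
    if G.nodes[citing]["year"] < G.nodes[cited]["year"]
]

# Citations that only point backward in time cannot form cycles.
is_acyclic = nx.is_directed_acyclic_graph(G)

print("Suspicious citations:", suspicious_edges)
print("Acyclic:", is_acyclic)
```

Running a check like this before computing metrics can catch reversed edges, which would otherwise silently swap in-degree and out-degree.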

### Conclusion

NetworkX is well suited to citation network analysis because it handles directed graphs, ranking methods, and graph visualizations in a compact workflow. By combining in-degree, PageRank, and visualization, you can identify influential papers and better understand the structure of a citation graph.