Citation networks are directed graphs where nodes represent papers and edges represent citations. If Paper B cites Paper A, the graph contains a directed edge from B to A. This structure makes it possible to study which papers are influential, which ones connect areas of research, and how citation patterns develop across a set of publications. This tutorial shows how to build a citation network in NetworkX, inspect in-degree and out-degree, compute PageRank, and visualize the results.

### Building a Citation Network

Let's start with a small directed graph representing a set of papers and their citation relationships.

```python
import networkx as nx

# A directed graph: each edge runs from the citing paper to the cited paper.
G = nx.DiGraph()

papers = [
    "Paper A",
    "Paper B",
    "Paper C",
    "Paper D",
    "Paper E",
]
G.add_nodes_from(papers)

G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
    ]
)

print("Nodes:", list(G.nodes()))
print("Edges:", list(G.edges()))
```

```
Nodes: ['Paper A', 'Paper B', 'Paper C', 'Paper D', 'Paper E']
Edges: [('Paper B', 'Paper A'), ('Paper C', 'Paper A'), ('Paper D', 'Paper B'), ('Paper D', 'Paper C'), ('Paper E', 'Paper C')]
```

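
One quick way to confirm the edge direction is to ask NetworkX for a node's successors and predecessors. The sketch below rebuilds the same five-paper graph; the variable names `cited_by_d` and `citing_a` are illustrative only.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
    ]
)

# Successors follow outgoing edges: the papers that Paper D cites.
cited_by_d = sorted(G.successors("Paper D"))

# Predecessors follow incoming edges: the papers that cite Paper A.
citing_a = sorted(G.predecessors("Paper A"))

print("Paper D cites:", cited_by_d)
print("Paper A is cited by:", citing_a)
```

If the two lists came out swapped, the edges would have been added in the wrong direction.
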
- The graph is directed because citations have direction.
- An edge points from the citing paper to the cited paper.

### Measuring In-Degree and Out-Degree

In a citation network, in-degree is the simplest citation count: the number of papers in the graph that cite a given paper. Out-degree is the number of references a paper makes to other papers in the graph.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
    ]
)

# In-degree: number of incoming edges, i.e. citations received.
print("Citation counts (in-degree):")
for paper, citations in sorted(G.in_degree(), key=lambda x: x[1], reverse=True):
    print(f"{paper}: {citations}")

# Out-degree: number of outgoing edges, i.e. references made.
print("\nReferences made (out-degree):")
for paper, refs in sorted(G.out_degree(), key=lambda x: x[1], reverse=True):
    print(f"{paper}: {refs}")
```

```
Citation counts (in-degree):
Paper C: 3
Paper A: 2
Paper B: 1
Paper D: 0
Paper E: 0
Paper F: 0

References made (out-degree):
Paper D: 2
Paper B: 1
Paper C: 1
Paper E: 1
Paper F: 1
Paper A: 0
```

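
To make the two counts concrete, here is a plain-Python sketch that reproduces them straight from the edge list, without NetworkX. It only illustrates the definitions; the names `edges`, `citations_received`, and `references_made` are made up for this example.

```python
from collections import Counter

# (citing paper, cited paper) pairs, matching the graph above.
edges = [
    ("Paper B", "Paper A"),
    ("Paper C", "Paper A"),
    ("Paper D", "Paper B"),
    ("Paper D", "Paper C"),
    ("Paper E", "Paper C"),
    ("Paper F", "Paper C"),
]

# In-degree: how often a paper appears as the cited end of an edge.
citations_received = Counter(cited for _, cited in edges)

# Out-degree: how often a paper appears as the citing end of an edge.
references_made = Counter(citing for citing, _ in edges)

print("Paper C citations:", citations_received["Paper C"])
print("Paper D references:", references_made["Paper D"])
# A Counter returns 0 for papers that never appear on that side of an edge.
print("Paper F citations:", citations_received["Paper F"])
```
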
- **`G.in_degree()`** counts citations received by each paper.
- **`G.out_degree()`** counts citations made by each paper.

### Ranking Papers with PageRank

Citation count alone does not tell the whole story. A citation from an influential paper may matter more than a citation from an isolated one. PageRank captures that idea by letting each paper pass a share of its own score to the papers it cites.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper G", "Paper A"),
        ("Paper G", "Paper C"),
    ]
)

# Returns a dict mapping each node to its PageRank score (scores sum to 1).
pagerank = nx.pagerank(G)

print("PageRank scores:")
for paper, score in sorted(pagerank.items(), key=lambda x: x[1], reverse=True):
    print(f"{paper}: {score:.3f}")
```

```
PageRank scores:
Paper A: 0.386
Paper C: 0.243
Paper B: 0.097
Paper D: 0.068
Paper E: 0.068
Paper F: 0.068
Paper G: 0.068
```

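
To see roughly where these numbers come from, the following plain-Python sketch runs a basic power iteration over the same edge list. It is a simplified illustration, not NetworkX's implementation. It assumes the same default damping factor of 0.85 that `nx.pagerank` uses, a fixed number of iterations instead of a convergence check, and that papers with no references spread their score evenly over all papers. The function name `pagerank_sketch` is hypothetical.

```python
def pagerank_sketch(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a list of (citing, cited) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Papers with no references ("dangling" nodes) spread their score evenly.
        dangling = sum(scores[node] for node in nodes if not out_links[node])
        new_scores = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every other paper splits its damped score among the papers it cites.
        for citing, targets in out_links.items():
            for cited in targets:
                new_scores[cited] += damping * scores[citing] / len(targets)
        scores = new_scores
    return scores


edges = [
    ("Paper B", "Paper A"),
    ("Paper C", "Paper A"),
    ("Paper D", "Paper B"),
    ("Paper D", "Paper C"),
    ("Paper E", "Paper C"),
    ("Paper F", "Paper C"),
    ("Paper G", "Paper A"),
    ("Paper G", "Paper C"),
]

scores = pagerank_sketch(edges)
for paper, score in sorted(scores.items(), key=lambda x: x[1], reverse=True)[:3]:
    print(f"{paper}: {score:.3f}")
```

The top three scores should land close to the `nx.pagerank` values shown above. Paper C passes its entire score to Paper A because Paper A is its only reference, which is why Paper A ends up on top.
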
- **`nx.pagerank(G)`** scores nodes based on link structure, not just raw counts.
- In citation networks, this can highlight influential papers that are cited by other important papers. Here, Paper A outranks Paper C even though Paper C has more citations (4 versus 3), because Paper C itself cites Paper A.

### Visualizing the Citation Network

A graph view helps show which papers are central and how citations are distributed across the network.

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper G", "Paper A"),
        ("Paper G", "Paper C"),
    ]
)

# Scale each node by its citation count (in-degree).
in_degree = dict(G.in_degree())
node_sizes = [500 + 600 * in_degree[node] for node in G.nodes()]

# A fixed seed keeps the layout reproducible between runs.
pos = nx.spring_layout(G, seed=42)

plt.figure(figsize=(10, 7))
nx.draw(
    G,
    pos,
    with_labels=True,
    node_size=node_sizes,
    node_color="lightblue",
    edge_color="gray",
    arrows=True,
    arrowsize=18,
    font_size=10,
)
plt.title("Citation Network with Node Size by Citation Count")
plt.axis("off")
plt.show()
```

- Larger nodes represent papers with more citations.
- Arrow direction makes the citation flow visible.

### Coloring by Influence

You can also color nodes by PageRank to highlight papers that are structurally important in the citation network.

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper G", "Paper A"),
        ("Paper G", "Paper C"),
        ("Paper H", "Paper G"),
        ("Paper I", "Paper G"),
    ]
)

# Use the PageRank score for both node color and node size.
pagerank = nx.pagerank(G)
node_colors = [pagerank[node] for node in G.nodes()]
node_sizes = [4000 * pagerank[node] + 400 for node in G.nodes()]

pos = nx.spring_layout(G, seed=12)

plt.figure(figsize=(10, 7))
nx.draw(
    G,
    pos,
    with_labels=True,
    node_color=node_colors,
    node_size=node_sizes,
    cmap=plt.cm.plasma,
    edge_color="#BBBBBB",
    arrows=True,
    arrowsize=18,
    font_size=10,
)
plt.title("Citation Network Colored by PageRank")
plt.axis("off")
plt.show()
```

- Brighter colors (toward yellow in the `plasma` colormap) and larger node sizes indicate higher PageRank.
- This helps separate merely cited papers from structurally influential ones.

### Practical Example: Finding Influential Papers

Here is a more realistic example that combines publication year, citation count, and PageRank.

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()

paper_years = {
    "Paper A": 2018,
    "Paper B": 2019,
    "Paper C": 2019,
    "Paper D": 2020,
    "Paper E": 2020,
    "Paper F": 2021,
    "Paper G": 2021,
    "Paper H": 2022,
}

# Store the publication year as a node attribute.
for paper, year in paper_years.items():
    G.add_node(paper, year=year)

G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper F", "Paper A"),
        ("Paper G", "Paper D"),
        ("Paper G", "Paper C"),
        ("Paper H", "Paper D"),
        ("Paper H", "Paper F"),
    ]
)

in_degree = dict(G.in_degree())
pagerank = nx.pagerank(G)

print("Top cited papers:")
for paper, citations in sorted(in_degree.items(), key=lambda x: x[1], reverse=True)[:3]:
    print(f"{paper}: {citations} citations")

print("\nTop papers by PageRank:")
for paper, score in sorted(pagerank.items(), key=lambda x: x[1], reverse=True)[:3]:
    print(f"{paper}: {score:.3f}")

# Size nodes by citation count and color them by publication year.
pos = nx.spring_layout(G, seed=21)
node_sizes = [600 + 500 * in_degree[node] for node in G.nodes()]
node_colors = [G.nodes[node]["year"] for node in G.nodes()]

fig, ax = plt.subplots(figsize=(11, 8))
nx.draw(
    G,
    pos,
    with_labels=True,
    node_size=node_sizes,
    node_color=node_colors,
    cmap=plt.cm.viridis,
    edge_color="#AAAAAA",
    arrows=True,
    arrowsize=18,
    font_size=10,
    ax=ax,
)

# Build a colorbar that maps node colors back to publication years.
sm = plt.cm.ScalarMappable(
    cmap=plt.cm.viridis,
    norm=plt.Normalize(vmin=min(paper_years.values()), vmax=max(paper_years.values())),
)
sm.set_array([])
fig.colorbar(sm, ax=ax, label="Publication Year")

ax.set_title("Citation Network with Size by Citations and Color by Year")
ax.axis("off")
plt.show()
```

```
Top cited papers:
Paper C: 4 citations
Paper A: 3 citations
Paper D: 2 citations

Top papers by PageRank:
Paper A: 0.347
Paper C: 0.204
Paper D: 0.103
```

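
The two rankings above disagree about the top paper. As a rough sketch of how to compare them side by side, the snippet below rebuilds the same edges and prints each paper's position under both metrics. The helper names `citation_rank` and `pagerank_rank` are illustrative, and tied papers are simply ordered by insertion order.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper D", "Paper B"),
        ("Paper D", "Paper C"),
        ("Paper E", "Paper C"),
        ("Paper F", "Paper C"),
        ("Paper F", "Paper A"),
        ("Paper G", "Paper D"),
        ("Paper G", "Paper C"),
        ("Paper H", "Paper D"),
        ("Paper H", "Paper F"),
    ]
)

in_degree = dict(G.in_degree())
pagerank = nx.pagerank(G)

# Order the papers by each metric, highest value first.
by_citations = sorted(in_degree, key=in_degree.get, reverse=True)
by_pagerank = sorted(pagerank, key=pagerank.get, reverse=True)

# Map each paper to its position in each ordering (1 = highest).
citation_rank = {paper: i for i, paper in enumerate(by_citations, start=1)}
pagerank_rank = {paper: i for i, paper in enumerate(by_pagerank, start=1)}

print(f"{'Paper':<10}{'Cite rank':>12}{'PR rank':>10}")
for paper in by_pagerank:
    print(f"{paper:<10}{citation_rank[paper]:>12}{pagerank_rank[paper]:>10}")
```

Papers whose two positions differ are the ones where link structure, rather than raw citation count, changes the picture.
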
- Node size highlights papers with more citations.
- Node color adds temporal context, helping you see whether influential papers are older or more recent.
- Together, citation count and PageRank give a better picture of importance than either metric alone.

### Conclusion

NetworkX is well suited to citation network analysis because it handles directed graphs, ranking methods, and graph visualizations in a compact workflow. By combining in-degree, PageRank, and visualization, you can identify influential papers and better understand the structure of a citation graph.