Network Analysis with NetworkX

NetworkX is a powerful Python library specifically designed for the creation, manipulation, and study of complex networks, also known as graphs. In this tutorial, you'll learn how to use NetworkX to perform network analysis, including creating graphs, adding nodes and edges, and performing basic analysis.

## Creating Graphs

Let's begin by creating an empty graph.

import networkx as nx

# Create an empty graph
G = nx.Graph()

# Display the graph
print(G)
Graph with 0 nodes and 0 edges
### Adding Nodes and Edges

You can add nodes and edges to the graph using various methods provided by NetworkX.

#### Adding Nodes

import networkx as nx

# Create an empty graph
G = nx.Graph()

# Add a single node
G.add_node(1)

# Add multiple nodes
G.add_nodes_from([2, 3, 4])

# Display nodes
print(G.nodes)
[1, 2, 3, 4]
#### Adding Edges

import networkx as nx

# Create an empty graph
G = nx.Graph()

# Add nodes
G.add_nodes_from([1, 2, 3, 4])

# Add a single edge
G.add_edge(1, 2)

# Add multiple edges
G.add_edges_from([(2, 3), (3, 4)])

# Display edges
print(G.edges)
[(1, 2), (2, 3), (3, 4)]
### Visualizing Graphs

NetworkX integrates well with Matplotlib for visualizing graphs. Here's an example of how to draw a simple graph.

import networkx as nx
import matplotlib.pyplot as plt

# Create a graph
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 1)])

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold')
plt.show()
### Basic Network Analysis

Now let's perform some basic network analysis using NetworkX. We'll look at degree distribution, shortest paths, and clustering coefficients.

#### Degree Distribution

The degree of a node is the number of edges connected to it. Here’s how you can find the degree of all nodes in a graph.

import networkx as nx

# Create a graph
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)])

# Compute degree for each node
degree_dict = dict(G.degree())
print("Degree of each node:", degree_dict)
Degree of each node: {1: 3, 2: 2, 3: 3, 4: 2}
#### Shortest Path

NetworkX provides an easy way to find the shortest path between two nodes.

import networkx as nx

# Create a graph
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)])

# Compute the shortest path between node 1 and node 4
shortest_path = nx.shortest_path(G, source=1, target=4)
print("Shortest path from node 1 to node 4:", shortest_path)
Shortest path from node 1 to node 4: [1, 4]
#### Clustering Coefficient

The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together.

import networkx as nx

# Create a graph
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)])

# Compute clustering coefficient for each node
clustering_dict = nx.clustering(G)
print("Clustering coefficient of each node:", clustering_dict)
Clustering coefficient of each node: {1: 0.6666666666666666, 2: 1.0, 3: 0.6666666666666666, 4: 1.0}
## Advanced Network Analysis with NetworkX

### PageRank Algorithm

The PageRank algorithm, famously used by Google Search, is used to rank nodes in a graph based on their importance. NetworkX provides a simple way to compute PageRank.

import networkx as nx

# Create a directed graph
G = nx.DiGraph()

# Add edges
edges = [(1, 2), (2, 3), (3, 1), (1, 4), (4, 1), (4, 3)]
G.add_edges_from(edges)

# Compute PageRank
pagerank = nx.pagerank(G)
print("PageRank:", pagerank)
PageRank: {1: 0.35105903526129834, 2: 0.18669946835938991, 3: 0.2755420280199218, 4: 0.18669946835938991}
### Community Detection

Community detection helps identify clusters or groups of nodes that are more densely connected internally than with the rest of the network. NetworkX interfaces with external libraries like `community` to detect communities.

### Using Girvan-Newman Algorithm

import networkx as nx
from networkx.algorithms.community import girvan_newman
import matplotlib.pyplot as plt

# Create a graph
G = nx.karate_club_graph()

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold')
plt.show()

# Detect communities
communities = girvan_newman(G)
first_level_communities = next(communities)
community_list = sorted(map(sorted, first_level_communities))

print("Communities:", community_list)
Communities: [[0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21], [2, 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]]
### Centrality Measures

Centrality measures help identify the most important nodes in a network. Let's look at a few common centrality measures: Degree Centrality, Betweenness Centrality, and Closeness Centrality.

### Degree Centrality

Degree centrality is a simple centrality measure indicating the number of edges connected to a node.

import networkx as nx
import matplotlib.pyplot as plt

# Create a graph
G = nx.karate_club_graph()

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold')
plt.show()

# Compute degree centrality
degree_centrality = nx.degree_centrality(G)
print("Degree Centrality:", degree_centrality)
Degree Centrality: {0: 0.48484848484848486, 1: 0.2727272727272727, 2: 0.30303030303030304, 3: 0.18181818181818182, 4: 0.09090909090909091, 5: 0.12121212121212122, 6: 0.12121212121212122, 7: 0.12121212121212122, 8: 0.15151515151515152, 9: 0.06060606060606061, 10: 0.09090909090909091, 11: 0.030303030303030304, 12: 0.06060606060606061, 13: 0.15151515151515152, 14: 0.06060606060606061, 15: 0.06060606060606061, 16: 0.06060606060606061, 17: 0.06060606060606061, 18: 0.06060606060606061, 19: 0.09090909090909091, 20: 0.06060606060606061, 21: 0.06060606060606061, 22: 0.06060606060606061, 23: 0.15151515151515152, 24: 0.09090909090909091, 25: 0.09090909090909091, 26: 0.06060606060606061, 27: 0.12121212121212122, 28: 0.09090909090909091, 29: 0.12121212121212122, 30: 0.12121212121212122, 31: 0.18181818181818182, 32: 0.36363636363636365, 33: 0.5151515151515151}
### Betweenness Centrality

Betweenness centrality measures the extent to which a node lies on paths between other nodes.

import networkx as nx
import matplotlib.pyplot as plt

# Create a graph
G = nx.karate_club_graph()

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold')
plt.show()

# Compute betweenness centrality
betweenness_centrality = nx.betweenness_centrality(G)
print("Betweenness Centrality:", betweenness_centrality)
Betweenness Centrality: {0: 0.43763528138528146, 1: 0.053936688311688304, 2: 0.14365680615680618, 3: 0.011909271284271283, 4: 0.0006313131313131313, 5: 0.02998737373737374, 6: 0.029987373737373736, 7: 0.0, 8: 0.05592682780182781, 9: 0.0008477633477633478, 10: 0.0006313131313131313, 11: 0.0, 12: 0.0, 13: 0.04586339586339586, 14: 0.0, 15: 0.0, 16: 0.0, 17: 0.0, 18: 0.0, 19: 0.03247504810004811, 20: 0.0, 21: 0.0, 22: 0.0, 23: 0.017613636363636363, 24: 0.0022095959595959595, 25: 0.0038404882154882154, 26: 0.0, 27: 0.02233345358345358, 28: 0.0017947330447330447, 29: 0.0029220779220779218, 30: 0.014411976911976909, 31: 0.13827561327561325, 32: 0.145247113997114, 33: 0.30407497594997596}
### Closeness Centrality

Closeness centrality measures the average length of the shortest path from a node to all other nodes in the network.

import networkx as nx
import matplotlib.pyplot as plt

# Create a graph
G = nx.karate_club_graph()

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold')
plt.show()

# Compute closeness centrality
closeness_centrality = nx.closeness_centrality(G)
print("Closeness Centrality:", closeness_centrality)
Closeness Centrality: {0: 0.5689655172413793, 1: 0.4852941176470588, 2: 0.559322033898305, 3: 0.4647887323943662, 4: 0.3793103448275862, 5: 0.38372093023255816, 6: 0.38372093023255816, 7: 0.44, 8: 0.515625, 9: 0.4342105263157895, 10: 0.3793103448275862, 11: 0.36666666666666664, 12: 0.3707865168539326, 13: 0.515625, 14: 0.3707865168539326, 15: 0.3707865168539326, 16: 0.28448275862068967, 17: 0.375, 18: 0.3707865168539326, 19: 0.5, 20: 0.3707865168539326, 21: 0.375, 22: 0.3707865168539326, 23: 0.39285714285714285, 24: 0.375, 25: 0.375, 26: 0.3626373626373626, 27: 0.4583333333333333, 28: 0.4520547945205479, 29: 0.38372093023255816, 30: 0.4583333333333333, 31: 0.5409836065573771, 32: 0.515625, 33: 0.55}
### Network Traversal

Network traversal algorithms, such as Breadth-First Search (BFS) and Depth-First Search (DFS), are fundamental techniques for exploring graphs.

### Breadth-First Search (BFS)

BFS explores nodes level by level, starting from a source node.

import networkx as nx
import matplotlib.pyplot as plt

# Create a graph
G = nx.karate_club_graph()

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold')
plt.show()

# Perform BFS traversal from node 0
bfs_edges = list(nx.bfs_edges(G, source=0))
print("BFS Edges:", bfs_edges)
BFS Edges: [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 10), (0, 11), (0, 12), (0, 13), (0, 17), (0, 19), (0, 21), (0, 31), (1, 30), (2, 9), (2, 27), (2, 28), (2, 32), (5, 16), (8, 33), (31, 24), (31, 25), (27, 23), (32, 14), (32, 15), (32, 18), (32, 20), (32, 22), (32, 29), (33, 26)]
### Depth-First Search (DFS)

DFS explores as far down a branch as possible before backtracking.

import networkx as nx
import matplotlib.pyplot as plt

# Create a graph
G = nx.karate_club_graph()

# Draw the graph
nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold')
plt.show()

# Perform DFS traversal from node 0
dfs_edges = list(nx.dfs_edges(G, source=0))
print("DFS Edges:", dfs_edges)
DFS Edges: [(0, 1), (1, 2), (2, 3), (3, 7), (3, 12), (3, 13), (13, 33), (33, 8), (8, 30), (30, 32), (32, 14), (32, 15), (32, 18), (32, 20), (32, 22), (32, 23), (23, 25), (25, 24), (24, 27), (24, 31), (31, 28), (23, 29), (29, 26), (33, 9), (33, 19), (1, 17), (1, 21), (0, 4), (4, 6), (6, 5), (5, 10), (5, 16), (0, 11)]