NetworkX is a powerful Python library specifically designed for the creation, manipulation, and study of complex networks, also known as graphs. In this tutorial, you'll learn how to use NetworkX to perform network analysis, including creating graphs, adding nodes and edges, and performing basic analysis. ## Creating Graphs Let's begin by creating an empty graph.
import networkx as nx # Create an empty graph G = nx.Graph() # Display the graph print(G)
Graph with 0 nodes and 0 edges
### Adding Nodes and Edges You can add nodes and edges to the graph using various methods provided by NetworkX. #### Adding Nodes
import networkx as nx # Create an empty graph G = nx.Graph() # Add a single node G.add_node(1) # Add multiple nodes G.add_nodes_from([2, 3, 4]) # Display nodes print(G.nodes)
[1, 2, 3, 4]
#### Adding Edges
import networkx as nx # Create an empty graph G = nx.Graph() # Add nodes G.add_nodes_from([1, 2, 3, 4]) # Add a single edge G.add_edge(1, 2) # Add multiple edges G.add_edges_from([(2, 3), (3, 4)]) # Display edges print(G.edges)
[(1, 2), (2, 3), (3, 4)]
### Visualizing Graphs NetworkX integrates well with Matplotlib for visualizing graphs. Here's an example of how to draw a simple graph.
import networkx as nx import matplotlib.pyplot as plt # Create a graph G = nx.Graph() G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 1)]) # Draw the graph nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold') plt.show()
### Basic Network Analysis Now let's perform some basic network analysis using NetworkX. We'll look at degree distribution, shortest paths, and clustering coefficients. #### Degree Distribution The degree of a node is the number of edges connected to it. Here’s how you can find the degree of all nodes in a graph.
import networkx as nx # Create a graph G = nx.Graph() G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]) # Compute degree for each node degree_dict = dict(G.degree()) print("Degree of each node:", degree_dict)
Degree of each node: {1: 3, 2: 2, 3: 3, 4: 2}
#### Shortest Path NetworkX provides an easy way to find the shortest path between two nodes.
import networkx as nx # Create a graph G = nx.Graph() G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]) # Compute the shortest path between node 1 and node 4 shortest_path = nx.shortest_path(G, source=1, target=4) print("Shortest path from node 1 to node 4:", shortest_path)
Shortest path from node 1 to node 4: [1, 4]
#### Clustering Coefficient The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together.
import networkx as nx # Create a graph G = nx.Graph() G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]) # Compute clustering coefficient for each node clustering_dict = nx.clustering(G) print("Clustering coefficient of each node:", clustering_dict)
Clustering coefficient of each node: {1: 0.6666666666666666, 2: 1.0, 3: 0.6666666666666666, 4: 1.0}
## Advanced Network Analysis with NetworkX ### PageRank Algorithm The PageRank algorithm, famously used by Google Search, is used to rank nodes in a graph based on their importance. NetworkX provides a simple way to compute PageRank.
import networkx as nx # Create a directed graph G = nx.DiGraph() # Add edges edges = [(1, 2), (2, 3), (3, 1), (1, 4), (4, 1), (4, 3)] G.add_edges_from(edges) # Compute PageRank pagerank = nx.pagerank(G) print("PageRank:", pagerank)
PageRank: {1: 0.35105903526129834, 2: 0.18669946835938991, 3: 0.2755420280199218, 4: 0.18669946835938991}
### Community Detection Community detection helps identify clusters or groups of nodes that are more densely connected internally than with the rest of the network. NetworkX interfaces with external libraries like `community` to detect communities. ### Using Girvan-Newman Algorithm
import networkx as nx from networkx.algorithms.community import girvan_newman import matplotlib.pyplot as plt # Create a graph G = nx.karate_club_graph() # Draw the graph nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold') plt.show() # Detect communities communities = girvan_newman(G) first_level_communities = next(communities) community_list = sorted(map(sorted, first_level_communities)) print("Communities:", community_list)
Communities: [[0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 17, 19, 21], [2, 8, 9, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]]
### Centrality Measures Centrality measures help identify the most important nodes in a network. Let's look at a few common centrality measures: Degree Centrality, Betweenness Centrality, and Closeness Centrality. ### Degree Centrality Degree centrality is a simple centrality measure indicating the number of edges connected to a node.
import networkx as nx import matplotlib.pyplot as plt # Create a graph G = nx.karate_club_graph() # Draw the graph nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold') plt.show() # Compute degree centrality degree_centrality = nx.degree_centrality(G) print("Degree Centrality:", degree_centrality)
Degree Centrality: {0: 0.48484848484848486, 1: 0.2727272727272727, 2: 0.30303030303030304, 3: 0.18181818181818182, 4: 0.09090909090909091, 5: 0.12121212121212122, 6: 0.12121212121212122, 7: 0.12121212121212122, 8: 0.15151515151515152, 9: 0.06060606060606061, 10: 0.09090909090909091, 11: 0.030303030303030304, 12: 0.06060606060606061, 13: 0.15151515151515152, 14: 0.06060606060606061, 15: 0.06060606060606061, 16: 0.06060606060606061, 17: 0.06060606060606061, 18: 0.06060606060606061, 19: 0.09090909090909091, 20: 0.06060606060606061, 21: 0.06060606060606061, 22: 0.06060606060606061, 23: 0.15151515151515152, 24: 0.09090909090909091, 25: 0.09090909090909091, 26: 0.06060606060606061, 27: 0.12121212121212122, 28: 0.09090909090909091, 29: 0.12121212121212122, 30: 0.12121212121212122, 31: 0.18181818181818182, 32: 0.36363636363636365, 33: 0.5151515151515151}
### Betweenness Centrality Betweenness centrality measures the extent to which a node lies on paths between other nodes.
import networkx as nx import matplotlib.pyplot as plt # Create a graph G = nx.karate_club_graph() # Draw the graph nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold') plt.show() # Compute betweenness centrality betweenness_centrality = nx.betweenness_centrality(G) print("Betweenness Centrality:", betweenness_centrality)
Betweenness Centrality: {0: 0.43763528138528146, 1: 0.053936688311688304, 2: 0.14365680615680618, 3: 0.011909271284271283, 4: 0.0006313131313131313, 5: 0.02998737373737374, 6: 0.029987373737373736, 7: 0.0, 8: 0.05592682780182781, 9: 0.0008477633477633478, 10: 0.0006313131313131313, 11: 0.0, 12: 0.0, 13: 0.04586339586339586, 14: 0.0, 15: 0.0, 16: 0.0, 17: 0.0, 18: 0.0, 19: 0.03247504810004811, 20: 0.0, 21: 0.0, 22: 0.0, 23: 0.017613636363636363, 24: 0.0022095959595959595, 25: 0.0038404882154882154, 26: 0.0, 27: 0.02233345358345358, 28: 0.0017947330447330447, 29: 0.0029220779220779218, 30: 0.014411976911976909, 31: 0.13827561327561325, 32: 0.145247113997114, 33: 0.30407497594997596}
### Closeness Centrality Closeness centrality measures the average length of the shortest path from a node to all other nodes in the network.
import networkx as nx import matplotlib.pyplot as plt # Create a graph G = nx.karate_club_graph() # Draw the graph nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold') plt.show() # Compute closeness centrality closeness_centrality = nx.closeness_centrality(G) print("Closeness Centrality:", closeness_centrality)
Closeness Centrality: {0: 0.5689655172413793, 1: 0.4852941176470588, 2: 0.559322033898305, 3: 0.4647887323943662, 4: 0.3793103448275862, 5: 0.38372093023255816, 6: 0.38372093023255816, 7: 0.44, 8: 0.515625, 9: 0.4342105263157895, 10: 0.3793103448275862, 11: 0.36666666666666664, 12: 0.3707865168539326, 13: 0.515625, 14: 0.3707865168539326, 15: 0.3707865168539326, 16: 0.28448275862068967, 17: 0.375, 18: 0.3707865168539326, 19: 0.5, 20: 0.3707865168539326, 21: 0.375, 22: 0.3707865168539326, 23: 0.39285714285714285, 24: 0.375, 25: 0.375, 26: 0.3626373626373626, 27: 0.4583333333333333, 28: 0.4520547945205479, 29: 0.38372093023255816, 30: 0.4583333333333333, 31: 0.5409836065573771, 32: 0.515625, 33: 0.55}
### Network Traversal Network traversal algorithms, such as Breadth-First Search (BFS) and Depth-First Search (DFS), are fundamental techniques for exploring graphs. ### Breadth-First Search (BFS) BFS explores nodes level by level, starting from a source node.
import networkx as nx import matplotlib.pyplot as plt # Create a graph G = nx.karate_club_graph() # Draw the graph nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold') plt.show() # Perform BFS traversal from node 0 bfs_edges = list(nx.bfs_edges(G, source=0)) print("BFS Edges:", bfs_edges)
BFS Edges: [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 10), (0, 11), (0, 12), (0, 13), (0, 17), (0, 19), (0, 21), (0, 31), (1, 30), (2, 9), (2, 27), (2, 28), (2, 32), (5, 16), (8, 33), (31, 24), (31, 25), (27, 23), (32, 14), (32, 15), (32, 18), (32, 20), (32, 22), (32, 29), (33, 26)]
### Depth-First Search (DFS) DFS explores as far down a branch as possible before backtracking.
import networkx as nx import matplotlib.pyplot as plt # Create a graph G = nx.karate_club_graph() # Draw the graph nx.draw(G, with_labels=True, node_color='lightblue', font_weight='bold') plt.show() # Perform DFS traversal from node 0 dfs_edges = list(nx.dfs_edges(G, source=0)) print("DFS Edges:", dfs_edges)
DFS Edges: [(0, 1), (1, 2), (2, 3), (3, 7), (3, 12), (3, 13), (13, 33), (33, 8), (8, 30), (30, 32), (32, 14), (32, 15), (32, 18), (32, 20), (32, 22), (32, 23), (23, 25), (25, 24), (24, 27), (24, 31), (31, 28), (23, 29), (29, 26), (33, 9), (33, 19), (1, 17), (1, 21), (0, 4), (4, 6), (6, 5), (5, 10), (5, 16), (0, 11)]