Network Graphs using Python
Network graphs are powerful visual representations that illustrate relationships between entities across various domains. From social connections to biological systems, Python offers robust tools for network analysis and visualization. This guide explores essential concepts, libraries, and techniques for creating insightful network graphs with Python.
Understanding Graph Theory Fundamentals
Graph theory provides the mathematical foundation for network analysis. A graph consists of nodes (vertices) connected by edges (links) that represent relationships.
Types of Graphs:
- Undirected graphs have symmetric relationships (like Facebook friendships)
- Directed graphs (digraphs) have orientations shown by arrows (like Twitter follows)
- Weighted graphs assign values to edges (distances, costs, strengths)
- Unweighted graphs simply show presence/absence of connections
Key network properties include density (the ratio of actual to possible edges), connectivity (whether, and by how many independent paths, nodes can reach one another), centrality (a measure of node importance), and communities (clusters of nodes with dense internal connections).
Mathematically, networks are commonly represented using adjacency matrices, where element (i,j) indicates whether nodes i and j are connected.
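As a small illustration of density and the adjacency matrix, the sketch below builds a four-node undirected graph and compares NetworkX's density with the definition given above. The node labels and edges are chosen only for this example.

```python
import networkx as nx

# Example graph; the edges are illustrative, not taken from a real dataset
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (2, 3), (3, 4)])

n = G.number_of_nodes()
m = G.number_of_edges()

# Density of an undirected graph: actual edges / possible edges = m / (n(n-1)/2)
manual_density = 2 * m / (n * (n - 1))
density = nx.density(G)

# Adjacency matrix as nested lists, rows and columns ordered by node label
nodes = sorted(G.nodes())
adjacency = [[1 if G.has_edge(u, v) else 0 for v in nodes] for u in nodes]

print(f"density = {density:.3f}")
for row in adjacency:
    print(row)
```

Because the graph is undirected, the matrix is symmetric: entry (i,j) equals entry (j,i).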
Essential Python Libraries for Network Graphs
Several specialized Python libraries facilitate network analysis and visualization:
NetworkX: The cornerstone library providing comprehensive functionality for creation, manipulation, and study of complex networks.
import networkx as nx
# Create a simple graph
G = nx.Graph()
G.add_edges_from([(1, 2), (1, 3), (2, 3), (3, 4)])
Matplotlib: While not strictly a network library, it integrates seamlessly with NetworkX for basic visualization:
import matplotlib.pyplot as plt
nx.draw(G, with_labels=True)
plt.show()
Pyvis: Built on the vis.js JavaScript library, Pyvis creates interactive network visualizations:
from pyvis.network import Network
net = Network()
net.from_nx(G)
# Outside Jupyter, Pyvis 0.3.2 and later need notebook=False here
net.show("network.html", notebook=False)
Graph-tool: Written in C++ with Python bindings, it offers significantly faster performance for large-scale network analysis.
Getting Started with NetworkX
NetworkX provides an intuitive API with extensive documentation for network analysis.
Installation and Setup:
pip install networkx matplotlib
Creating Graph Objects:
NetworkX offers several graph classes:
# Undirected graph
G = nx.Graph()
# Directed graph
D = nx.DiGraph()
# For parallel edges
M = nx.MultiGraph()
Adding Nodes and Edges:
# Add nodes with attributes
G.add_node(1, role='server')
G.add_nodes_from([2, 3, 4])
# Add weighted edges
G.add_edge(1, 2, weight=0.5)
G.add_edges_from([(1, 3), (2, 3), (3, 4)])
Basic Manipulation:
# Access nodes and edges
print(G.nodes())
print(G.edges())
# Access attributes
print(G.nodes[1]['role'])
print(G.edges[1, 2]['weight'])
Graph Properties:
# Basic metrics
print(nx.number_of_nodes(G))
print(nx.number_of_edges(G))
print(G.degree[1])
print(nx.is_connected(G))
# Find shortest path
print(nx.shortest_path(G, 1, 4))
Building Basic Network Graphs
Let’s create a practical friendship network:
# Create social network
friendship_network = nx.Graph()
# Add people as nodes
people = ["Meilana", "Nadia", "Maria", "David", "Ulfa"]
friendship_network.add_nodes_from(people)
# Add connections
friendships = [
    ("Meilana", "Nadia"), ("Meilana", "Maria"),
    ("Nadia", "David"), ("Maria", "David"),
    ("David", "Ulfa")
]
friendship_network.add_edges_from(friendships)
# Visualize
plt.figure(figsize=(8, 6))
nx.draw_networkx(friendship_network,
                 node_color='lightblue',
                 node_size=500)
plt.axis('off')
plt.show()
Converting Data Structures to Graphs:
# From dictionary of lists
connections = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D']
}
G = nx.Graph(connections)
# From pandas DataFrame
import pandas as pd
edges_df = pd.DataFrame({
    'source': ['A', 'A', 'B', 'C'],
    'target': ['B', 'C', 'D', 'D'],
    'weight': [0.5, 0.8, 1.2, 0.7]
})
G = nx.from_pandas_edgelist(edges_df, 'source', 'target', edge_attr='weight')
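To make the dictionary-of-lists conversion concrete, here is a minimal sketch that reuses the same adjacency dictionary and inspects the result. The variable names node_list, edge_list, and neighbors_of_d are introduced only for this illustration.

```python
import networkx as nx

connections = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
}
G = nx.Graph(connections)

# "D" never appears as a key, yet it becomes a node because it is listed as a neighbor
node_list = sorted(G.nodes())

# Each undirected edge is stored once, even though "A"-"B" is listed in both directions
edge_list = sorted(tuple(sorted(edge)) for edge in G.edges())

# Convert back to a dictionary of lists; "D" now has its own entry
round_trip = nx.to_dict_of_lists(G)
neighbors_of_d = sorted(round_trip["D"])

print(node_list)
print(edge_list)
print(neighbors_of_d)
```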
Customizing Network Visualization
Customization makes network graphs more informative and appealing.
Modifying Node Appearance:
G = nx.karate_club_graph()
# Size nodes by degree
node_sizes = [v * 100 for v in dict(G.degree()).values()]
# Color nodes by attribute
node_colors = ['red' if G.nodes[n]['club'] == 'Mr. Hi'
               else 'green' for n in G.nodes()]
nx.draw_networkx(G,
                 node_size=node_sizes,
                 node_color=node_colors,
                 with_labels=True)
Customizing Edge Properties:
# Edge width based on weight (edges without a 'weight' attribute count as 1)
edge_weights = [w for u, v, w in G.edges(data='weight', default=1)]
normalized_weights = [1 + 2 * (w / max(edge_weights))
                      for w in edge_weights]
# Compute node positions, then draw with custom edges
pos = nx.spring_layout(G, seed=42)
nx.draw_networkx_edges(G, pos,
                       width=normalized_weights,
                       edge_color='gray',
                       alpha=0.7)
Layout Algorithms:
NetworkX offers various layout algorithms that significantly impact visualization:
layouts = {
    "Spring": nx.spring_layout,
    "Circular": nx.circular_layout,
    "Random": nx.random_layout,
    "Shell": nx.shell_layout,
    "Spectral": nx.spectral_layout
}
# Compare layouts side by side in a 2 x 3 grid
plt.figure(figsize=(15, 10))
for i, (name, layout) in enumerate(layouts.items(), 1):
    plt.subplot(2, 3, i)
    plt.title(name)
    pos = layout(G)
    nx.draw_networkx(G, pos, node_size=100, font_size=8)
    plt.axis('off')
plt.tight_layout()
plt.show()
Spring (force-directed) layout is a reasonable default for general networks, circular layout suits ring-like structures, and spectral layout, which positions nodes using eigenvectors of the graph Laplacian, often separates communities.
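Every layout function returns a plain dictionary that maps each node to an (x, y) coordinate, which is why the same pos can be reused across several drawing calls. The sketch below, which assumes the karate club graph and an arbitrary seed value of 7, shows that structure and how a fixed seed makes the spring layout reproducible.

```python
import networkx as nx

G = nx.karate_club_graph()

# A layout is a dict: node -> array([x, y])
pos_a = nx.spring_layout(G, seed=7)
pos_b = nx.spring_layout(G, seed=7)

# With the same seed, the force-directed layout produces identical coordinates
same_layout = all((pos_a[n] == pos_b[n]).all() for n in G.nodes())

# Deterministic layouts such as circular_layout need no seed
circular = nx.circular_layout(G)

print(pos_a[0])
print(same_layout)
```

Without a seed, spring_layout starts from random positions, so repeated runs of the same script can draw the same graph differently.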
Data Import and Graph Construction
Real-world network analysis typically starts with importing external data.
From CSV Files:
import pandas as pd
# Load edge data
edges_df = pd.read_csv('network_edges.csv')
G = nx.from_pandas_edgelist(
    edges_df,
    source='source',
    target='target',
    edge_attr='weight'
)
# Load node attributes
nodes_df = pd.read_csv('network_nodes.csv')
node_attrs = nodes_df.set_index('id').to_dict('index')
nx.set_node_attributes(G, node_attrs)
From Adjacency Matrix:
import numpy as np
# Adjacency matrix
adj_matrix = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0]
])
G = nx.from_numpy_array(adj_matrix)
Data Cleaning:
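# Clean the edge table before building the graph:
# remove rows with a missing endpoint
edges_df = edges_df.dropna(subset=['source', 'target'])
# Fill missing weights with a default of 1.0
edges_df['weight'] = edges_df['weight'].fillna(1.0)
# After building G, remove self-loops first, then any nodes left isolated
G.remove_edges_from(list(nx.selfloop_edges(G)))
G.remove_nodes_from(list(nx.isolates(G)))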
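The cleaning steps above can be combined into one small pipeline. The following is a sketch under stated assumptions: the helper name build_clean_graph and the sample rows are hypothetical, and the column names source, target, and weight follow the earlier examples.

```python
import networkx as nx
import pandas as pd


def build_clean_graph(edges_df):
    """Clean an edge table, then build an undirected weighted graph from it."""
    # Drop rows with a missing endpoint and fill missing weights with 1.0
    cleaned = edges_df.dropna(subset=["source", "target"]).copy()
    cleaned["weight"] = cleaned["weight"].fillna(1.0)

    graph = nx.from_pandas_edgelist(cleaned, "source", "target", edge_attr="weight")

    # Remove self-loops first; nodes that only had a self-loop then become isolated
    graph.remove_edges_from(list(nx.selfloop_edges(graph)))
    graph.remove_nodes_from(list(nx.isolates(graph)))
    return graph


# Made-up sample data with one missing endpoint, one missing weight, two self-loops
raw = pd.DataFrame({
    "source": ["A", "A", None, "C", "E"],
    "target": ["B", "C", "D", "C", "E"],
    "weight": [0.5, None, 1.2, 0.7, 2.0],
})

G = build_clean_graph(raw)
print(sorted(G.nodes()))
print(sorted(G.edges(data="weight")))
```

The order matters: removing isolates before self-loops would leave node E in the graph, because its self-loop still counts as an edge at that point.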
Graph Analysis Techniques
NetworkX provides powerful algorithms for network analysis.
Centrality Measures:
# Calculate various centrality metrics
degree_cent = nx.degree_centrality(G)
betweenness_cent = nx.betweenness_centrality(G)
closeness_cent = nx.closeness_centrality(G)
eigenvector_cent = nx.eigenvector_centrality(G)
# Visualize with node size based on centrality
plt.figure(figsize=(10, 8))
pos = nx.spring_layout(G, seed=42)
node_sizes = [v * 3000 for v in betweenness_cent.values()]
nx.draw_networkx(G, pos, node_size=node_sizes)
plt.show()
Each centrality measure highlights different aspects of importance:
- Degree centrality: Number of connections
- Betweenness centrality: Control over information flow
- Closeness centrality: How quickly a node can reach others
- Eigenvector centrality: Connection to other important nodes
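To see how these measures behave on a graph small enough to check by hand, the sketch below reuses the five-person friendship network from earlier. The helper top_node is a hypothetical convenience function introduced for this example.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Meilana", "Nadia"), ("Meilana", "Maria"),
    ("Nadia", "David"), ("Maria", "David"),
    ("David", "Ulfa"),
])

degree_cent = nx.degree_centrality(G)
betweenness_cent = nx.betweenness_centrality(G)
closeness_cent = nx.closeness_centrality(G)


def top_node(scores):
    """Return the node with the highest score in a centrality dictionary."""
    return max(scores, key=scores.get)


for name, scores in [("degree", degree_cent),
                     ("betweenness", betweenness_cent),
                     ("closeness", closeness_cent)]:
    print(f"{name}: {top_node(scores)}")
```

David ranks first on all three measures here: he has the most friends (3 of 4 possible), and every path to Ulfa passes through him. In larger networks the measures frequently disagree, which is why it helps to compare them.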
Community Detection:
# Requires the python-louvain package (pip install python-louvain)
import community as community_louvain
# Detect communities using Louvain method
partition = community_louvain.best_partition(G)
# Visualize communities, one color per community id
pos = nx.spring_layout(G, seed=42)
colors = [partition[node] for node in G.nodes()]
nx.draw_networkx(G, pos, node_color=colors, cmap=plt.cm.rainbow)
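If installing an extra package is not an option, NetworkX ships its own community detection routines. The sketch below swaps in greedy modularity maximization, which is a different algorithm from Louvain, and runs it on a synthetic barbell graph (two 5-node cliques joined by a single edge) chosen because its community structure is obvious.

```python
import networkx as nx

# Two complete graphs of 5 nodes (0-4 and 5-9) connected by the edge (4, 5)
G = nx.barbell_graph(5, 0)

# Greedy modularity maximization; returns communities as frozensets of nodes
communities = nx.community.greedy_modularity_communities(G)

# Map each node to the index of its community, e.g. for coloring nodes
partition = {node: i for i, members in enumerate(communities) for node in members}

# Modularity scores how much denser the communities are than expected by chance
score = nx.community.modularity(G, communities)

print([sorted(c) for c in communities])
print(round(score, 3))
```

The partition dictionary has the same node-to-community shape as the Louvain result above, so the same coloring code can be reused with it.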
Path Finding:
# Find shortest path by hops
shortest_path = nx.shortest_path(G, source=1, target=6)
# Find shortest path by edge weight
weighted_path = nx.dijkstra_path(G, source=1, target=6)
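The two calls can return different routes for the same pair of nodes. A minimal sketch with a made-up three-node graph, where the direct edge is expensive and the detour is cheap:

```python
import networkx as nx

G = nx.Graph()
# Illustrative weights: the direct A-C edge costs more than the detour via B
G.add_weighted_edges_from([
    ("A", "B", 1.0),
    ("B", "C", 1.0),
    ("A", "C", 5.0),
])

# Fewest hops: ignores weights, so the direct edge wins
hop_path = nx.shortest_path(G, source="A", target="C")

# Lowest total weight: uses the 'weight' edge attribute by default
weighted_path = nx.dijkstra_path(G, source="A", target="C")
weighted_cost = nx.dijkstra_path_length(G, source="A", target="C")

print(hop_path)
print(weighted_path, weighted_cost)
```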
Interactive Network Visualization with Pyvis
Interactive visualizations allow users to explore complex networks dynamically.
from pyvis.network import Network
# Create interactive network
net = Network(height="700px", width="100%", bgcolor="#222222", font_color="white")
# Set physics options
net.barnes_hut(gravity=-80000, central_gravity=0.3, spring_length=250)
# Convert from NetworkX
net.from_nx(G)
# Customize nodes (assumes nodes in G carry a "type" attribute; others get the fallback color)
for node in net.nodes:
    node["title"] = f"Node {node['id']}"
    node["size"] = 10 + G.degree[node['id']] * 2
    node["color"] = "#00ffff" if G.nodes[node['id']].get("type") == "A" else "#ff00ff"
# Save and display (outside Jupyter, Pyvis 0.3.2 and later need notebook=False)
net.show("interactive_network.html", notebook=False)
Rich Tooltips:
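# Build a fresh network and attach a detailed HTML tooltip to each node
net = Network(height="700px", width="100%")
for node in G.nodes():
    tooltip = f"""
    <div style='padding: 10px; background-color: #f5f5f5; border-radius: 5px'>
    <h3>Node {node}</h3>
    <p><b>Type:</b> {G.nodes[node].get('type', 'unknown')}</p>
    <p><b>Connections:</b> {G.degree(node)}</p>
    </div>
    """
    color = "#00ffff" if G.nodes[node].get('type') == "A" else "#ff00ff"
    size = 10 + G.degree(node) * 2
    net.add_node(node, title=tooltip, color=color, size=size)
# Add the edges once all nodes exist
for u, v in G.edges():
    net.add_edge(u, v)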
Advanced Network Visualization
For complex or large networks, advanced techniques improve visualization clarity.
Edge Bundling:
import numpy as np
# Simplified edge curving (a lightweight stand-in for true edge bundling)
def curve_edges(G, pos, dist_ratio=0.2):
    curved_edges = []
    for u, v in G.edges():
        # Endpoints and midpoint of the straight edge
        source_pos = np.array(pos[u])
        target_pos = np.array(pos[v])
        midpoint = (source_pos + target_pos) / 2
        # Offset the midpoint along the perpendicular of the edge direction
        direction = target_pos - source_pos
        normal = np.array([-direction[1], direction[0]])
        length = np.linalg.norm(normal)
        if length > 0:
            normal = normal / length * dist_ratio
        # Control points: source, displaced midpoint, target
        path = [source_pos, midpoint + normal, target_pos]
        curved_edges.append(path)
    return curved_edges
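Each path returned by curve_edges is only three control points. One way to turn them into a smooth curve for plotting is a quadratic Bezier interpolation. This is a sketch of that idea: the function name quadratic_bezier and the sample coordinates are assumptions made for illustration, not part of NetworkX.

```python
import numpy as np


def quadratic_bezier(p0, p1, p2, num_points=20):
    """Sample points on the quadratic Bezier curve with control points p0, p1, p2."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    # Column vector of parameters t in [0, 1], broadcast against the 2D points
    t = np.linspace(0.0, 1.0, num_points)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2


# Source, displaced midpoint, and target of one hypothetical edge
curve = quadratic_bezier([0, 0], [0.5, 0.2], [1, 0], num_points=5)
print(curve)
```

The resulting array of (x, y) rows can be passed to plt.plot(curve[:, 0], curve[:, 1]) to draw the edge. The curve starts and ends at the endpoints and bends toward, without touching, the displaced midpoint.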
Large Network Visualization:
# For large networks, filter to show only important elements
pagerank = nx.pagerank(G)
threshold = 0.003
important_nodes = [n for n, r in pagerank.items() if r > threshold]
subgraph = G.subgraph(important_nodes)
# Compute a layout for the filtered graph and size nodes by importance
pos = nx.spring_layout(subgraph, seed=42)
node_sizes = [pagerank[n] * 30000 for n in subgraph.nodes()]
nx.draw_networkx(subgraph, pos, node_size=node_sizes, alpha=0.8)
Practical Applications and Case Studies
Network graphs apply to diverse domains:
Social Network Analysis:
# Detect communities
communities = nx.algorithms.community.greedy_modularity_communities(G)
# Analyze influence
betweenness = nx.betweenness_centrality(G)
influencers = sorted(betweenness.items(), key=lambda x: x[1], reverse=True)[:5]
print(f"Top influencers: {influencers}")
Biological Networks:
# Create protein interaction network
G = nx.Graph()
G.add_nodes_from([
    ("P1", {"type": "Receptor"}),
    ("P2", {"type": "Enzyme"}),
    ("P3", {"type": "Transcription Factor"})
])
G.add_edges_from([
    ("P1", "P2", {"effect": "Activation"}),
    ("P2", "P3", {"effect": "Inhibition"})
])
# Color by protein type
node_colors = ["red" if G.nodes[n]["type"] == "Receptor" else
               "blue" if G.nodes[n]["type"] == "Enzyme" else
               "green" for n in G.nodes()]
Transportation Networks:
# Find bottlenecks in transport network
edge_betweenness = nx.edge_betweenness_centrality(G)
critical_connections = sorted(edge_betweenness.items(),
                              key=lambda x: x[1], reverse=True)[:5]
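As a quick check that edge betweenness really does surface bottlenecks, the sketch below uses a hypothetical transport network: two tightly linked districts of three stops each, joined by a single connection. The barbell graph is a stand-in for real data.

```python
import networkx as nx

# Stops 0-2 and 3-5 form two fully connected districts; edge (2, 3) joins them
G = nx.barbell_graph(3, 0)

edge_betweenness = nx.edge_betweenness_centrality(G)

# The edge carrying the largest share of shortest paths
bottleneck = max(edge_betweenness, key=edge_betweenness.get)

# Bridges are edges whose removal disconnects the network
bridges = list(nx.bridges(G))

print(bottleneck, round(edge_betweenness[bottleneck], 3))
print(bridges)
```

Every trip between the two districts must cross the connecting edge, so it scores highest and is also the only bridge.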
Working with Directed and Undirected Graphs
Different graph types require specific handling:
# Create directed graph
D = nx.DiGraph()
D.add_edges_from([('A', 'B'), ('B', 'C'), ('A', 'C')])
# Convert to undirected
G = D.to_undirected()
# Visualize directed graph
plt.figure(figsize=(8, 6))
pos = nx.spring_layout(D)
nx.draw_networkx(D, pos,
                 arrowsize=15,
                 arrowstyle='-|>',
                 node_color='lightblue')
Directed graphs use in_degree and out_degree for connection analysis:
# Analyze influence and popularity
influence = D.out_degree()
popularity = D.in_degree()
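For the three-node directed graph above, the degree views can be converted to ordinary dictionaries and checked by hand. The second graph, named mutual here purely for illustration, shows what happens to a reciprocal pair of edges on conversion.

```python
import networkx as nx

D = nx.DiGraph()
D.add_edges_from([("A", "B"), ("B", "C"), ("A", "C")])

# Outgoing and incoming edge counts per node
out_degrees = dict(D.out_degree())
in_degrees = dict(D.in_degree())

# Converting to undirected drops the orientation but keeps all three connections
G = D.to_undirected()

# A reciprocal pair (A -> B and B -> A) collapses into a single undirected edge
mutual = nx.DiGraph([("A", "B"), ("B", "A")])
mutual_undirected = mutual.to_undirected()

print(out_degrees)
print(in_degrees)
print(G.number_of_edges(), mutual_undirected.number_of_edges())
```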
Best Practices and Optimization
For effective network visualizations:
- Performance: For large networks, use Graph-tool or filter to show only important nodes
- Layout: Choose appropriate layouts (spring for general, circular for cycles)
- Color: Use meaningful color schemes that highlight important attributes
- Size: Size nodes and edges based on relevant metrics
- Interactivity: Use Pyvis for interactive exploration of complex networks
- Simplification: Consider edge bundling or aggregation for dense networks
- Labels: Only label important nodes to reduce visual clutter