Welcome to the two-episode series on creating network graphs using Python and visualizing them in Gephi! In this pilot episode, we are going to learn in detail what network graphs are, how to create them, and how to perform some basic operations on them.
A network graph is used to analyze the relationships between different entities. The nodes of the graph are these entities, and the edges represent the relations between them.
To create and visualize network graphs, Python offers many tools and libraries, such as the Pyvis package, the NetworkX library, and visdcc in Dash.
In this article, we are going to use NetworkX to create network graphs.
What Is A Network Graph?
Like any other graph, a network graph has nodes and edges. Network graphs are used to analyze and identify the relationships between different entities. As the name suggests, they are used to analyze Facebook networks, tweets associated with particular hashtags, authors' relationships with their co-authors, how many times an actor has acted in a director's movies, and so on.
They are also used for complex real-world tasks such as analyzing the interconnections between multiple cities or the nautical routes between countries that are important for business, among many others.
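The actor and director example above can be sketched as a graph whose edges carry a count. This is only an illustration: the names in `credits` are made up, and storing the count in an edge attribute called `weight` is one common convention rather than the only option.

```python
import networkx as nx

# Hypothetical (actor, director) pairs, one entry per film; names are invented.
credits = [
    ("Actor A", "Director X"),
    ("Actor A", "Director X"),
    ("Actor B", "Director X"),
    ("Actor A", "Director Y"),
]

G_films = nx.Graph()
for actor, director in credits:
    if G_films.has_edge(actor, director):
        # Pair already seen: increase the collaboration count on the edge.
        G_films[actor][director]["weight"] += 1
    else:
        # First film for this pair: create the edge with a count of 1.
        G_films.add_edge(actor, director, weight=1)

# Each edge now reports how often the two people worked together.
for actor, director, weight in G_films.edges(data="weight"):
    print(actor, "-", director, ":", weight)
```

Here the entities (people) are the nodes, and the relation (films made together) lives on the edge.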
The NetworkX Library
The NetworkX library is one of the preferred tools for creating network graphs. It is a package for building and analyzing graphs, and its drawing functions rely on the Matplotlib library to visualize them.
Before we use the NetworkX library, we need to install it in our environment. The library can be installed with the command given below. This command does not install Matplotlib, so install that as well (pip install matplotlib) if it is missing.
pip install networkx
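To confirm that the installation worked, a quick check along these lines can help. It is a minimal sketch: it prints the installed NetworkX version and uses the standard library's `importlib` to see whether Matplotlib can be found, without importing it.

```python
import importlib.util

import networkx as nx

# The version string confirms that the package imports correctly.
print("NetworkX version:", nx.__version__)

# find_spec returns None when a package cannot be located.
has_matplotlib = importlib.util.find_spec("matplotlib") is not None
print("Matplotlib available:", has_matplotlib)
```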
Let us see a simple example to understand the basics of NetworkX.
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
G.add_node('A')
G.add_nodes_from(['B', 'C', 'D'])
G.add_edge('A', 'B')
G.add_edges_from([('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

print("Nodes:", G.nodes())
print("Edges:", G.edges())

node_colors = ['red', 'blue', 'green', 'yellow']
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, font_size=10, font_weight='bold', node_size=1000, node_color=node_colors)
plt.show()
In the first two lines, we are importing the two libraries – Matplotlib and NetworkX.
G is an instance of the library's Graph class. The add_node method inserts a single node into the graph, whereas the add_nodes_from method inserts multiple nodes at a time and takes a list of the nodes to be inserted. Similarly, add_edge inserts an edge between two nodes. If you want to insert multiple edges at the same time, you can use add_edges_from, which takes a list of node pairs.
In this code, we have four nodes (A, B, C, D) and five edges between them.
We print the nodes and edges to the console. We can also choose to customize the colors of the nodes. Here, we used red, blue, green, and yellow to color the nodes.
In the next two lines, we decide on the layout of the graph with spring_layout, and the seed parameter is set so that the layout remains the same every time you execute the code. The draw function then draws the graph with labels and colors.
The show function of Matplotlib displays the graph.
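Once a graph exists, a few basic operations are available on it. The sketch below rebuilds the same four-node graph (adding edges creates the nodes automatically) and then queries it. The variable names `degrees`, `neighbors_of_b`, and `path` are chosen only for this illustration.

```python
import networkx as nx

# Rebuild the example graph; add_edges_from also creates any missing nodes.
G = nx.Graph()
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

# Adding a node or edge that already exists leaves the graph unchanged.
G.add_node('A')
G.add_edge('A', 'B')

print("Number of nodes:", G.number_of_nodes())  # 4
print("Number of edges:", G.number_of_edges())  # 5

# Degree = number of edges attached to each node.
degrees = dict(G.degree())
print("Degrees:", degrees)

# Neighbors = nodes directly connected to a given node.
neighbors_of_b = sorted(G.neighbors('B'))
print("Neighbors of B:", neighbors_of_b)

# One shortest path between two nodes (B and D are two hops apart).
path = nx.shortest_path(G, 'B', 'D')
print("A shortest path from B to D:", path)
```

None of these calls needs Matplotlib, so they are a handy way to inspect a graph before drawing it.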
This is just a small example of what the NetworkX library can do. Now we are going to use a real dataset to create a network graph that can later be visualized in Gephi.
Creating a Network Graph to Visualize in Gephi
Before getting to the code, let us talk about the dataset. The dataset we used here is the Airports Database. It contains over 10k entries of airports, train stations, and ferry stations across the world.
It has the following columns – Airport ID, Name, City, Country, IATA, ICAO, Latitude, Longitude, Altitude, and so on.
The goal is to connect each country to all of its cities that have airports or stations in or around them.
The code is given below.
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

csvfile = 'airports.csv'
data = pd.read_csv(csvfile)
data_subset = data.head(50)
edges = data_subset[['City', 'Country']]

G = nx.Graph()
G.add_edges_from(edges.values)

pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, font_size=10, font_weight='bold', node_size=1000)
plt.show()
We have imported three libraries: Pandas, NetworkX, and Matplotlib.
The name of the CSV file (airports.csv) is stored in a variable called csvfile. The data is then read and stored in another variable called data. Since the dataset has many rows, it might be computationally expensive to visualize the entire dataset using the Matplotlib library. Hence, we just use the first 50 rows to get a glimpse of the dataset. These 50 rows are stored in data_subset. The edges of the graph are taken from two columns of the dataset (City and Country), so each row contributes one city-country pair. We create an empty graph G from the Graph class, add an edge for every pair (which also creates the city and country nodes), and draw the graph with the help of draw. Finally, the show function is used to display the graph.
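As an alternative way to build the same kind of graph, NetworkX can read edges straight from a DataFrame with `from_pandas_edgelist`. The sketch below uses a small, hypothetical stand-in for the CSV (the `sample` rows are made up for illustration) and assumes that rows with a missing city should be dropped before building the graph. Whether the real file has such gaps is not verified here.

```python
import networkx as nx
import pandas as pd

# Hypothetical stand-in for a few rows of the CSV; only two columns are kept.
sample = pd.DataFrame({
    "City": ["Reykjavik", "Akureyri", None, "Toronto", "Reykjavik"],
    "Country": ["Iceland", "Iceland", "Iceland", "Canada", "Iceland"],
})

# Drop rows with a missing city or country, then remove repeated pairs.
clean = sample.dropna(subset=["City", "Country"]).drop_duplicates()

# Each remaining row becomes one edge between a city and its country.
G_air = nx.from_pandas_edgelist(clean, source="City", target="Country")

print("Nodes:", sorted(G_air.nodes()))
print("Edges:", G_air.number_of_edges())
```

Dropping the incomplete rows first avoids ending up with a stray node that stands for a missing value.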
As you can see from the graph, the first 50 rows of the dataset cover four countries (Greenland, Iceland, Canada, and New Guinea).
Each city in these countries that has an airport or a train or ferry station is connected to its country by an edge.
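The same picture can be checked without a plot. In a city-country graph like this one, the degree of a country node equals the number of distinct cities attached to it, and each country with its cities forms one connected component. The sketch below uses a few hypothetical `pairs` instead of the real file.

```python
import networkx as nx

# Hypothetical (city, country) pairs standing in for rows of the dataset.
pairs = [
    ("Reykjavik", "Iceland"),
    ("Akureyri", "Iceland"),
    ("Toronto", "Canada"),
]

G_air = nx.Graph()
G_air.add_edges_from(pairs)

# Country names are the second item of each pair.
countries = {country for _, country in pairs}

# Degree of a country node = number of cities connected to it.
cities_per_country = {c: G_air.degree(c) for c in sorted(countries)}
print(cities_per_country)  # {'Canada': 1, 'Iceland': 2}

# Each country and its cities form a separate cluster in the graph.
clusters = nx.number_connected_components(G_air)
print("Clusters:", clusters)  # 2
```

One caveat with this simple approach: a city and a country that share the same name would be merged into a single node.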
We have come to the end of the first part! Here, we discussed the basics of network graphs and some of their applications. We also looked at NetworkX, the library most frequently used to create network graphs in Python. To start off, we used an example to understand the basics of NetworkX and how we can create a simple network graph. We then took a real-world dataset to create a network graph.
Coming up in the second part: how to save the network graph we just created so that it can be visualized in Gephi.