This is the second article on creating and visualizing network graphs using NetworkX and Gephi. In this article, we are going to take a network graph and store it in a format that Gephi can open and visualize.
Gephi is visualization software used mainly to explore and interact with network graphs. It is written in Java, and it can open graphs that were built in Python and exported to a file.
In this post, we are going to see what Gephi is, its installation, and how to visualize network graphs using Gephi.
What Is Gephi?
Gephi is widely used to visualize network graphs whose nodes and edges have been saved to a file and imported into the Gephi environment. Gephi itself is written in Java, but it does not care which language or tool produced the file, so graphs built in Python, R, and other languages can all be visualized. Beyond visualizing, Gephi also allows us to interact with the graphs, change their layout, customize the colors, delete certain nodes, and much more!
If you haven’t already read the first part, please read the first article here, where we discuss what network graphs are, the approaches to creating them, and performing some basic operations on them.
Since Gephi is developed in Java and depends on it, we first need to make sure Java is installed on our system. Java is often preinstalled, but if it is not, you can install it from the official Java website.
Gephi can be installed from here.
To give you a glimpse of how to use Gephi, let us take a simple example and visualize it in Gephi.
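import networkx as nx

# Create an empty undirected graph
G = nx.Graph()

# Add nodes one at a time or from a list
G.add_node('A')
G.add_nodes_from(['B', 'C', 'D'])

# Add edges one at a time or from a list of pairs
G.add_edge('A', 'B')
G.add_edges_from([('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

# Save the graph in GraphML format
nx.write_graphml(G, "test.graphml")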
The first line imports the networkx library. We then create an empty graph by calling the Graph class of this library and store the instance in a variable called G. After that, we add the nodes and edges of the graph.
Finally, we call another function, nx.write_graphml, which saves the network graph in a file format supported by Gephi. We are saving the network graph as test.graphml.
The network graph looks like this in the graphml format.
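If you would like to inspect the GraphML text without opening the file, the short sketch below prints it from Python. It rebuilds the same small graph and uses NetworkX's generate_graphml function, which yields the document line by line; the variable name graphml_text is just a name chosen for this illustration.

```python
import networkx as nx

# Rebuild the small example graph from above.
G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D'])
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

# generate_graphml yields the GraphML document as lines of text,
# so joining them gives the same XML that write_graphml would save.
graphml_text = '\n'.join(nx.generate_graphml(G))
print(graphml_text)
```

You should see one node element per node and one edge element per edge, wrapped in a graph element marked as undirected.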
Now open Gephi and click on New Project in the pop-up that appears.
Go to Data Laboratory and click on Import Spreadsheet. Click on Overview and you will now be able to see the graph.
Here is a video to walk through the process.
Insights from the video
- We can change the size of the nodes
- We can view the labels of the nodes and change the way they appear – in this example, we selected the Hide Not Selected checkbox, which only shows the labels of the nodes when we hover over them
- We can zoom in and zoom out of the graph
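Node sizes can also be driven by data rather than set by hand. As a minimal sketch, assuming we want to size nodes by their degree, we can store the degree as a node attribute before exporting; Gephi should then list it as a node column that can be used for ranking. The attribute name degree and the file name test_with_degree.graphml are choices made for this illustration, not part of the original example.

```python
import os
import tempfile

import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

# Store each node's degree as a node attribute called "degree"
# (the attribute name is our own choice).
nx.set_node_attributes(G, dict(G.degree()), 'degree')

# Write to a temporary folder so the sketch does not clutter the working directory.
path = os.path.join(tempfile.mkdtemp(), 'test_with_degree.graphml')
nx.write_graphml(G, path)

# Read the file back to confirm the attribute was saved with the graph.
H = nx.read_graphml(path)
degrees = nx.get_node_attributes(H, 'degree')
print(degrees)
```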
Let us now look at the possible ways to save the network graph so that we can visualize it in Gephi.
We have already looked at one approach: graphml. Let us see another file format supported by Gephi.
How to Store the Network Graph to Visualize in Gephi – gexf
gexf stands for Graph Exchange XML Format and is widely used to transport and store network graphs. NetworkX can read and write this format, and it is mostly used to move network graphs into Gephi. Saving a graph as gexf works the same way as saving it as graphml. We just need to call nx.write_gexf instead of nx.write_graphml.
Let us use the example from above and store it in Gexf.
import networkx as nx

G = nx.Graph()
G.add_node('A')
G.add_nodes_from(['B', 'C', 'D'])
G.add_edge('A', 'B')
G.add_edges_from([('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

# Save the graph in GEXF format
nx.write_gexf(G, "test.gexf")
Follow the same process in the video to visualize this network graph in Gephi.
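Before opening the file in Gephi, it can be reassuring to check that it holds what we expect. The sketch below writes the same graph to a temporary location and reads it back with nx.read_gexf, then compares the nodes and edges. The temporary path and the variable names are only for illustration.

```python
import os
import tempfile

import networkx as nx

G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D'])
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

# Write the graph to a temporary GEXF file.
path = os.path.join(tempfile.mkdtemp(), 'test.gexf')
nx.write_gexf(G, path)

# Read it back and compare nodes and edges with the original graph.
H = nx.read_gexf(path)
same_nodes = set(H.nodes()) == set(G.nodes())
# Edges are undirected, so compare them as unordered pairs.
same_edges = ({frozenset(e) for e in H.edges()}
              == {frozenset(e) for e in G.edges()})
print(same_nodes, same_edges)
```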
I hope you have understood the basic visualization functions of Gephi. Let us now explore Gephi with larger network graphs.
For this example, we are going to take the same dataset we used in the previous article – Airports Database 2017.
Visualizing Airports Database Using Gephi
If you remember, we plotted the network graph of this dataset using the matplotlib library in the previous article. We only selected the first 50 entries out of the 10000 entries to make it simple.
Here, we are going to consider the entire dataset for visualization and also look at a few filtering options to manage huge network graphs.
The code to export the network graph to Gephi is given below.
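import pandas as pd
import networkx as nx

# Load the airports dataset
csv_file = 'airports.csv'
data = pd.read_csv(csv_file)

# Each row becomes an edge between a city and its country
edges = data[['City', 'Country']]

G = nx.Graph()
G.add_edges_from(edges.values)

# Export the graph for Gephi
nx.write_gexf(G, 'airport.gexf')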
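A slightly more defensive variant is sketched below. It is not the code used for the video: it runs on a few made-up rows instead of airports.csv, drops rows with a missing city or country, builds the graph with nx.from_pandas_edgelist, and tags every node with a kind attribute (a name chosen here) so that cities and countries can be told apart in Gephi.

```python
import os
import tempfile

import networkx as nx
import pandas as pd

# Hypothetical sample rows standing in for airports.csv.
data = pd.DataFrame({
    'City': ['Paris', 'Lyon', 'Berlin', None],
    'Country': ['France', 'France', 'Germany', 'Germany'],
})

# Drop rows with a missing value and repeated city/country pairs.
edges = data[['City', 'Country']].dropna().drop_duplicates()

# Build the graph directly from the two columns.
G = nx.from_pandas_edgelist(edges, source='City', target='Country')

# Tag each node as a city or a country. A city that shares its name
# with a country would collapse into a single node here.
countries = set(edges['Country'])
kinds = {n: 'country' if n in countries else 'city' for n in G.nodes()}
nx.set_node_attributes(G, kinds, 'kind')

path = os.path.join(tempfile.mkdtemp(), 'airport_sample.gexf')
nx.write_gexf(G, path)
```

In Gephi, the kind column could then be used to color cities and countries differently.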
The dataset we took is pretty cluttered, since every city that has an airport is connected to its country, so it becomes a pretty huge network. To manage it, we apply a few filters, such as Giant Component, which keeps only the largest connected component of the graph.
Follow the video below.
The Giant Component filter keeps only the largest connected component of the network and hides the smaller, disconnected ones.
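The same idea can be reproduced in NetworkX if you prefer to filter before exporting. The following is a minimal sketch on a tiny made-up graph, not the airports data: it finds the connected components, picks the largest one, and keeps only that part of the graph.

```python
import networkx as nx

# Tiny made-up graph with two separate components.
G = nx.Graph()
G.add_edges_from([
    ('Paris', 'France'), ('Lyon', 'France'), ('Nice', 'France'),
    ('Berlin', 'Germany'),
])

# connected_components yields one set of nodes per component;
# the giant component is the set with the most nodes.
largest = max(nx.connected_components(G), key=len)

# Copy the subgraph so it can be modified or exported independently.
giant = G.subgraph(largest).copy()
print(sorted(giant.nodes()))
```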
The layout used in this network analysis is the Yifan Hu layout, as it can handle a huge number of nodes and edges.
We also run one of Gephi's statistics, Modularity. Modularity is used to identify groups or clusters of nodes that have dense connections among themselves and fewer connections with other groups. Based on the modularity class assigned to each node, we color-coded the clusters.
The outliers of the graph (nodes at the edge that are not connected to any other nodes and are left alone) are the cities that do not have an airport or that do not have a connection to any country.
We have also set the labels of the nodes to be visible only when we hover over that particular node, and changed the color of the labels.
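To get a feel for what modularity-based clustering does, here is a small sketch in NetworkX. Note that it uses NetworkX's greedy modularity algorithm, while Gephi's Modularity statistic is based on the Louvain method, so the two will not necessarily agree on real data. The toy graph and the attribute name community are assumptions made for this illustration.

```python
import networkx as nx
from networkx.algorithms import community

# Toy graph: two triangles joined by a single edge (3, 4).
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (1, 3),
                  (4, 5), (5, 6), (4, 6),
                  (3, 4)])

# Group the nodes into communities by greedily maximizing modularity.
communities = community.greedy_modularity_communities(G)

# Modularity score of that partition (higher means better separated groups).
score = community.modularity(G, communities)

# Store the community index on each node so it can be exported with the graph.
labels = {node: i for i, group in enumerate(communities) for node in group}
nx.set_node_attributes(G, labels, 'community')

print([sorted(c) for c in communities], round(score, 3))
```

If the graph is then exported with nx.write_gexf, the community column could be used in Gephi to color the nodes, much like the modularity class in the video.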
That’s the end of it! We have discussed what Gephi is and how to install it. We took a simple example of a network graph and visualized it in Gephi. In order to do so, we looked at two approaches to storing the network graph in a file format compatible with Gephi. These file formats are GraphML and GEXF.
Then we visualized the Airports Database, which has over 10k rows covering airports, train stations, and ferry stations in different cities of each country. We plotted every city and connected it to its country on the graph. Since it is a cluttered network, we filtered the nodes and ran some statistics.
This is a basic idea of the Gephi software. Go ahead and perform analysis on your own with Gephi!