Gephi: How to Visualize Powerful Network Graphs From Python?

Storing network graphs

This is the second article on creating and visualizing network graphs using NetworkX and Gephi. In this article, we are going to take a network graph and store it in such a way that we can visualize it using Gephi.

For plotting network graphs directly in Python, refer to the earlier article on visualization using matplotlib.

Gephi is visualization software mainly used to visualize and interact with network graphs. It is developed in Java and can also display graphs that were built in Python.

In this post, we are going to see what Gephi is, its installation, and how to visualize network graphs using Gephi.

What Is Gephi?

Gephi is widely used to visualize network graphs whose nodes and edges have been saved to a file and imported into the Gephi environment. Although it is written in Java, it can visualize graphs created with Python, R, and other languages or tools, as long as they are exported in a format Gephi supports. Beyond visualizing, Gephi also allows us to interact with the graphs: change the layout, customize the colors, delete certain nodes, and much more!

If you haven’t already read the first part, please read the first article here, where we discuss what network graphs are, the approaches to creating them, and performing some basic operations on them.

Installing Gephi

Since Gephi is developed in Java and depends on it, we first need to make sure Java is installed on our system. Java is often already installed, but if it is not, you can install it from the official Java website.

Gephi can be installed from here.

To give you a glimpse of how to use Gephi, let us take a simple example and visualize it in Gephi.

import networkx as nx
G = nx.Graph()
G.add_node('A')
G.add_nodes_from(['B', 'C', 'D'])
G.add_edge('A', 'B')
G.add_edges_from([('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])
nx.write_graphml(G, "test.graphml")

The first line imports the networkx library. Calling nx.Graph() creates an empty undirected graph, which we store in the variable G. We then add the nodes and edges of the graph.

The nx.write_graphml function saves the network graph in GraphML, a file format supported by Gephi. Here we save the network graph as test.graphml.
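As a quick sanity check before opening Gephi, the saved file can be read back with nx.read_graphml and compared with the original graph. This is only a sketch: it rebuilds the same example graph and, as an assumption made to avoid cluttering the working directory, writes to a temporary folder instead of test.graphml.

```python
import os
import tempfile

import networkx as nx

# Rebuild the example graph from above.
G = nx.Graph()
G.add_node('A')
G.add_nodes_from(['B', 'C', 'D'])
G.add_edge('A', 'B')
G.add_edges_from([('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

# Write to a temporary folder (an assumption for this sketch) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.graphml")
    nx.write_graphml(G, path)
    H = nx.read_graphml(path)

# The reloaded graph should have the same nodes and the same number of edges.
print(sorted(H.nodes()), H.number_of_edges())
```

If the node list or edge count differs from what you expect, it is easier to catch the problem here than after importing the file into Gephi.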

The network graph looks like this in the graphml format.

Graphml Format
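If you would like to inspect the GraphML text without opening the file, NetworkX can also generate it in memory. The sketch below rebuilds the same example graph and uses nx.generate_graphml, which yields the lines of the XML document; the variable name xml_text is just illustrative.

```python
import networkx as nx

# Same example graph as above.
G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D'])
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

# generate_graphml yields the GraphML document line by line.
xml_text = "\n".join(nx.generate_graphml(G))
print(xml_text)
```

The output contains one node element per node and one edge element per edge, wrapped in a graph element inside the graphml root.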

Now open Gephi and click on New Project in the pop-up that appears.

Go to Data Laboratory and click on Import Spreadsheet to load the file (File > Open also accepts .graphml files). Then click on Overview and you will be able to see the graph.

Here is a video to walk through the process.

GephiTest

Insights from the video

  • We can change the size of the nodes
  • We can view the labels of the nodes and change the way they appear – in this example, we selected the Hide Not Selected checkbox, which only shows the labels of the nodes when we hover over them
  • We can zoom in and zoom out of the graph

Let us now look at the possible ways to save the network graph so that we can visualize it in Gephi.

We have already looked at one approach, GraphML. Let us see another file format supported by Gephi.

How to Store the Network Graph to Visualize in Gephi – gexf

GEXF stands for Graph Exchange XML Format and is widely used to store and exchange network graphs. NetworkX supports it as well, and it is a common way to move network graphs into Gephi. Saving to GEXF works the same way as saving to GraphML: we just call nx.write_gexf instead of nx.write_graphml.

Let us use the example from above and store it in GEXF.

import networkx as nx
G = nx.Graph()
G.add_node('A')
G.add_nodes_from(['B', 'C', 'D'])
G.add_edge('A', 'B')
G.add_edges_from([('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])
nx.write_gexf(G, "test.gexf")
gexf

Follow the same process in the video to visualize this network graph in Gephi.
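Node attributes stored on the NetworkX graph are written to the GEXF file as well, so they can be carried over for use in Gephi. The sketch below is one possible way to do this: it stores each node's degree under a hypothetical attribute name, nx_degree, and reads the file back to confirm the value survives the export. The temporary folder is also an assumption of this sketch.

```python
import os
import tempfile

import networkx as nx

# Same example graph as above.
G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D'])
G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A'), ('A', 'C')])

# Store each node's degree as a node attribute.
# "nx_degree" is a made-up name chosen for this sketch.
nx.set_node_attributes(G, dict(G.degree()), "nx_degree")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_with_degree.gexf")
    nx.write_gexf(G, path)
    H = nx.read_gexf(path)

# The attribute is still attached to every node after the round trip.
print(dict(H.nodes(data="nx_degree")))
```

The same pattern works for any other per-node value you want to have available after the import, such as a category label.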

Visualization in Gephi

I hope you have understood the basic visualization functions of Gephi. Let us now explore Gephi with larger network graphs.

For this example, we are going to take the same dataset we used in the previous article – Airports Database 2017.

Visualizing Airports Database Using Gephi

If you remember, we plotted the network graph of this dataset using the matplotlib library in the previous article. We only selected the first 50 entries out of the 10000 entries to make it simple.

Here, we are going to consider the entire dataset for visualization and also look at a few filtering options to manage huge network graphs.

The code to export the network graph to Gephi is given below.

import pandas as pd
import networkx as nx
csv_file = 'airports.csv'
data = pd.read_csv(csv_file)
edges = data[['City', 'Country']]
G = nx.Graph()
G.add_edges_from(edges.values)
nx.write_gexf(G, 'airport.gexf')
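To see what this export produces without downloading the dataset, here is a small sketch of the same steps on a handful of made-up rows. The sample rows, the dropna and drop_duplicates cleanup, the kind attribute, and the temporary output folder are all assumptions added for illustration; only the City and Country column names come from the code above.

```python
import os
import tempfile

import networkx as nx
import pandas as pd

# Hypothetical rows standing in for airports.csv (same two columns as above).
data = pd.DataFrame(
    {
        "City": ["Paris", "Lyon", "Berlin", "Munich", "Oslo", None],
        "Country": ["France", "France", "Germany", "Germany", "Norway", "Norway"],
    }
)

# Drop rows with a missing city or country, and repeated city-country pairs.
edges = data[["City", "Country"]].dropna().drop_duplicates()

# One edge per (City, Country) row.
G = nx.from_pandas_edgelist(edges, source="City", target="Country")

# Tag each node so cities and countries can be told apart after the import.
# Caveat: a city that shares its name with a country becomes the same node.
kinds = {city: "city" for city in edges["City"]}
kinds.update({country: "country" for country in edges["Country"]})
nx.set_node_attributes(G, kinds, "kind")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "airport_sample.gexf")
    nx.write_gexf(G, path)
    exported = nx.read_gexf(path)

print(exported.number_of_nodes(), exported.number_of_edges())
```

Each country ends up as a hub connected to its cities, which is why the full dataset forms a set of star-shaped clusters.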

The resulting graph is pretty cluttered, since every city that has an airport is connected to its country, so it becomes a pretty huge network. To manage it, we apply a few filters, such as Giant Component, which keeps only the largest connected component and hides the smaller ones.

Follow the video below.

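The Giant Component filter keeps only the largest connected component of the network, that is, the biggest group of nodes that can all reach one another, and removes everything else from the view.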
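The same idea can be reproduced in NetworkX, which may help clarify what the filter does. The sketch below uses a tiny made-up graph (the city and country names are placeholders, not rows from the dataset) and keeps the largest connected component with nx.connected_components.

```python
import networkx as nx

# Made-up city-country edges forming three separate components.
G = nx.Graph()
G.add_edges_from([
    ("Paris", "France"), ("Lyon", "France"), ("Nice", "France"),
    ("Oslo", "Norway"),
    ("Berlin", "Germany"), ("Munich", "Germany"),
])

# Sort the connected components from largest to smallest.
components = sorted(nx.connected_components(G), key=len, reverse=True)

# Keep only the largest one, as the Giant Component filter does.
giant = G.subgraph(components[0]).copy()

print(len(components), sorted(giant.nodes()))
```

Filtering in NetworkX before exporting is an alternative to filtering inside Gephi, and can make very large files quicker to load.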

The layout used in this network analysis is the Yifan Hu layout, as it can handle a huge number of nodes and edges.

We also run one of Gephi's statistics, Modularity. Modularity is used to identify groups or clusters of nodes that have dense connections among themselves and fewer connections with other groups. Based on the resulting modularity classes, we color-coded the clusters.
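For comparison, communities can also be detected in NetworkX before exporting. Note that the sketch below uses greedy_modularity_communities, which is a different algorithm from the one behind Gephi's Modularity statistic, so the groups may not match exactly on real data. The toy barbell graph and the community attribute name are assumptions made for illustration.

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    modularity,
)

# Toy graph: two fully connected groups of 4 nodes joined by a single edge.
G = nx.barbell_graph(4, 0)

# Detect communities by greedily maximizing modularity.
communities = greedy_modularity_communities(G)

# Store the community index on each node ("community" is a made-up name).
for idx, members in enumerate(communities):
    for node in members:
        G.nodes[node]["community"] = idx

# Modularity score of the detected partition (higher means clearer groups).
score = modularity(G, communities)
print(len(communities), round(score, 3))
```

Because the community index is stored as a node attribute, it is included when the graph is saved with nx.write_gexf and can then be used to color the nodes.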

The outliers of the graph (the nodes at the edge of the layout that are not connected to the rest of the network and are left alone) are the cities that do not have an airport or that do not have a connection to any country.

We have also set the node labels to be visible only when we hover over a particular node, and changed the color of the labels.

Airport Analysis Network Graph

Conclusion

That’s the end of it! We have discussed what Gephi is and how to install it. We took a simple example of a network graph and visualized it in Gephi. In order to do so, we looked at two approaches to storing the network graph in a file format compatible with Gephi: GraphML and GEXF.

Then we visualized the Airports Database, which has over 10k rows covering airports, train stations, and ferry stations in cities around the world. We plotted every city against its country on the graph. Since it is a cluttered network, we filtered the nodes and ran some statistics.

This is a basic idea of the Gephi software. Go ahead and perform analysis on your own with Gephi!

References

Gephi Documentation

GraphML documentation

Gexf documentation