Data visualization is a way of representing data through patterns, objects, and visual elements. Python provides various libraries for working with data effectively, and ggplot-style plotting is one way to visualize data in Python. This article covers the details of the ggplot package in Python.
For example, websites, social media platforms, shopping sites, and food delivery services collect large amounts of information about millions of users on a daily basis. Have you ever wondered how they manage to visualize this data? Data can be visualized in many forms, such as graphs, bar plots, histograms, or scatter plots. These visualization techniques are very useful in data-related fields. To understand how the ggplot library works, we first need to know what data visualization is.
What is Data Visualization?
The representation of data in the form of graphs, charts, maps, and other visual elements is called data visualization. Representing data as plots helps us understand patterns and trends. In fields like data science, big data, and machine learning, decision-making depends on data, so data visualization plays a huge role.
Why Do We Need Data Visualization?
Prediction using data is in demand nowadays. Domains like machine learning, deep learning, data science, and big data use large amounts of data to build and analyze their models, and data visualization plays a vital role in that work. A visual, color-coded representation lets us take in data quickly, and the trends and patterns it reveals inform the predictions made from that data. The detailed implementation is given in the later part of the article. Before that, we need to understand the basic concept behind data visualization, i.e. the Grammar of Graphics.
Grammar of Graphics: Basics of Data Visualization
In a language like English, a sentence is built from components such as nouns, pronouns, verbs, and tenses, combined according to the rules of grammar. Likewise, data visualization in Python becomes easy to understand and use when we know the grammar of graphics.
The grammar of graphics is made up of different layers. These layers help to form a visualization in a meaningful way. If we refer to these layers to perform visualization, then we can represent our data more accurately. Let’s see the layers and structure of the grammar of graphics.
Structure of Layers: Grammar of Graphics
The 7 layers of the grammar of graphics are Data, Aesthetics, Geometric objects, Facets, Statistical transformation, Coordinates, and Theme.
The first and most basic layer is the Data layer. For any data visualization, we need some data. The second layer is the Aesthetic layer. It describes how the data is mapped to the components of a plot, such as the x-axis and y-axis, the size of lines, fills, and colors. The third layer, Geometric objects, is about the type of plot we use to represent our data, for example scatter plots, bar plots, histograms, box plots, or lines.
The other 4 layers are optional, but they make plots more meaningful and representative. The fourth layer is the Facet, which is used to draw subplots. For example, we can divide our data into subgroups and plot them separately. The fifth layer, Statistical transformation, computes summaries of the data before plotting, such as the bin counts of a histogram. The sixth layer is Coordinates, which defines the coordinate system the plot is drawn in, for example standard or flipped axes. The last layer is the Theme, which controls the non-data appearance of the plot, such as fonts and colors.
Let’s implement some plots using this grammar of graphics.
Ggplot in Python
There are different libraries in Python for data visualization. Plotnine is a library based on ggplot2 that implements the grammar of graphics in Python. To create plots, we import the ggplot class, along with the aesthetic and geometric object functions we need, from the plotnine library. To understand how this works, let’s implement some examples.
Example of Plotting Data with Ggplot
For plotting data using ggplot, we need a dataset. In this example, we use an economics dataset from the plotnine library. Let’s start the implementation.
Example 1: Installing Plotnine
pip install plotnine
To use ggplot, we must first install the plotnine library. Run the pip install plotnine command in the terminal to install the package.
After completing the package installation process, we can proceed with the data visualization.
Example 2: Implementation of Line Plot
The line plot is implemented using the ‘geom_line()’ function, which connects the data points in order of the x variable. Let’s see the implementation.
from plotnine.data import economics
from plotnine import ggplot, aes, geom_line

(
    ggplot(economics)
    + aes(x="date", y="pop")
    + geom_line()
)
In Example 2, we first import the economics dataset from plotnine.data, along with ggplot, aes, and geom_line from plotnine. The ggplot() call takes the economics dataset, aes() maps the ‘date’ column to the x-axis and the ‘pop’ column to the y-axis, and geom_line() draws the line plot.
Example 3: Implementation of Histogram Plot
The ‘geom_histogram()’ function implements the histogram plot using the given dataset. Let’s implement the code.
from plotnine.data import economics
from plotnine import ggplot, aes, geom_histogram

(
    ggplot(economics)
    + aes(x="unemploy")  # a histogram needs only x; the counts go on the y-axis
    + geom_histogram()
)
The ‘geom_histogram()’ function implements the histogram plot for the data in the ‘unemploy’ column of the dataset. Each geometric object must be imported from the plotnine library before it is used. Here, we import geom_histogram.
Example 4: Implementation of Box Plot
The ‘geom_boxplot()’ function implements the box plot using the given dataset. Let’s implement the code.
from plotnine.data import economics
from plotnine import ggplot, aes, geom_boxplot

(
    ggplot(economics)
    + aes(x="pce", y="unemploy")
    + geom_boxplot()
)
In this code, the ‘pce’ and ‘unemploy’ columns are mapped to the x-axis and y-axis. The ‘geom_boxplot’ function is imported from the plotnine package.
Example 5: Implementation of Dataset Using Facets
The facet is a layer from the grammar of graphics. Let’s implement this layer together with other layers like aes, geom, and the dataset. The dataset used in this code is the tips dataset, which the code reads from a file named tips.csv.
import pandas as pd
from plotnine import ggplot, aes, facet_grid, labs, geom_col

df = pd.read_csv("tips.csv")

(
    ggplot(df)
    + facet_grid(facets="~sex")
    + aes(x="time", y="total_bill")
    + labs(
        x="time",  # label matches the column mapped to the x-axis
        y="total_bill",
    )
    + geom_col()
)
In Example 5, the code imports pandas along with ggplot, aes, facet_grid, labs, and geom_col from plotnine. The ‘facet_grid()’ function receives the faceting variable through its facets argument. Here, "~sex" creates one panel for each value of the ‘sex’ column, so the female and male categories are shown separately, and each panel plots ‘total_bill’ against ‘time’.
Example 6: Implementation of Dataset Using Statistical Transformations
A statistical transformation computes summary values from the data before they are plotted. In this example, the binning of a histogram is controlled using the ‘bins’ attribute.
import pandas as pd
from plotnine import ggplot, aes, geom_histogram

df = pd.read_csv("tips.csv")

ggplot(df) + aes(x="total_bill") + geom_histogram(bins=10)
In Example 6, the ‘total_bill’ values from the tips dataset are grouped into 10 bins, and the ‘geom_histogram()’ function draws the count in each bin as a histogram.
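To see what binning does to the data, the sketch below counts values per bin with NumPy instead of drawing them. It is a conceptual illustration only: the bill amounts are made up, and the exact bin edges plotnine chooses for bins=10 may differ from the ones numpy.histogram produces.

```python
import numpy as np

# Hypothetical bill amounts, invented for illustration.
bills = np.array([3.5, 8.0, 10.2, 12.5, 15.0, 16.8,
                  18.3, 21.0, 24.6, 30.1, 35.4, 48.2])

# Split the value range into 10 equal-width bins and count the values in each.
counts, edges = np.histogram(bills, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

Every value lands in exactly one bin, so the counts add up to the number of observations. A histogram draws one bar per bin with a height equal to that count.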
Example 7: Implementation of Dataset Using Coordinate system
The coordinate system is another layer of the grammar of graphics. In this example, the ‘coord_flip()’ function is used to flip the x and y axes. Let’s implement the code.
import pandas as pd
from plotnine import ggplot, aes, geom_histogram, coord_flip

df = pd.read_csv("tips.csv")

(
    ggplot(df)
    + aes(x="total_bill")
    + geom_histogram(bins=10)
    + coord_flip()
)
In this code, both the statistical transformation and coordinate layers are used. The total_bill data is grouped into 10 bins, and the axes are flipped using the coord_flip function, so the bars run horizontally. The histogram plot is implemented using the ‘geom_histogram()’ function.
Example 8: Implementation of Dataset Using Themes
The ‘theme_xkcd()’ function is used to change the theme of a plot. Let’s implement the code.
import pandas as pd
from plotnine import ggplot, aes, facet_grid, labs, geom_col, theme_xkcd

df = pd.read_csv("tips.csv")

(
    ggplot(df)
    + facet_grid(facets="~sex")
    + aes(x="time", y="total_bill")
    + labs(
        x="time",
        y="total_bill",
    )
    + geom_col()
    + theme_xkcd()
)
In this example, the theme_xkcd() function changes the theme. The facets are also implemented using the facet_grid() function. ‘labs’ is also imported to set the axis labels in the plots.
This article gave a detailed overview of the ggplot package from the plotnine library. The main goal of data visualization is to represent data in a meaningful way. The theory behind the plotnine library is the grammar of graphics, whose layers let us build any plot the way we want. Refer to this article to use each layer and implement the code using the plotnine library with the ggplot package in Python. We hope you enjoyed this article.