Python Plotnine: A Beginner's Guide to Stunning Data Visualization


Visualization is one of the most important parts of working with data. From a very young age, we are taught to picture coordinates on the x-y plane and to learn abstract concepts with tools like flowcharts and diagrams. Colors and graphs help speed up the learning process. When a dataset is large, a graph or scatter plot makes patterns and outliers visible at a glance, whereas scanning the rows of a spreadsheet by hand can easily miss them. And honestly, plotting is also kind of fun!

Plotnine is a Python data visualization library based on the Grammar of Graphics, a structured approach to building plots by mapping data variables to visual elements such as position, color, and size. The library is modeled on the R package ggplot2, and it lets us create attractive graphs and plots with very little code.

In this article, we will first look at some of the key features of the plotnine library, then use it to create a simple scatter plot, and finally a bar chart. Both examples use the mtcars dataset.

Why Use Plotnine for Data Visualization in Python

Some of the key features of this library are:

  • Grammar of Graphics: The library follows the Grammar of Graphics framework, which decomposes a plot into components such as data, aesthetics, and layers, making it easy to customize and modify visualizations.
  • Wide range of plots: plotnine supports many plot types, such as scatter plots, line graphs, bar charts, histograms, and more, so users can pick the visualization that suits their needs.
  • Elegant syntax: The syntax of plotnine is easy to read, and because the library is modeled on R's ggplot2, users can move between the two languages with few complications.
  • Themes: plotnine provides built-in themes such as theme_minimal() and theme_dark() that control the overall look of a plot, including the background, grid lines, and text.


Install and Import the Plotnine Library in Python

Before we start plotting, we need to install the library. Run the following command in your terminal:

pip install plotnine

Now we will create a simple scatter plot with the ggplot() function, using the mtcars dataset. mtcars is a built-in R dataset containing 11 measurements for each of 32 car models; plotnine ships a copy of it as a pandas DataFrame, which we import from plotnine.data. The plot maps car weight (wt) to the x-axis and miles per gallon (mpg) to the y-axis, and geom_point() adds a layer that draws each car as a dot. We also color the points by the number of cylinders (cyl), add a title and axis labels, and apply the minimal theme.

Mtcars Dataset
# Importing the required functions from the plotnine library
from plotnine import ggplot, aes, geom_point, theme_minimal, labs, ggtitle
from plotnine.data import mtcars

# Creating a scatter plot with themes and colors
scatter_plot_customized = (
    # factor(cyl) treats the cylinder count as a category, so each value gets its own color
    ggplot(mtcars, aes(x='wt', y='mpg', color='factor(cyl)')) +
    geom_point() +
    ggtitle("Scatter Plot of Weight vs. MPG") +
    labs(x="Weight (1000 lbs)", y="Miles per Gallon") +
    theme_minimal()
)

print(scatter_plot_customized)

The output is:

Scatter plot of car weight vs. mileage (mpg), created with plotnine

In the next section, we use pandas to calculate the average miles per gallon (mpg) for each number of cylinders (cyl). We then plot a bar chart with plotnine, with the number of cylinders on the x-axis and the average mpg on the y-axis. Each cylinder category is drawn as a vertical bar, and we add axis labels and a title.

# Importing the required functions from plotnine and the mtcars dataset
from plotnine import ggplot, aes, geom_bar, labs, ggtitle
from plotnine.data import mtcars

# Calculating average miles per gallon (mpg) for each number of cylinders (cyl)
avg_mpg = mtcars.groupby('cyl')['mpg'].mean().reset_index()

# Creating a bar plot
bar_plot = (
    ggplot(avg_mpg, aes(x='factor(cyl)', y='mpg')) +
    # stat='identity' uses the mpg values as bar heights instead of counting rows
    geom_bar(stat='identity') +
    labs(x="Number of Cylinders", y="Average MPG") +
    ggtitle("Average MPG by Number of Cylinders")
)

print(bar_plot)

The output is:

Bar chart comparing average MPG across cylinder counts, created with plotnine


Conclusion

In this article, we looked at why visualization matters and how it helps us gain insights from data. Python's ecosystem offers many visualization libraries, and plotnine is a particularly convenient one: it lets us build plots layer by layer with a concise, readable syntax. From scatter plots to bar charts, it covers a wide variety of tasks, and plots like these make it much easier to spot outliers and anomalies that are hard to find in raw tables. Happy reading!