Data Visualization using matplotlib.pyplot.scatter in Python


A key step in any data analysis is to examine the relationships between important features and to check whether they depend on each other. Visualizing those relationships in a plot makes them much easier to spot. Suppose, for example, that we need to look for a trend in our data: we need a tool that lets us plot it.

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Its plots, figures, and layouts can be customized extensively to suit our needs.


The scatter() method

This article covers scatter plots, specifically the matplotlib.pyplot.scatter method. It creates scatter plots that show the relationship between two features or variables, which can help us gain insights into the data.

The syntax is simple: a single call with a handful of parameters. Let’s go through the syntax first and then see how to use the most common parameters to get some nice visualizations.

Syntax of the Scatter Method

matplotlib.pyplot.scatter(x_axis_array_data, y_axis_array_data, 
                                        s=None, c=None, marker=None, 
                                        cmap=None,  alpha=None, 
                                        linewidths=None, edgecolors=None)
  • x_axis_array_data: The array containing the data for the x-axis (passed as x in the actual call).
  • y_axis_array_data: The array containing the data for the y-axis (passed as y in the actual call).
  • s: Sets the size of the markers.
  • c: Sets the colour of the markers.
  • marker: Sets the marker style.
  • cmap: Sets the colourmap used when c is an array of numbers.
  • alpha: Sets the transparency of the markers.
  • linewidths: Sets the width of the marker edges (the outline drawn around each marker).
  • edgecolors: Sets the colour of the marker edges.
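
A minimal sketch of a call using these parameters, assuming made-up sample values. Note that the keyword names for the two data arguments are x and y, as in the examples later in this article, and that the call returns a PathCollection holding the drawn markers.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PathCollection

# Made-up sample data, for illustration only
x = np.asarray([1, 2, 3, 4])
y = np.asarray([10, 20, 15, 30])

# x and y are required; every other argument is optional
points = plt.scatter(x, y, s=50, c="tab:blue", marker="o", alpha=0.8)

# scatter() returns the collection of markers it drew
print(type(points).__name__)
offsets = points.get_offsets()  # (n, 2) array of the plotted (x, y) pairs
print(offsets.shape)

# Call plt.show() to display the figure
```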

Modifying Scatter Plot Parameters To Create Visualizations With PyPlot Scatter

You can install matplotlib with the following command (the leading ! runs it from a Jupyter notebook cell; omit it in a terminal):

!pip install matplotlib

Alternatively, you can install it using Anaconda.

The x_axis_array_data & y_axis_array_data

All the parameters mentioned above are optional except x_axis_array_data and y_axis_array_data, which, as their names suggest, take the x and y values as arrays. Both must have shape (n, ), that is, be one-dimensional and of the same length. NumPy arrays are most commonly used because they are efficient.

For example, suppose we have a dataset with two features for video posts on a social media platform: number_of_ratings and ratings_value, which ranges from 1 to 9. We want to find the rating trend among viewers. Let’s make some plots to visualize the trend.
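
As a small sketch of the shape requirement, the snippet below builds two one-dimensional arrays of equal length and then shows that scatter() raises a ValueError when the lengths differ. The values and the mismatch_rejected flag are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

# Plain Python lists are accepted too; np.asarray converts them to arrays
x = np.asarray([10, 24, 17, 45])
y = np.asarray([2, 4, 5, 6])
print(x.shape, y.shape)  # both are one-dimensional with 4 elements

plt.scatter(x, y)  # fine: x and y have the same length

# Dropping one y value makes the lengths differ, which scatter() rejects
try:
    plt.scatter(x, y[:-1])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```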

# Basic scatter plot
import matplotlib.pyplot as plt
import numpy as np

ratings_value = np.asarray([2, 4, 5, 6, 8, 5, 2, 8, 5,
                            3, 2, 8, 6, 5, 4, 7, 8, 9, 7, 1])
number_of_ratings = np.asarray([10, 24, 17, 45, 23, 32, 67, 
                                34, 54, 54, 32, 67, 35, 23, 14, 16, 28, 32, 29, 28])

plt.title("Ratings Trend Visualization")
plt.xlabel("Number of ratings")
plt.ylabel("Ratings value")

plt.scatter(x = number_of_ratings, y = ratings_value)
plt.show()
Basic Scatter Plot

The size parameter

s sets the marker size, measured in points squared (points**2). It is optional and can be a single float or an array-like of shape (n, ) that gives one size per data point.

# Scatter plot with one specific size for all the markers: s parameter
import matplotlib.pyplot as plt
import numpy as np

ratings_value = np.asarray([2, 4, 5, 6, 8, 5, 2, 8, 5, 
                            3, 2, 8, 6, 5, 4, 7, 8, 9, 7, 1])
number_of_ratings = np.asarray([10, 24, 17, 45, 23, 32, 67, 
                                34, 54, 54, 32, 67, 35, 23, 14, 16, 28, 32, 29, 28])

plt.title("Ratings Trend Visualization")
plt.xlabel("Number of ratings")
plt.ylabel("Ratings value")

plt.scatter(x = number_of_ratings, y = ratings_value, s = 120)
plt.show()
Scatter Plot With Specific Size Marker
# Providing different sizes for each marker: As an array
import matplotlib.pyplot as plt
import numpy as np

ratings_value = np.asarray([2, 4, 5, 6, 8, 5, 2, 8, 5, 
                            3, 2, 8, 6, 5, 4, 7, 8, 9, 7, 1])
number_of_ratings = np.asarray([10, 24, 17, 45, 23, 32, 67, 
                                34, 54, 54, 32, 67, 35, 23, 14, 16, 28, 32, 29, 28])

plt.title("Ratings Trend Visualization")
plt.xlabel("Number of ratings")
plt.ylabel("Ratings value")

sizes = np.asarray([100, 240, 170, 450, 230, 320, 670, 340, 540, 
                                540, 320, 670, 350, 230, 140, 160, 280, 320, 290, 280])

plt.scatter(x = number_of_ratings, y = ratings_value, s = sizes)
plt.show()
Scatter Plot With Multiple Sized Marker
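
The sizes array in the example above is number_of_ratings multiplied by 10, so it can be computed from the data instead of being typed by hand. A small sketch of that idea, where the factor of 10 is simply the scale implied by the values above:

```python
import matplotlib.pyplot as plt
import numpy as np

ratings_value = np.asarray([2, 4, 5, 6, 8, 5, 2, 8, 5,
                            3, 2, 8, 6, 5, 4, 7, 8, 9, 7, 1])
number_of_ratings = np.asarray([10, 24, 17, 45, 23, 32, 67,
                                34, 54, 54, 32, 67, 35, 23, 14, 16, 28, 32, 29, 28])

# Derive one marker size per point from the data itself
sizes = number_of_ratings * 10

points = plt.scatter(x=number_of_ratings, y=ratings_value, s=sizes)

# The returned collection stores the sizes that were applied
print(points.get_sizes()[:3])

# Call plt.show() to display the figure
```

Scaling the size by a feature turns the plot into a bubble chart, where marker area carries a third variable.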

The colour parameter

c sets the marker colour and is optional. It can be a single colour (a colour name or a HEX code), a sequence of colours with one entry per point, or an array-like of numbers that are mapped to colours through cmap.

# Using "c" parameter: with a specific color
import matplotlib.pyplot as plt
import numpy as np

ratings_value = np.asarray([2, 4, 5, 6, 8, 5, 2, 8, 5, 
                            3, 2, 8, 6, 5, 4, 7, 8, 9, 7, 1])
number_of_ratings = np.asarray([10, 24, 17, 45, 23, 32, 67, 
                                34, 54, 54, 32, 67, 35, 23, 14, 16, 28, 32, 29, 28])


sizes = np.asarray([100, 240, 170, 450, 230, 320, 670, 340, 540, 
                                540, 320, 670, 350, 230, 140, 160, 280, 320, 290, 280])


plt.title("Ratings Trend Visualization")
plt.xlabel("Number of ratings")
plt.ylabel("Ratings value")

plt.scatter(x = number_of_ratings, y = ratings_value, s = sizes, c = "green")
plt.show()
Scatter Plot With C Parameter
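
Besides a colour name, c also accepts a HEX string or a sequence with one colour per point. The sketch below shows both, assuming a shortened sample of the data; the threshold of 5 used to pick red or green is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.asarray([10, 24, 17, 45, 23])
y = np.asarray([2, 4, 5, 6, 8])

plt.figure()

# A single HEX string colours every marker the same way
plt.scatter(x, y, c="#2ca02c")

# A sequence with one colour per point: ratings below 5 in red, the rest in green
point_colors = ["red" if value < 5 else "green" for value in y]
points = plt.scatter(x, y + 1, c=point_colors)  # shifted up so both sets are visible
print(point_colors)

# Call plt.show() to display the figure
```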

The marker parameter

marker sets the marker style (default: 'o', a circle).

# Using a different marker: (default: 'o')
import matplotlib.pyplot as plt
import numpy as np

ratings_value = np.asarray([2, 4, 5, 6, 8, 5, 2, 8, 5, 
                            3, 2, 8, 6, 5, 4, 7, 8, 9, 7, 1])
number_of_ratings = np.asarray([10, 24, 17, 45, 23, 32, 67, 
                                34, 54, 54, 32, 67, 35, 23, 14, 16, 28, 32, 29, 28])


sizes = np.asarray([100, 240, 170, 450, 230, 320, 670, 340, 540, 
                                540, 320, 670, 350, 230, 140, 160, 280, 320, 290, 280])



plt.title("Ratings Trend Visualization")
plt.xlabel("Number of ratings")
plt.ylabel("Ratings value")

plt.scatter(x = number_of_ratings, y = ratings_value, s = sizes, c = "green", marker = "^" )
plt.show()
Scatter Plot with Marker Parameter
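
To compare a few marker styles side by side, the following sketch draws the same made-up points once per style. The four style codes used here ('o', '^', 's', '*') are standard Matplotlib markers; the data and layout are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.asarray([1, 2, 3, 4])

plt.figure()

# Draw the same x values once per marker style, one row per style
marker_styles = ["o", "^", "s", "*"]  # circle, triangle up, square, star
for row, style in enumerate(marker_styles):
    plt.scatter(x, np.full_like(x, row), marker=style, s=120, label=style)

plt.legend(title="marker")

# Call plt.show() to display the figure
```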

The colourmap parameter

cmap is a Colormap instance or the name of a registered colourmap (default: 'viridis'). It is only used if c is an array of numbers; each number in our colours array is then mapped to a colour from the colourmap.

The Matplotlib module has a number of available colourmaps.

A colourmap maps a range of numbers to a continuous sequence of colours. By default, the values passed in c are scaled linearly so that the smallest value maps to the first colour of the colourmap and the largest value to the last.

Here is an example of a colourmap:

Img of Colorbar
# Using cmap parameter: (Default: 'viridis')
import matplotlib.pyplot as plt
import numpy as np

ratings_value = np.asarray([2, 4, 5, 6, 8, 5, 2, 8, 5, 
                            3, 2, 8, 6, 5, 4, 7, 8, 9, 7, 1])
number_of_ratings = np.asarray([10, 24, 17, 45, 23, 32, 67, 
                                34, 54, 54, 32, 67, 35, 23, 14, 16, 28, 32, 29, 28])


sizes = np.asarray([100, 240, 170, 450, 230, 320, 670, 340, 540, 
                                540, 320, 670, 350, 230, 140, 160, 280, 320, 290, 280])

colors = np.asarray([1, 2, 5, 4, 6, 8, 6, 3, 5, 
                                4, 3, 6, 9, 2, 1, 6, 8, 8, 4, 5])

plt.title("Ratings Trend Visualization")
plt.xlabel("Number of ratings")
plt.ylabel("Ratings value")

plt.scatter(x = number_of_ratings, y = ratings_value, s = sizes, c = colors, cmap = "viridis" )
plt.show()
Scatter Plot With Cmap Parameter
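
To make the mapping from numbers to colours visible, a colourbar can be attached to the plot. The sketch below also uses matplotlib.colors.Normalize to show the default linear rescaling of the c values to the range 0 to 1; the shortened colors array and the x positions are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

colors = np.asarray([1, 2, 5, 4, 6, 8, 6, 3, 5, 4])
x = np.arange(len(colors))

# By default the values in c are rescaled linearly so that the smallest
# maps to 0 and the largest to 1 before the colourmap is applied
norm = Normalize(vmin=colors.min(), vmax=colors.max())
scaled = norm(colors)
print(scaled.min(), scaled.max())

plt.figure()
points = plt.scatter(x, colors, c=colors, cmap="viridis")

# The colourbar shows which colour corresponds to which value of c
plt.colorbar(points, label="value passed to c")

# Call plt.show() to display the figure
```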

The alpha parameter

alpha sets the transparency of the markers, from 0 (fully transparent) to 1 (fully opaque). We also set cmap to "Greens" here so that the effect of alpha is easier to see.

# Using alpha parameter
import matplotlib.pyplot as plt
import numpy as np

ratings_value = np.asarray([2, 4, 5, 6, 8, 5, 2, 8, 5, 
                            3, 2, 8, 6, 5, 4, 7, 8, 9, 7, 1])
number_of_ratings = np.asarray([10, 24, 17, 45, 23, 32, 67, 
                                34, 54, 54, 32, 67, 35, 23, 14, 16, 28, 32, 29, 28])


sizes = np.asarray([100, 240, 170, 450, 230, 320, 670, 340, 540, 
                                540, 320, 670, 350, 230, 140, 160, 280, 320, 290, 280])

colors = np.asarray([1, 2, 5, 4, 6, 8, 6, 3, 5, 
                                4, 3, 6, 9, 2, 1, 6, 8, 8, 4, 5])

plt.title("Ratings Trend Visualization")
plt.xlabel("Number of ratings")
plt.ylabel("Ratings value")

plt.scatter(x = number_of_ratings, y = ratings_value, s = sizes, c = colors, cmap = "Greens",
           alpha = 0.75)
plt.show()
Scatter Plot With Alpha Parameter

The linewidths and edgecolors parameters

linewidths sets the width of the marker edges, and edgecolors sets the colour of those edges, given either as a single colour or as a sequence of colours with one entry per marker.

# Using linewidths: (Default: 1.5)
# Using edgecolors

import matplotlib.pyplot as plt
import numpy as np

ratings_value = np.asarray([2, 4, 5, 6, 8, 5, 2, 8, 5, 
                            3, 2, 8, 6, 5, 4, 7, 8, 9, 7, 1])
number_of_ratings = np.asarray([10, 24, 17, 45, 23, 32, 67, 
                                34, 54, 54, 32, 67, 35, 23, 14, 16, 28, 32, 29, 28])


sizes = np.asarray([100, 240, 170, 450, 230, 320, 670, 340, 540, 
                                540, 320, 670, 350, 230, 140, 160, 280, 320, 290, 280])

colors = np.asarray([1, 2, 5, 4, 6, 8, 6, 3, 5, 
                                4, 3, 6, 9, 2, 1, 6, 8, 8, 4, 5])

plt.title("Ratings Trend Visualization")
plt.xlabel("Number of ratings")
plt.ylabel("Ratings value")

plt.scatter(x = number_of_ratings, y = ratings_value, s = sizes, c = colors, cmap = "Greens",
           alpha = 0.75, linewidths = 1, edgecolors = "Black")
plt.show()
Scatter Plot With Edgecolors And Linewidths

Conclusion

In this article, we went through one of the most commonly used methods for data visualization in Python. Across several plots, we saw how the size, colour, marker, colourmap, transparency, and edge parameters change the way the data is presented, and how they can be combined to give a clearer overview of it. Scatter plots are widely used in the Python community, and matplotlib makes them easy and intuitive to create.
