How to Plot Multiple Datasets on a Scatterplot?


Plotting multiple datasets on one scatterplot may sound like a hard and nerve-wracking task, but if you know even a little about plotting with matplotlib, it works just like your usual plotting.

Matplotlib is one of the most widely used data visualization libraries for Python and supports many plot types for visualizing arrays. It is mainly known for its 2D plots, but it also supports 3D plots. Visualizing data helps us interpret it, analyze it, and spot trends, which in turn helps with storytelling because you have a firm grip on your data. This is especially useful for data scientists.

Read this post on data visualization using Matplotlib.

Machine learning also relies on visualization to understand the data, which later helps with feature engineering and model selection. Visualization can also be used to convey the performance of a model, such as its accuracy, prediction rate, and so on. So if you have a good visualization library at hand, your work gets easier.

Coming to the topic at hand, we need to figure out how to plot multiple datasets on the same scatterplot. We are going to see how to plot two datasets, three datasets, and even four datasets on the same scatterplot.

Plotting Two Datasets on a Scatterplot

We are going to take two simple datasets and plot them on a scatterplot. The code is simple and plots the two datasets with different colors to distinguish between them.

import matplotlib.pyplot as plt
x1 = [1, 2, 3, 4, 5]
y1 = [2,5,8,11,10]
x2 = [1, 2, 3, 4, 5]
y2 = [18,12,3,5,10]
fig, ax = plt.subplots()
ax.scatter(x1, y1, label='Dataset 1')
ax.scatter(x2, y2, label='Dataset 2', marker='s', color='r')
ax.legend()
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Multiple Datasets Scatter Plot')
plt.show()

In the first line, we import the pyplot module of the matplotlib library as plt.

We create two datasets with the data points x1, y1 and x2, y2 respectively. In the next line, plt.subplots() creates two objects: the figure object, which is the container for the whole plot, and the axes object, which is the area where the data is drawn.

We call the scatter method of the axes object to create a scatter plot for the first dataset. We do the same for the second dataset, but also specify the marker shape (a square) and the color (red).

A legend is added to show which markers belong to which dataset. We label the axes with set_xlabel and set_ylabel and set the title with set_title. The show function displays the plot.
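To see why both datasets end up on the same plot, it can help to inspect the axes object after the two scatter calls. The snippet below is a minimal sketch that reuses the data from the example above; the variable names first, second, handles, and labels are only illustrative. It should report two collections on the axes and the two labels that the legend will show.

```python
import matplotlib.pyplot as plt

x1, y1 = [1, 2, 3, 4, 5], [2, 5, 8, 11, 10]
x2, y2 = [1, 2, 3, 4, 5], [18, 12, 3, 5, 10]

fig, ax = plt.subplots()
first = ax.scatter(x1, y1, label='Dataset 1')
second = ax.scatter(x2, y2, label='Dataset 2', marker='s', color='r')

# Each scatter call adds one collection of points to the same axes object.
print(len(ax.collections))

# These are the labels that ax.legend() would display.
handles, labels = ax.get_legend_handles_labels()
print(labels)
```

Because every call goes to the same ax, adding another dataset is just one more scatter call before the legend is drawn.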

Plot Multiple Datasets On A Scatterplot-Two

Plotting Three Datasets on a Scatterplot

We have seen a simple method in the above example. Let us look at another way to define datasets: slicing them out of a larger range of values.

import matplotlib.pyplot as plt
x = range(100)
y = range(100, 200)
fig, ax = plt.subplots()
ax.scatter(x[:4], y[:4], s=10, c='b', marker="s", label='first')
ax.scatter(x[20:30], y[20:30], s=10, c='g', marker="^", label='second')
ax.scatter(x[40:], y[40:], s=10, c='r', marker="o", label='third')
ax.legend(loc='upper left')
plt.show()

As usual, we import the matplotlib library. We define two sequences, x and y, using range. x contains the integers from 0 to 99, while y contains the integers from 100 to 199.

We are initializing the figure and axis objects in the fourth line.

For the first dataset, we take the first four data points from x and y. The marker size is set to 10, the color to blue, the marker shape to a square, and the label to first.

For the second one, we take the values at indices 20 through 29 from both x and y (the end index 30 is excluded). The marker size is again 10, the color is green, the marker shape is ^ (a triangle), and the label is second.

Lastly, we take the values from index 40 to the end of x and y for the third dataset. The color is set to red, the marker shape is o (a circle), and the label is third.

The location of the legend is specified to be in the upper left. The show method is used to display the figure.
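If the slicing is unclear, the following sketch prints how many points each slice selects and which values it covers. It uses only the standard library; the slices dictionary is a hypothetical helper for this illustration and is not part of the original example.

```python
x = range(100)
y = range(100, 200)

# The three slices used in the example: [:4], [20:30] and [40:]
slices = {
    'first': slice(None, 4),
    'second': slice(20, 30),
    'third': slice(40, None),
}

for label, s in slices.items():
    xs, ys = list(x[s]), list(y[s])
    # Report the size of each dataset and the range of values it covers
    print(label, len(xs), 'points; x from', xs[0], 'to', xs[-1],
          '; y from', ys[0], 'to', ys[-1])
```

Note that the end index of a slice is excluded, so x[20:30] contains ten values, ending at 29.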

Plotting Three Datasets On A Scatter Plot

Plotting Four Datasets on a Scatterplot

In the last part, we are going to plot four different datasets on the plot and also see the different ways to generate random data points with the help of the numpy library.

import matplotlib.pyplot as plt
import numpy as np
x1 = np.random.rand(50)  
y1 = np.random.rand(50)
x2 = np.random.randn(50)  
y2 = np.random.randn(50)
x3 = np.random.uniform(-5, 5, 50) 
y3 = np.random.uniform(-2, 2, 50)
x4 = np.random.randint(1, 20, 50)  
y4 = np.random.randint(1, 10, 50)
fig, ax = plt.subplots()
ax.scatter(x1, y1, label='Dataset 1',marker='d')
ax.scatter(x2, y2, label='Dataset 2',marker='^')
ax.scatter(x3, y3, label='Dataset 3',marker='s')
ax.scatter(x4, y4, label='Dataset 4')
ax.legend()
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Four Datasets Scatter Plot')
plt.show()

In the first two lines, we import the matplotlib and numpy libraries.

x1 = np.random.rand(50): The rand function of numpy's random module generates 50 random values drawn uniformly from the interval [0, 1), so 0 is possible but 1 is not. These values are stored in x1. The same happens with y1.

x2 = np.random.randn(50): Generates 50 random values drawn from the standard normal distribution (mean 0, standard deviation 1), so the values can be negative or larger than 1. The same method is used for y2.

x3 = np.random.uniform(-5, 5, 50): This statement generates 50 values drawn uniformly from the interval [-5, 5). These values are stored in the variable called x3.

y3 = np.random.uniform(-2, 2, 50): Similar to the previous statement, this one generates 50 values drawn uniformly from the interval [-2, 2).

x4 = np.random.randint(1, 20, 50): This line generates 50 random integers from 1 to 19; the upper bound 20 is excluded. Similarly, the next line generates 50 random integers from 1 to 9.

The four datasets are then plotted on the same axes, each with its own label. The first three get an explicit marker shape, while the fourth uses the default marker.
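To check the value ranges described above, you can print the minimum and maximum of each sample. This is a small sketch rather than part of the original example: the fixed seed and the samples dictionary are assumptions added only to make the run repeatable and the output easy to read.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed, only so repeated runs match

samples = {
    'rand(50)': np.random.rand(50),                      # uniform on [0, 1)
    'randn(50)': np.random.randn(50),                    # standard normal
    'uniform(-5, 5, 50)': np.random.uniform(-5, 5, 50),  # uniform on [-5, 5)
    'randint(1, 20, 50)': np.random.randint(1, 20, 50),  # integers 1 to 19
}

for name, values in samples.items():
    low, high = float(values.min()), float(values.max())
    print(f'{name}: min={low:.2f}, max={high:.2f}')
```

The randn sample is the only one without fixed bounds, which is why its points spread beyond the 0 to 1 square occupied by the rand sample in the plot.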

Plotting Four Datasets On A Scatter Plot

Conclusion

We have reached the end of this tutorial. To recapitulate, we have learned how matplotlib can play a key role in visualizing your data and how important that visualization is for any field.

The various plots of the matplotlib library, such as bar, histogram, line, scatter, and pie, give you different ways of visualizing your data, and 3D plots are available as well.

We have learned how to plot multiple datasets on a single scatter plot. We have seen plotting two, three, and four datasets on a scatter plot. These datasets can be distinguished from each other by the shape of their markers and by their colors, both of which the library lets you customize.

You can extend this tutorial and plot five different datasets on a single scatter plot, and maybe customize the type of plot too. Happy Coding!
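One possible way to go to five datasets, or any number, is to loop over them instead of writing one scatter call per dataset. The sketch below is only an illustration: plot_datasets is a hypothetical helper, and the five datasets are made-up random points shifted apart so the groups are easy to tell from each other.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_datasets(datasets, markers=('o', 's', '^', 'd', 'x')):
    """Hypothetical helper: scatter every (x, y) pair on one shared axes."""
    fig, ax = plt.subplots()
    for i, (x, y) in enumerate(datasets):
        # Cycle through the marker shapes so each dataset looks different
        ax.scatter(x, y, label=f'Dataset {i + 1}',
                   marker=markers[i % len(markers)])
    ax.legend()
    ax.set_xlabel('X-axis')
    ax.set_ylabel('Y-axis')
    ax.set_title('Five Datasets Scatter Plot')
    return fig, ax

# Five made-up datasets, each shifted by its index
datasets = [(np.random.rand(20) + i, np.random.rand(20) + i) for i in range(5)]
fig, ax = plot_datasets(datasets)
# plt.show()  # uncomment to display the figure
```

Since no color is passed, matplotlib picks a different default color for each scatter call, so the datasets differ in both color and marker shape.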

References

You can find the Matplotlib documentation here.

Refer to the Stack Overflow answer chain on the same topic here.