There are many times when you have no idea which product in a particular category is better than the rest. Well, worry no more! That is exactly the problem recommendation systems are designed to solve, and Python makes it straightforward to build one.
In this tutorial, we will build a product recommendation system in Python. Let us start by understanding the dataset that we will be using.
For this tutorial, we will use the Amazon Beauty Products Ratings dataset, which contains over 2 million customer ratings of beauty-related products sold on Amazon.
The dataset contains the following information for each rating: UserId, which uniquely identifies a customer; ProductId, which uniquely identifies a product; Rating, which ranges from 1 to 5; and Timestamp, which records when the rating was given.
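To make this layout concrete, here is a minimal sketch that builds a tiny DataFrame with the same four columns the code below relies on (UserId, ProductId, Rating, Timestamp). The rows are hypothetical placeholders, not taken from the dataset, and the date conversion assumes that Timestamp stores Unix time in seconds.

```python
import pandas as pd

# Hypothetical toy rows that mirror the dataset's columns (not real data)
toy = pd.DataFrame({
    "UserId": ["U1", "U2", "U1", "U3"],
    "ProductId": ["P1", "P1", "P2", "P3"],
    "Rating": [5.0, 4.0, 2.0, 5.0],
    "Timestamp": [1369699200, 1355443200, 1404691200, 1382572800],
})

# Sanity checks: the expected columns are present and ratings lie within 1-5
assert list(toy.columns) == ["UserId", "ProductId", "Rating", "Timestamp"]
assert toy["Rating"].between(1, 5).all()

# Assumption: Timestamp is Unix time in seconds, so convert it to a datetime
toy["RatedAt"] = pd.to_datetime(toy["Timestamp"], unit="s")
print(toy[["UserId", "ProductId", "Rating", "RatedAt"]])
```

The same two checks can be run on the real `df` after loading it, which quickly catches a file with different column names.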
We will import all the necessary libraries and load the dataset into the program. Make sure your .csv file is in the same directory as your code file to avoid file-not-found errors. Look at the code below.
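```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the ratings (the CSV must be in the same directory as this script)
df = pd.read_csv("ratings_Beauty.csv")
# df.shape is (rows, columns); the row count is the number of ratings
print("Number of ratings in the dataset : ", df.shape[0])
```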
The output shows that the dataset contains 2023070 beauty product ratings in total. Next, we will plot a horizontal bar graph showing how many times each rating value occurs.
This will help us understand how users' ratings are distributed among the five rating values, i.e., 1, 2, 3, 4, and 5. Look at the code snippet below.
```python
# Count how many times each rating value appears, ordered from 1 to 5 stars
count_ratings = df['Rating'].value_counts().sort_index()
print("Number of Unique Ratings available : ", len(count_ratings))
print("Count of each rating is : ", count_ratings.tolist())

# The 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6
plt.style.use('seaborn-v0_8')
labels = [f"{int(r)} star" for r in count_ratings.index]
plt.figure(figsize=(15, 8), facecolor="w")
bars = plt.barh(labels, count_ratings.values,
                color=["yellow", "cyan", "pink", "skyblue", "lightgreen"],
                edgecolor="black")
# Write the count next to each bar
for bar in bars.patches:
    plt.text(bar.get_width() + 0.6, bar.get_y() + 0.3, str(int(bar.get_width())),
             fontsize=15, fontweight='bold', color='grey')
plt.title("Horizontal Bar Graph - Ratings vs Count", fontsize=15)
plt.show()
```
After the code runs, the program displays the plot shown below. It shows how the ratings are spread across the five star values, which helps us understand what users think about beauty products on Amazon.
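Raw counts in the millions can be hard to compare at a glance. A small sketch, assuming the same `Rating` column as `df`, expresses each star value as a percentage of all ratings with `value_counts(normalize=True)`. It runs on a hypothetical toy Series so it is self-contained, and `reindex` makes sure a star value with no ratings still appears as 0.

```python
import pandas as pd

# Hypothetical toy ratings standing in for df['Rating'] (not real data)
ratings = pd.Series([5.0, 5.0, 4.0, 3.0, 5.0, 1.0, 4.0, 5.0])

# Fraction of ratings per star value, with missing values filled as 0
shares = (ratings.value_counts(normalize=True)
          .reindex([1.0, 2.0, 3.0, 4.0, 5.0], fill_value=0.0))

# Convert fractions to percentages rounded to one decimal place
percentages = (shares * 100).round(1)
print(percentages)
```

On the real data, replacing `ratings` with `df['Rating']` gives the share of each star value directly.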
Next, we will approach recommendations in two ways. The first is to recommend the products that have received the most 4- or 5-star ratings.
The second is to identify the users who give the most 4- or 5-star ratings. These are the most active users, so their ratings are likely to be the most helpful as a basis for recommendations.
Let's keep only the rows where the rating is 4 or 5 stars using the code below.
```python
# Keep only the 4-star and 5-star ratings
df_4 = df[df['Rating'] == 4.0]
df_5 = df[df['Rating'] == 5.0]
df_45 = pd.concat([df_4, df_5])
```
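The same filter can also be written in a single step with `Series.isin`. The sketch below, on hypothetical toy data with the same `Rating` column, shows that both versions select the same rows; the only difference is ordering, since `isin` keeps the original row order while `concat` places all 4-star rows before the 5-star rows.

```python
import pandas as pd

# Hypothetical toy data with the same Rating column as df (not real data)
df_toy = pd.DataFrame({
    "ProductId": ["P1", "P2", "P3", "P4", "P5"],
    "Rating": [5.0, 3.0, 4.0, 1.0, 5.0],
})

# Two-step approach: filter each rating value, then concatenate
df_45_concat = pd.concat([df_toy[df_toy["Rating"] == 4.0],
                          df_toy[df_toy["Rating"] == 5.0]])

# One-step alternative: keep rows whose rating is 4 or 5
df_45_isin = df_toy[df_toy["Rating"].isin([4.0, 5.0])]

# Both select the same rows; only the order differs
assert sorted(df_45_concat.index) == sorted(df_45_isin.index)
print(df_45_isin)
```

Because the next steps only count rows per product or user, the row order does not affect the results.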
Next, we will find the products with the most 4- and 5-star ratings and display them in a bar graph to see which products are the most recommended according to the ratings.
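```python
# Count the 4- and 5-star ratings received by each product
popular_products = pd.DataFrame(df_45.groupby('ProductId')['Rating'].count())
# Keep the 10 products with the most high ratings
most_popular = popular_products.sort_values('Rating', ascending=False)[:10]
# Draw on an explicit Axes so the figure size and color are actually used
fig, ax = plt.subplots(figsize=(15, 8), facecolor="w")
most_popular.plot(kind="barh", ax=ax)
ax.set_title("Products vs Count of 4- and 5-Star Ratings", fontsize=15)
plt.show()
```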
Running the code produces the plot shown below, which lists the 10 most popular products. The product with ProductId B001MA0QY2 has received the most 4- and 5-star ratings, making it the most recommended product.
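The product ranking above and the user ranking in the next step follow the same pattern, so the counting can be wrapped in one function. The sketch below uses a hypothetical helper, `top_by_high_ratings`, which is not part of the original code. It relies on `value_counts`, which counts and sorts in descending order in one step and matches the groupby, count, and sort_values chain (ties may be ordered differently). Filtering with `>= 4.0` is equivalent to keeping 4- and 5-star rows because ratings only take the values 1 to 5.

```python
import pandas as pd

def top_by_high_ratings(ratings: pd.DataFrame, key: str, n: int = 10,
                        min_rating: float = 4.0) -> pd.Series:
    """Hypothetical helper: the n values of `key` with the most high ratings."""
    high = ratings[ratings["Rating"] >= min_rating]
    # value_counts counts each key and sorts the counts in descending order
    return high[key].value_counts().head(n)

# Hypothetical toy data (not from the dataset)
toy = pd.DataFrame({
    "UserId": ["U1", "U2", "U3", "U1", "U1", "U2", "U3"],
    "ProductId": ["P1", "P1", "P1", "P2", "P3", "P2", "P2"],
    "Rating": [5.0, 4.0, 5.0, 4.0, 5.0, 5.0, 1.0],
})

print(top_by_high_ratings(toy, "ProductId", n=2))
print(top_by_high_ratings(toy, "UserId", n=2))
```

On the real data, `top_by_high_ratings(df, "ProductId")` and `top_by_high_ratings(df, "UserId")` would produce the two top-10 rankings used in this tutorial.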
Now let's move on to the second approach: recommending products to new users based on the users who have given the most 4- and 5-star ratings, since they are the most frequent users of beauty products on the website. Look at the code and output below.
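```python
# Count the 4- and 5-star ratings given by each user
popular_users = pd.DataFrame(df_45.groupby('UserId')['Rating'].count())
# Keep the 10 users who gave the most high ratings
most_popular_users = popular_users.sort_values('Rating', ascending=False)[:10]
# Draw on an explicit Axes so the figure size and color are actually used
fig, ax = plt.subplots(figsize=(15, 8), facecolor="w")
most_popular_users.plot(kind="barh", ax=ax)
ax.set_title("UserIds vs Count of 4- and 5-Star Ratings", fontsize=15)
plt.show()
```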
Have a look at the plot displayed by the code above, which shows the 10 users who gave the most 4- and 5-star ratings.
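The plot identifies the most active users but stops short of producing a product list. One possible way to complete the second approach, sketched below under our own assumptions rather than taken from the original tutorial, is to take the top users, count the products they rated 4 or 5 stars, and recommend the most common ones that the target user has not rated yet. The function `recommend_from_top_users` and its parameters are hypothetical, and the toy data is made up.

```python
import pandas as pd

def recommend_from_top_users(ratings: pd.DataFrame, user_id: str,
                             top_users: int = 10, n: int = 5) -> list:
    """Hypothetical sketch: recommend products liked by the most active high raters."""
    high = ratings[ratings["Rating"] >= 4.0]
    # The users who gave the most 4- and 5-star ratings
    active = high["UserId"].value_counts().head(top_users).index
    # Products those users rated highly, most frequent first
    liked = high[high["UserId"].isin(active)]["ProductId"].value_counts()
    # Skip any product the target user has already rated
    seen = set(ratings.loc[ratings["UserId"] == user_id, "ProductId"])
    return [p for p in liked.index if p not in seen][:n]

# Hypothetical toy data (not from the dataset)
toy = pd.DataFrame({
    "UserId": ["U1", "U1", "U1", "U2", "U2", "U3", "U4"],
    "ProductId": ["P1", "P2", "P3", "P1", "P2", "P1", "P4"],
    "Rating": [5.0, 5.0, 4.0, 5.0, 4.0, 4.0, 2.0],
})

print(recommend_from_top_users(toy, "U3", top_users=3))
```

For a brand-new user with no ratings, `seen` is empty and the function simply returns the products most liked by the top users.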
In this tutorial, we built a simple product recommendation system in Python based on the popularity of products and the activity of users.
Recommendation systems help us understand what current users like and what interests them most, so that new users can decide which products to try.
Thank you for reading!