Hey there! Today, we will create a fake news detector in Python using some common machine learning algorithms.
1. Importing Modules
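As with any other project, the first step is importing the modules we need. We're working with NumPy, Pandas, and itertools, along with a few tools from scikit-learn: train_test_split, TfidfVectorizer, PassiveAggressiveClassifier, and the accuracy_score and confusion_matrix metrics. The code is shown below.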
import numpy as np
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
2. Loading data
Now, let's read the data for the fake news detection from the CSV file, which can be found here, and print the first five rows of the data.
Make sure the CSV file is kept inside the same folder as the Python code. Next, let's extract the labels from the data we just loaded and print the first five labels.
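The loading step can be sketched as follows. This is only an illustration under stated assumptions: the file name 'news.csv', the 'label' column name, and the FAKE/REAL label values are guesses, and a tiny in-memory CSV stands in for the real file so the snippet runs on its own. The names data and lb match the variables used in the split later on.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file. The real dataset is assumed
# to contain at least a 'text' column and a 'label' column.
sample_csv = io.StringIO(
    "title,text,label\n"
    "Headline A,Some article body,FAKE\n"
    "Headline B,Another article body,REAL\n"
)

# With the real file this would look like pd.read_csv('news.csv'),
# where 'news.csv' is an assumed file name.
data = pd.read_csv(sample_csv)
print(data.head())  # first five rows (only two exist in this toy sample)

# Extract the labels; 'label' is an assumed column name.
lb = data['label']
print(lb.head())
```

If your copy of the dataset uses different column names, adjust 'text' and 'label' accordingly.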
3. Creating training and testing data
Before we pass the data to the classifier, we need to split it into training and testing sets, which is done in the code below.
x_train, x_test, y_train, y_test = train_test_split(data['text'], lb, test_size=0.2, random_state=7)
To split the data, we use the 80-20 rule: 80% of the data goes to training and the remaining 20% goes to testing, which is what test_size=0.2 specifies. Setting random_state=7 makes the split reproducible from run to run.
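To see the 80-20 split in action, here is a small sketch on made-up data. The demo_ names and the ten placeholder articles are invented for illustration and are not part of the real dataset.

```python
from sklearn.model_selection import train_test_split

# Ten made-up articles with alternating made-up labels.
demo_texts = [f"article {i}" for i in range(10)]
demo_labels = ["FAKE" if i % 2 == 0 else "REAL" for i in range(10)]

# Same arguments as in the tutorial: 20% held out, fixed seed.
demo_x_train, demo_x_test, demo_y_train, demo_y_test = train_test_split(
    demo_texts, demo_labels, test_size=0.2, random_state=7
)

# 8 samples end up in training and 2 in testing.
print(len(demo_x_train), len(demo_x_test))
```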
4. Implementing TfidfVectorizer and PassiveAggressiveClassifier
TfidfVectorizer converts an array of text documents into a TF-IDF matrix.
- TF (Term Frequency): The number of times a word appears in a document.
- IDF (Inverse Document Frequency): A measure of how informative a term is across the entire dataset. A term that appears in many documents receives a lower weight than a term that appears in only a few.
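The effect of the two terms can be checked on a toy corpus. The three sentences below are made up for illustration, and the vectorizer is used with its default settings rather than the ones from this tutorial.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up documents; "the" occurs in all of them, "aliens" in only one.
docs = [
    "the president signed the bill",
    "the senate rejected the bill",
    "aliens signed the treaty",
]

vec = TfidfVectorizer()
matrix = vec.fit_transform(docs)

# One row per document, one column per distinct term.
print(matrix.shape)

# Map each term to its learned IDF weight.
idf = {term: vec.idf_[index] for term, index in vec.vocabulary_.items()}

# A term found in every document gets a lower IDF than a rare one.
print(idf["the"], idf["aliens"])
```

Because "the" shows up in every document, its IDF weight is the smallest in the vocabulary, which is also why the tutorial's code removes English stop words and very frequent terms.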
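Next, we create a PassiveAggressiveClassifier and fit it on the training data. This is an online learning algorithm: it stays passive when an example is classified correctly with a sufficient margin, and when it makes a mistake it adjusts the weight vector just enough to correct the loss on that example.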
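To make the weight update more concrete, here is a minimal NumPy sketch of one basic passive-aggressive step for labels in {-1, +1}. It illustrates the general idea only and is not scikit-learn's exact implementation, which also applies a regularization parameter. The function name pa_update and the example vector are made up for this sketch.

```python
import numpy as np

def pa_update(w, x, y):
    """One basic passive-aggressive step for a label y in {-1, +1}."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))  # hinge loss on this example
    if loss == 0.0:
        return w  # passive: margin already satisfied, leave weights alone
    tau = loss / np.dot(x, x)  # smallest step that fixes the margin
    return w + tau * y * x  # aggressive: move just far enough

w = np.zeros(3)
x = np.array([1.0, 2.0, 0.0])

# Starting from zero weights, the example violates the margin, so w changes.
w = pa_update(w, x, +1)
print(w, np.dot(w, x))

# Seeing the same example again changes nothing: the margin is now met.
w_again = pa_update(w, x, +1)
print(np.allclose(w, w_again))
```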
Lastly, we make predictions on the testing data and calculate the accuracy of the model on it. It turns out that we get an accuracy of over 90% on the testing data.
The code is shown below.
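# Build TF-IDF features, dropping English stop words and terms that appear in more than 70% of documents
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)
# Train the classifier on the training features
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train, y_train)
# Predict on the testing data and report the accuracy
y_pred = pac.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print("Accuracy: ", round(score * 100, 2), "%")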
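The imports also include confusion_matrix, which the code above does not use. As a hedged illustration of what it adds beyond a single accuracy number, the sketch below runs it on hand-written labels. The FAKE/REAL label values and the demo_ lists are assumptions made up for this example, not results from the real model.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written true labels and predictions, for illustration only.
demo_true = ['FAKE', 'FAKE', 'REAL', 'REAL', 'REAL', 'FAKE']
demo_pred = ['FAKE', 'REAL', 'REAL', 'REAL', 'FAKE', 'FAKE']

# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(demo_true, demo_pred, labels=['FAKE', 'REAL'])
print(cm)

# Accuracy is the share of predictions on the diagonal of the matrix.
demo_score = accuracy_score(demo_true, demo_pred)
print(round(demo_score * 100, 2), "%")
```

With the real model, the same call would be confusion_matrix(y_test, y_pred, labels=[...]) using whatever label values the dataset actually contains.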
Today, we learned to detect fake news with Python using a dataset of news articles. The detection was done with the help of a TfidfVectorizer and a PassiveAggressiveClassifier, and as a result we got an accuracy of over 90% on the testing data, which is amazing!
I hope you liked the fake news detector! Keep reading to learn more!