Welcome to this tutorial on sentiment analysis using Python. Human sentiment shows up in facial expressions, in speech, and in written text such as comments and reviews. Let's look at how the sentiment of written text can be predicted using Python.
Introduction to Sentiment Analysis using Python
With the rise of Machine Learning, a range of techniques has been applied to data to make predictions that resemble human judgment.
Applying these techniques to language takes us into Deep Learning and Natural Language Processing.
Sentiment Analysis is a Natural Language Processing Technique.
What is Natural Language Processing?
Natural Language Processing (NLP) is a subset of Artificial Intelligence where the machine is trained to analyze textual data. Sentiment Analysis is an NLP technique to predict the sentiment of the writer. By sentiment, we generally mean – positive, negative, or neutral.
NLP is a vast domain, and sentiment detection can be done with existing Python libraries such as NLTK (Natural Language Toolkit), among others.
Cleaning the Text for Parsing and Processing
Textual data in its raw form is usually too noisy for NLP libraries to analyze reliably. It first needs to be cleaned using data processing techniques such as:
- Eliminate HTML Tags: Unstructured text contains a lot of noise and hence we need to remove the HTML tags if any.
- Eliminate Accented characters: When working with English text, accented characters such as é are usually normalized to their plain equivalents (e) so that variants of the same word are treated alike.
- Expand Contractions: Contractions such as don't and it's are common in informal English, and hence it is necessary to expand them to their original form (do not, it is).
- Eliminate Special Characters: Any non-alphanumeric characters in the text need to be removed.
- Lemmatization/Stemming: It is necessary to arrive at the base form of the words, e.g. the base form of swimming is swim.
- Remove Stop Words: The stop words such as articles, conjunctions, and prepositions need to be removed.
After all of the above steps, our text (often referred to as a corpus in NLP terminology) is passed to our sentiment analysis model.
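The cleaning steps listed above can be sketched with the Python standard library alone. This is a minimal sketch, not the exact pipeline used for the examples in this article: the `clean_text` name, the `CONTRACTIONS` table, and the `STOP_WORDS` set are hypothetical and deliberately tiny, and lemmatization is left out because it needs a library such as NLTK or spaCy.

```python
import html
import re
import unicodedata

# Hypothetical, deliberately tiny lookup tables for illustration only.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "don't": "do not", "it's": "it is"}
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "with", "of", "in", "on", "for", "to", "is", "was"}


def clean_text(raw):
    """Apply the cleaning steps in order (lemmatization is not included)."""
    # 1. Eliminate HTML tags and decode entities such as &amp;
    text = html.unescape(re.sub(r"<[^>]+>", " ", raw))
    # 2. Normalize curly apostrophes, then strip accents (é -> e)
    text = text.replace("\u2019", "'")
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # 3. Expand contractions found in the small table above
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # 4. Eliminate special characters (anything that is not a letter, digit, or space)
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # 5. Remove stop words
    tokens = [token for token in text.split() if token not in STOP_WORDS]
    return " ".join(tokens)


print(clean_text("<p>Terrible airport with arrogant staff and poor signage.</p>"))
```

A real project would swap the two lookup tables for a fuller contraction map and a library-provided stop word list, and add a lemmatizer as a final step.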
Here are some example sentences, shown before and after the above process.
Before: Terrible airport with arrogant staff and poor signage. After: terrible airport arrogant staff poor signage
Before: The airport is huge and has almost all the facilities making the transit smooth. After: airport huge almost facility make transit smooth
Before: The display told me that desks 59-62 were for Silkair, but in reality it was from 52-55. After: display tell desk 59 62 silkair reality 52 55
We will use the pre-processed sentences above in our sentiment analysis model below.
Performing Sentiment Analysis using Python
We will first write the function in Python and then pass in examples to check the results. We will use the TextBlob library to perform the sentiment analysis.
In the function defined below, the text corpus is passed in, and a TextBlob object is created and stored in the analysis variable.
When the text is passed through TextBlob(), the resulting object has properties such as sentiment, which contains the polarity. This polarity value is then checked.
If the polarity is greater than 0, the sentiment is positive; if it is equal to 0, it is neutral; and if it is less than 0, the sentiment is negative.
from textblob import TextBlob

def get_tweet_sentiment(text):
    # Build a TextBlob object from the cleaned text
    analysis = TextBlob(text)
    # Map the polarity score to a sentiment label
    if analysis.sentiment.polarity > 0:
        return 'positive'
    elif analysis.sentiment.polarity == 0:
        return 'neutral'
    else:
        return 'negative'
The output of our example statements would be as follows:
Input corpus: terrible airport arrogant staff poor signage Sentiment: negative
Input corpus: display tell desk 59 62 silkair reality 52 55 Sentiment: neutral
Input corpus: airport huge almost facility make transit smooth Sentiment: positive
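TextBlob reports polarity as a float between -1.0 and 1.0, so a sentence is labeled neutral only when its score is exactly 0. One possible variation, sketched below, separates the labeling step into its own function and adds an optional neutral band around zero. The `label_from_polarity` name and the `neutral_band` parameter are assumptions made for this illustration; they are not part of TextBlob.

```python
def label_from_polarity(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    neutral_band is a hypothetical tuning knob, not a TextBlob parameter:
    scores whose absolute value is at most neutral_band count as neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# With TextBlob installed, the score would come from:
#   polarity = TextBlob(text).sentiment.polarity
print(label_from_polarity(0.5))        # positive
print(label_from_polarity(0.05, 0.1))  # neutral: inside the band
print(label_from_polarity(-0.4, 0.1))  # negative
```

With the default `neutral_band=0.0` the function behaves like the checks in the tutorial; a small band such as 0.1 keeps weakly scored sentences from being labeled positive or negative.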
from textblob import TextBlob

def get_tweet_sentiment(text):
    # Build a TextBlob object from the cleaned text
    analysis = TextBlob(text)
    # Map the polarity score to a sentiment label
    if analysis.sentiment.polarity > 0:
        return 'positive'
    elif analysis.sentiment.polarity == 0:
        return 'neutral'
    else:
        return 'negative'

print(get_tweet_sentiment("<your text>"))
Drawbacks of our model
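Our sentiment analysis model cannot reliably predict the sentiment of sarcastic comments. TextBlob's default analyzer scores text from the polarity of individual words, so it misses the reversal of meaning that sarcasm relies on. Detecting sarcasm in tweets remains a difficult problem in NLP.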
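To see why sarcasm is hard for word-level scoring, here is a toy scorer that averages the scores of known words. The `TOY_LEXICON` values and the `toy_polarity` function are made up for this illustration and are not TextBlob's lexicon or algorithm; they only mimic the general idea of scoring text word by word.

```python
# Hypothetical toy lexicon, for illustration only; not TextBlob's data.
TOY_LEXICON = {
    "great": 0.8,
    "love": 0.5,
    "smooth": 0.4,
    "poor": -0.4,
    "arrogant": -0.5,
    "terrible": -1.0,
}


def toy_polarity(text):
    """Average the scores of the words that appear in the toy lexicon."""
    scores = [TOY_LEXICON[word] for word in text.lower().split() if word in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


# A sarcastic complaint built entirely from "positive" words
sarcastic = "great another two hour delay i just love this airport"
print(toy_polarity(sarcastic) > 0)  # True: scored as positive
```

A human reader hears a complaint, but the scorer only sees "great" and "love", so the sentence comes out positive.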
I hope this article has given you some insight into the sentiment analysis of text using Natural Language Processing. Do try out your own statements and let us know your feedback in the comments section.