How to Perform Twitter Sentiment Analysis with Python NLTK: A Guide to Natural Language Processing

This article demonstrates how to perform sentiment analysis on tweets using Python and the Natural Language Toolkit (NLTK).

Sentiment Analysis examines the sentiment expressed in a text or document and assigns it a label, typically positive or negative. Additional categories, such as neutral, highly positive, or highly negative, can also be included.

Sentiment Analysis, also known as Opinion Mining, is commonly applied to social media data and customer reviews.

Table of Contents

Supervised Classification
Tokenize Tweets
Cleaning Tweet
Feature Extraction
Create Train and Test Set
Training Classifier and Calculating Accuracy
Testing Classifier with Custom Tweet
Precision, Recall & F1-Score
Confusion Matrix

Supervised Classification

In this article, we will focus on supervised text classification, where the classifier is trained using labeled data.

We will use the twitter_samples corpus from NLTK as our labeled training dataset. This corpus consists of tweets, including 5,000 labeled as positive and 5,000 labeled as negative.

Our classification task involves two categories: positive and negative. The twitter_samples corpus already categorizes the tweets into these two sentiment classes.

The twitter_samples corpus includes three files:

negative_tweets.json: Contains 5,000 negative tweets.
positive_tweets.json: Contains 5,000 positive tweets.
tweets.20150430-223406.json: Contains 20,000 tweets that are not separated by sentiment (a mix of positive and negative).

from nltk.corpus import twitter_samples
print (twitter_samples.fileids())
'''
Output:

['negative_tweets.json', 'positive_tweets.json', 'tweets.20150430-223406.json']
'''

pos_tweets = twitter_samples.strings('positive_tweets.json')
print (len(pos_tweets)) # Output: 5000

neg_tweets = twitter_samples.strings('negative_tweets.json')
print (len(neg_tweets)) # Output: 5000

all_tweets = twitter_samples.strings('tweets.20150430-223406.json')
print (len(all_tweets)) # Output: 20000

for tweet in pos_tweets[:5]:
    print (tweet)
'''
Output:

#FollowFriday @France_Inte @PKuchly57 @Milipol_Paris for being top engaged members in my community this week :)
@Lamb2ja Hey James! How odd :/ Please call our Contact Centre on 02392441234 and we will be able to assist you :) Many thanks!
@DespiteOfficial we had a listen last night :) As You Bleed is an amazing track. When are you in Scotland?!
@97sides CONGRATS :)
yeaaaah yippppy!!!  my accnt verified rqst has succeed got a blue tick mark on my fb profile :) in 15 days
'''

Tokenize Tweets

NLTK provides a TweetTokenizer class that splits a tweet into a list of tokens: words, punctuation marks, emoticons, and hashtags.

When initializing the TweetTokenizer class, you can specify three parameters:

preserve_case: When set to False, the tokenizer converts the tweet to lowercase. If True, it keeps the original capitalization.
strip_handles: If set to True, the tokenizer removes Twitter handles from the tweet. If False, it keeps the handles in the text.
reduce_len: When set to True, the tokenizer shortens elongated words like "hurrayyyy" and "yipppiieeee." If False, it maintains the original length of the words.
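To make the effect of these three options concrete, here is a rough pure-Python approximation. It is not NLTK's implementation: the function name simple_tweet_tokenize and its regular expressions are assumptions made for illustration, and the real TweetTokenizer covers many more cases (URLs, phone numbers, a much larger emoticon set).

```python
import re

# Hypothetical, simplified patterns; NLTK's real rules are far more extensive.
EMOTICON = re.compile(r"[:;=][-']?[)(\]\[DPp/\\]")
TOKEN = re.compile(EMOTICON.pattern + r"|#?\w+|[^\w\s]")

def simple_tweet_tokenize(text, preserve_case=True, strip_handles=False, reduce_len=False):
    if strip_handles:
        # drop @handles before tokenizing
        text = re.sub(r"@\w{1,15}", " ", text)
    if reduce_len:
        # shorten any run of 3 or more identical characters to exactly 3
        text = re.sub(r"(.)\1{2,}", r"\1\1\1", text)
    tokens = TOKEN.findall(text)
    if not preserve_case:
        # lowercase words but leave emoticons such as :D untouched
        tokens = [t if EMOTICON.fullmatch(t) else t.lower() for t in tokens]
    return tokens

print(simple_tweet_tokenize("@97sides CONGRATS :)",
                            preserve_case=False, strip_handles=True, reduce_len=True))
# ['congrats', ':)']
print(simple_tweet_tokenize("yeaaaah yippppy!!!", reduce_len=True))
# ['yeaaah', 'yipppy', '!', '!', '!']
```

For real tweets, prefer TweetTokenizer itself; the sketch only shows what each flag changes.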

from nltk.tokenize import TweetTokenizer
tweet_tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)

for tweet in pos_tweets[:5]:
    print (tweet_tokenizer.tokenize(tweet))
'''
Output:

['#followfriday', 'for', 'being', 'top', 'engaged', 'members', 'in', 'my', 'community', 'this', 'week', ':)']
['hey', 'james', '!', 'how', 'odd', ':/', 'please', 'call', 'our', 'contact', 'centre', 'on', '02392441234', 'and', 'we', 'will', 'be', 'able', 'to', 'assist', 'you', ':)', 'many', 'thanks', '!']
['we', 'had', 'a', 'listen', 'last', 'night', ':)', 'as', 'you', 'bleed', 'is', 'an', 'amazing', 'track', '.', 'when', 'are', 'you', 'in', 'scotland', '?', '!']
['congrats', ':)']
['yeaaah', 'yipppy', '!', '!', '!', 'my', 'accnt', 'verified', 'rqst', 'has', 'succeed', 'got', 'a', 'blue', 'tick', 'mark', 'on', 'my', 'fb', 'profile', ':)', 'in', '15', 'days']
'''

Cleaning Tweet

During the tweet cleaning process, we will perform the following steps:

Eliminate stock market tickers, such as $GE.
Remove retweet indicators like RT.
Delete hyperlinks.
Remove the hash sign (#) from hashtags, keeping the associated word.
Discard common stop words such as "a," "and," "the," "is," "are," etc.
Remove emoticons like :), :D, :(, :-), etc.
Remove punctuation marks including periods, commas, exclamation points, etc.
Reduce words to their base or root forms using the Porter Stemming Algorithm. For example, words like "working," "works," and "worked" will be simplified to the base word "work."
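The regex-based steps in this list can be tried in isolation, without NLTK. The sketch below uses a hypothetical helper named strip_noise and a made-up sample tweet; its hyperlink pattern is an assumption that removes each URL up to the next whitespace. Handles, stop words, emoticons, punctuation, and stemming are left to the tokenizer-based steps.

```python
import re

def strip_noise(tweet):
    """Apply the regex-based cleaning steps and return the remaining text."""
    tweet = re.sub(r"\$\w*", "", tweet)         # stock tickers such as $GE
    tweet = re.sub(r"^RT\s+", "", tweet)        # leading retweet marker
    tweet = re.sub(r"https?://\S+", "", tweet)  # hyperlinks
    tweet = re.sub(r"#", "", tweet)             # hash sign only; the word stays
    return " ".join(tweet.split())              # collapse leftover whitespace

print(strip_noise("RT @user $GE is up today! #stocks http://example.com/news"))
# @user is up today! stocks
```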

We will implement a function called clean_tweets that returns a list of words from a given tweet after removing the specified elements.

import string
import re
        
from nltk.corpus import stopwords 
stopwords_english = stopwords.words('english')
        
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
        
from nltk.tokenize import TweetTokenizer
        
# Happy Emoticons
emoticons_happy = set([
    ':-)', ':)', ';)', ':o)', ':]', ':3', ':c)', ':>', '=]', '8)', '=)', ':}',
    ':^)', ':-D', ':D', '8-D', '8D', 'x-D', 'xD', 'X-D', 'XD', '=-D', '=D',
    '=-3', '=3', ':-))', ":'-)", ":')", ':*', ':^*', '>:P', ':-P', ':P', 'X-P',
    'x-p', 'xp', 'XP', ':-p', ':p', '=p', ':-b', ':b', '>:)', '>;)', '>:-)',
    '<3'
    ])
        
# Sad Emoticons
emoticons_sad = set([
    ':L', ':-/', '>:/', ':S', '>:[', ':@', ':-(', ':[', ':-||', '=L', ':<',
    ':-[', ':-<', '=\\', '=/', '>:(', ':(', '>.<', ":'-(", ":'(", ':\\', ':-c',
    ':c', ':{', '>:\\', ';('
    ])
        
# all emoticons (happy + sad)
emoticons = emoticons_happy.union(emoticons_sad)
        
def clean_tweets(tweet):
    # remove stock market tickers like $GE
    tweet = re.sub(r'\$\w*', '', tweet)
        
    # remove old style retweet text "RT"
    tweet = re.sub(r'^RT[\s]+', '', tweet)
        
    # remove hyperlinks (each URL up to the next whitespace)
    tweet = re.sub(r'https?://\S+', '', tweet)
            
    # remove hashtags
    # only removing the hash # sign from the word
    tweet = re.sub(r'#', '', tweet)
        
    # tokenize tweets
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
    tweet_tokens = tokenizer.tokenize(tweet)
        
    tweets_clean = []   
    for word in tweet_tokens:
        if (word not in stopwords_english and # remove stopwords
                word not in emoticons and # remove emoticons
                word not in string.punctuation): # remove punctuation
            #tweets_clean.append(word)
            stem_word = stemmer.stem(word) # stemming word
            tweets_clean.append(stem_word)
        
    return tweets_clean
        
custom_tweet = "RT @Twitter @mavenbird Hello There! Have a great day. :) #good #morning http://mavenbird.com.np"
        
# print cleaned tweet
print (clean_tweets(custom_tweet))
'''
Output:
        
['hello', 'great', 'day', 'good', 'morn']
'''
        
print (pos_tweets[5])
'''
Output:
        
@User1 @User2 This one is irresistible :)
#FlipkartFashionFriday http://t.co/EbZ0L2VENM
'''
        
print (clean_tweets(pos_tweets[5]))
'''
Output:
        
['one', 'irresist', 'flipkartfashionfriday']
'''

Feature Extraction

We define a basic bag_of_words function to extract unigram features from the tweets.

# feature extractor function
def bag_of_words(tweet):
    words = clean_tweets(tweet)
    words_dictionary = dict([word, True] for word in words) 
    return words_dictionary

custom_tweet = "RT @Twitter @mavenbird Hello There! Have a great day. :) #good #morning https://www.mavenbird.com/"
print (bag_of_words(custom_tweet))
'''
Output:

{'hello': True, 'great': True, 'day': True, 'good': True, 'morn': True}
'''

# positive tweets feature set
pos_tweets_set = []
for tweet in pos_tweets:
    pos_tweets_set.append((bag_of_words(tweet), 'pos')) 

# negative tweets feature set
neg_tweets_set = []
for tweet in neg_tweets:
    neg_tweets_set.append((bag_of_words(tweet), 'neg'))

print (len(pos_tweets_set), len(neg_tweets_set)) # Output: 5000 5000
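One consequence of this representation is that word order and repetition are discarded: only the presence of a word matters. The sketch below shows this with a hypothetical helper, presence_features, that takes tokens that are already cleaned, so it runs without NLTK.

```python
def presence_features(tokens):
    """Map each distinct token to True, ignoring order and repetition."""
    return {token: True for token in tokens}

a = presence_features(["good", "good", "day"])
b = presence_features(["day", "good"])
print(a == b)  # True: both tweets yield the same feature set
```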

Create Train and Test Set

We have 5,000 positive tweets and 5,000 negative tweets. We will use 20% from each set—1,000 positive tweets and 1,000 negative tweets—as the test set. The remaining tweets from both the positive and negative sets will be utilized as the training set.

# randomize pos_tweets_set and neg_tweets_set
# doing so will produce a different accuracy result every time we run the program
from random import shuffle 
shuffle(pos_tweets_set)
shuffle(neg_tweets_set)
                
test_set = pos_tweets_set[:1000] + neg_tweets_set[:1000]
train_set = pos_tweets_set[1000:] + neg_tweets_set[1000:]
                
print(len(test_set),  len(train_set)) # Output: 2000 8000
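If you want the same split, and therefore the same accuracy, on every run, one option is to shuffle with a seeded random generator. The following is a sketch under assumptions: the helper name split_train_test, the test_fraction parameter, and the seed value 42 are illustrative and not part of the code above.

```python
import random

def split_train_test(pos_items, neg_items, test_fraction=0.2, seed=42):
    """Shuffle each class with a fixed seed, then hold out a fraction for testing."""
    rng = random.Random(seed)          # private generator; global state is untouched
    pos, neg = list(pos_items), list(neg_items)
    rng.shuffle(pos)
    rng.shuffle(neg)
    n_pos = int(len(pos) * test_fraction)
    n_neg = int(len(neg) * test_fraction)
    test = pos[:n_pos] + neg[:n_neg]
    train = pos[n_pos:] + neg[n_neg:]
    return train, test

# toy feature sets standing in for pos_tweets_set and neg_tweets_set
toy_pos = [({"w%d" % i: True}, "pos") for i in range(10)]
toy_neg = [({"w%d" % i: True}, "neg") for i in range(10)]
train, test = split_train_test(toy_pos, toy_neg)
print(len(train), len(test))  # 16 4
```

Calling the function again with the same seed returns the same split, which makes accuracy figures comparable between runs.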

Training Classifier and Calculating Accuracy

We train the Naive Bayes Classifier with the training set and then evaluate its classification accuracy using the test set.

from nltk import classify
from nltk import NaiveBayesClassifier
        
classifier = NaiveBayesClassifier.train(train_set)
        
accuracy = classify.accuracy(classifier, test_set)
print(accuracy) # Output: 0.765
        
# show_most_informative_features prints its own output and returns None
classifier.show_most_informative_features(10)
'''
Output:
        
Most Informative Features
                     via = True              pos : neg    =     37.0 : 1.0
                    glad = True              pos : neg    =     25.0 : 1.0
                     sad = True              neg : pos    =     22.6 : 1.0
                      aw = True              neg : pos    =     21.7 : 1.0
                     bam = True              pos : neg    =     21.0 : 1.0
                     x15 = True              neg : pos    =     19.7 : 1.0
                 appreci = True              pos : neg    =     17.7 : 1.0
                   arriv = True              pos : neg    =     15.0 : 1.0
                     ugh = True              neg : pos    =     14.3 : 1.0
                  justin = True              neg : pos    =     13.0 : 1.0
'''
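To give an intuition for what the classifier learns, here is a minimal Naive Bayes sketch over presence features. It is a simplification under stated assumptions, not NLTK's NaiveBayesClassifier: the names train_nb and classify_nb are hypothetical, add-one smoothing is used, and features absent from a tweet are ignored. NLTK's implementation differs in such details.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_featuresets):
    """Count how often each label and each (label, feature) pair occurs."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in labeled_featuresets:
        label_counts[label] += 1
        for name in features:
            feature_counts[label][name] += 1
            vocab.add(name)
    return label_counts, feature_counts, vocab

def classify_nb(model, features):
    """Return the label with the highest log prior plus log likelihoods."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)                      # log prior
        for name in features:
            if name in vocab:                            # skip unseen words
                p = (feature_counts[label][name] + 1) / (n + 2)  # add-one smoothing
                score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

toy_train = [
    ({"glad": True, "day": True}, "pos"),
    ({"glad": True}, "pos"),
    ({"sad": True, "day": True}, "neg"),
    ({"ugh": True}, "neg"),
]
model = train_nb(toy_train)
print(classify_nb(model, {"glad": True}))                # pos
print(classify_nb(model, {"sad": True, "unseen": True})) # neg
```

A word such as "glad" that appears far more often in one class pushes the score toward that class, which is what the ratios in the most informative features list express.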

Testing Classifier with Custom Tweet

We input a custom tweet and observe the classification results from the trained classifier. The classifier accurately identifies both negative and positive tweets as expected.

custom_tweet = "I hated the film. It was a disaster. Poor direction, bad acting."
custom_tweet_set = bag_of_words(custom_tweet)
print (classifier.classify(custom_tweet_set)) # Output: neg
# Negative tweet correctly classified as negative
        
# probability result
prob_result = classifier.prob_classify(custom_tweet_set)
print (prob_result) # Output: <ProbDist with 2 samples>
print (prob_result.max()) # Output: neg
print (prob_result.prob("neg")) # Output: 0.941844352481
print (prob_result.prob("pos")) # Output: 0.0581556475194 
        
custom_tweet = "It was a wonderful and amazing movie.Best direction, good acting."
custom_tweet_set = bag_of_words(custom_tweet)
        
print (classifier.classify(custom_tweet_set)) # Output: pos
# Positive tweet correctly classified as positive
        
# probability result
prob_result = classifier.prob_classify(custom_tweet_set)
print (prob_result) # Output: <ProbDist with 2 samples>
print (prob_result.max()) # Output: pos
print (prob_result.prob("neg")) # Output: 0.00131055449755
print (prob_result.prob("pos")) # Output: 0.998689445502

Precision, Recall & F1-Score

Accuracy is calculated as the ratio of correctly predicted observations to the total number of observations.

Precision measures how accurate the predictions are:

It indicates how many of the predicted positive results were actually correct.
For instance, if you attempt only 1 question out of 100 and answer it correctly, your precision is 100%, even though you left the other 99 unanswered.
Precision assesses how often the classifier's predictions are correct.

Recall, in contrast to precision, focuses on the classifier's ability to identify all relevant instances:

It measures how well the classifier detects all positive cases.
It evaluates how often the classifier correctly predicts "yes" when the actual result is "yes."

The F1 Score, or F-measure, is the harmonic mean of precision and recall, providing a single metric that balances both aspects.

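These metrics are defined in terms of four counts, illustrated here with a cancer diagnosis example:

True Positive (TP): This refers to the number of patients who actually have cancer and were correctly diagnosed as having cancer.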

True Negative (TN): This represents the number of patients who do not have cancer and were accurately identified as not having cancer.

False Positive (FP): This is the count of patients who do not have cancer but were mistakenly diagnosed as having cancer (also known as Type I error).

False Negative (FN): This indicates the number of patients who have cancer but were incorrectly diagnosed as not having cancer (also known as Type II error).

Accuracy: (TP + TN) / (TP + TN + FP + FN)
Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
F1 Score: 2 * (precision * recall) / (precision + recall)
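As a worked example, the formulas can be applied to the four counts reported in the Confusion Matrix section of this article, treating "pos" as the positive class (TP = 769, TN = 761, FP = 239, FN = 231). The helper name metrics_from_counts is hypothetical.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics_from_counts(tp=769, tn=761, fp=239, fn=231)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
# accuracy=0.765 precision=0.763 recall=0.769 f1=0.766
```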

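from collections import defaultdict

# sets of test-item indices per label, used for precision, recall and F-measure
actual_set = defaultdict(set)
predicted_set = defaultdict(set)

# flat label lists in test-set order, used for the confusion matrix
actual_set_cm = []
predicted_set_cm = []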

for index, (feature, actual_label) in enumerate(test_set):
    actual_set[actual_label].add(index)
    actual_set_cm.append(actual_label)

    predicted_label = classifier.classify(feature)

    predicted_set[predicted_label].add(index)
    predicted_set_cm.append(predicted_label)
    
from nltk.metrics import precision, recall, f_measure, ConfusionMatrix

print('pos precision:', precision(actual_set['pos'], predicted_set['pos'])) # Output: pos precision: 0.762896825397
print('pos recall:', recall(actual_set['pos'], predicted_set['pos'])) # Output: pos recall: 0.769
print('pos F-measure:', f_measure(actual_set['pos'], predicted_set['pos'])) # Output: pos F-measure: 0.76593625498

print('neg precision:', precision(actual_set['neg'], predicted_set['neg'])) # Output: neg precision: 0.767137096774
print('neg recall:', recall(actual_set['neg'], predicted_set['neg'])) # Output: neg recall: 0.761
print('neg F-measure:', f_measure(actual_set['neg'], predicted_set['neg'])) # Output: neg F-measure: 0.7640562249
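The set-based calls above compare two sets of test-item indices: those that truly carry a label and those the classifier assigned to it. The sketch below mirrors that idea in plain Python. The function names are hypothetical, and the code assumes both sets are non-empty and overlap.

```python
def set_precision(reference, test):
    """Share of predicted items that are truly in the reference set."""
    return len(reference & test) / len(test)

def set_recall(reference, test):
    """Share of reference items that were predicted."""
    return len(reference & test) / len(reference)

def set_f_measure(reference, test):
    """Harmonic mean of precision and recall."""
    p = set_precision(reference, test)
    r = set_recall(reference, test)
    return 2 * p * r / (p + r)

actual_pos = {0, 1, 2, 3}      # indices of tweets that are really positive
predicted_pos = {2, 3, 4}      # indices the classifier labeled positive
print(set_precision(actual_pos, predicted_pos))  # 2 of 3 predictions are right
print(set_recall(actual_pos, predicted_pos))     # 0.5
```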

Confusion Matrix

The Confusion Matrix is a table used to describe the performance of a classifier.

The Confusion Matrix is represented in the following format:

'''
           |   Predicted NO      |   Predicted YES     |
-----------+---------------------+---------------------+
Actual NO  | True Negative (TN)  | False Positive (FP) |
Actual YES | False Negative (FN) | True Positive (TP)  |
-----------+---------------------+---------------------+
'''

The output of the confusion matrix below illustrates the performance of our trained classifier.

761 negative tweets were correctly classified as negative (TN).
239 negative tweets were incorrectly classified as positive (FP).
231 positive tweets were incorrectly classified as negative (FN).
769 positive tweets were correctly classified as positive (TP).

# Confusion Matrix for the test set
# 
# Output: 
# row = actual_set_cm 
# column = predicted_set_cm
cm = ConfusionMatrix(actual_set_cm, predicted_set_cm)
print (cm)
'''
Output:

    |   n   p |
    |   e   o |
    |   g   s |
----+---------+
neg |<761>239 |
pos | 231<769>|
----+---------+
(row = reference; col = test)
'''

print (cm.pretty_format(sort_by_count=True, show_percents=True, truncate=9))
'''
Output:

    |      n      p |
    |      e      o |
    |      g      s |
----+---------------+
neg | <38.0%> 11.9% |
pos |  11.6% <38.5%>|
----+---------------+
(row = reference; col = test)
'''
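The counts in a confusion matrix are simply tallies of (actual, predicted) label pairs. The following sketch reproduces that tallying with the standard library on made-up labels; confusion_counts is a hypothetical helper, not part of NLTK.

```python
from collections import Counter

def confusion_counts(actual_labels, predicted_labels):
    """Tally how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual_labels, predicted_labels))

actual = ["neg", "neg", "pos", "pos", "pos"]
predicted = ["neg", "pos", "pos", "pos", "neg"]
counts = confusion_counts(actual, predicted)

# rows = actual label, columns = predicted label (neg, pos)
for a in ("neg", "pos"):
    print(a, [counts[(a, p)] for p in ("neg", "pos")])
# neg [1, 1]
# pos [1, 2]
```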

August 7, 2024 | View: 837 | Categories: Python | By: Harshil Patwa | 16 min read
