This article demonstrates how to conduct Sentiment Analysis on Twitter data using Python and TextBlob.

TextBlob offers an API for various Natural Language Processing (NLP) tasks, including Part-of-Speech Tagging, Noun Phrase Extraction, Sentiment Analysis, Classification (such as Naive Bayes and Decision Tree), Language Translation and Detection, Spelling Correction, and more.

TextBlob is built on top of the Natural Language Toolkit (NLTK).

Sentiment Analysis evaluates the sentiment conveyed in a text or document and assigns it to a class. The two basic classes are positive and negative; finer-grained schemes add classes such as neutral, highly positive, or highly negative.

Installing TextBlob

Run the following commands to install TextBlob and download the corpora it needs:

pip install -U textblob
python -m textblob.download_corpora
    

Simple TextBlob Sentiment Analysis Example

We will explore a basic example using TextBlob to perform Sentiment Analysis on a provided text. The sentiment property returns sentiment scores for the given text, which include two metrics: Polarity and Subjectivity.

The polarity score is a float value between -1.0 and 1.0. A negative value indicates a negative sentiment, while a positive value suggests that the text conveys a positive sentiment.

The subjectivity score is a float ranging from 0.0 to 1.0. A score of 0.0 means the text is very objective, whereas a score of 1.0 indicates that the text is highly subjective.

from textblob import TextBlob

text = TextBlob("It was a wonderful movie. I liked it very much.")
        
print (text.sentiment)
print ('polarity: {}'.format(text.sentiment.polarity))
print ('subjectivity: {}'.format(text.sentiment.subjectivity))
'''
Output:
        
Sentiment(polarity=0.62, subjectivity=0.6866666666666666)
polarity: 0.62
subjectivity: 0.6866666666666666
'''
        
text = TextBlob("I liked the acting of the lead actor but I didn't like the movie overall.")
print (text.sentiment)
'''
Output:
        
Sentiment(polarity=0.19999999999999998, subjectivity=0.26666666666666666)
'''
        
text = TextBlob("I liked the acting of the lead actor and I liked the movie overall.")
print (text.sentiment)
'''
Output:
        
Sentiment(polarity=0.3, subjectivity=0.4)
'''
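
The polarity score can be turned into a class label such as positive, negative, or neutral. The following is a minimal sketch of one way to do that; the helper name `label_from_polarity` and the cut-off value of 0.1 are assumptions chosen for illustration, not part of TextBlob.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    `threshold` is an illustrative cut-off, not a TextBlob setting.
    """
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'

# Polarity values taken from the examples above
for score in (0.62, 0.19999999999999998, 0.3):
    print(score, label_from_polarity(score))
```

In practice the function would be called with `TextBlob(some_text).sentiment.polarity`. A wider threshold sends more borderline texts to the neutral class, so the value is worth tuning on your own data.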
    

Using NLTK’s Twitter Corpus

We use the twitter_samples corpus to train TextBlob's NaiveBayesClassifier. From this corpus, we create training and testing datasets that consist of a selection of positive and negative tweets.

After training the classifier, we assess its accuracy using the test dataset to evaluate its performance.

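# The corpus must be available locally; if it is not, run this once:
# import nltk; nltk.download('twitter_samples')
from nltk.corpus import twitter_samples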
print (twitter_samples.fileids())
'''
Output:

['negative_tweets.json', 'positive_tweets.json', 'tweets.20150430-223406.json']
'''

pos_tweets = twitter_samples.strings('positive_tweets.json')
print (len(pos_tweets)) # Output: 5000

neg_tweets = twitter_samples.strings('negative_tweets.json')
print (len(neg_tweets)) # Output: 5000

#all_tweets = twitter_samples.strings('tweets.20150430-223406.json')
#print (len(all_tweets)) # Output: 20000

# list of (tweet, label) pairs for the positive tweets
pos_tweets_set = [(tweet, 'pos') for tweet in pos_tweets]

# list of (tweet, label) pairs for the negative tweets
neg_tweets_set = [(tweet, 'neg') for tweet in neg_tweets]

print (len(pos_tweets_set), len(neg_tweets_set)) # Output: 5000 5000

# randomize pos_tweets_set and neg_tweets_set
# because of the shuffle, the accuracy will differ every time we run the program
from random import shuffle
shuffle(pos_tweets_set)
shuffle(neg_tweets_set)
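
Because the lists are shuffled with the global random generator, the accuracy changes from run to run. If repeatable results are preferred, one option is to shuffle with a seeded generator. The sketch below assumes a hypothetical helper `shuffled_copy` and an arbitrary seed of 42, and uses stand-in data instead of the real tweets.

```python
import random

def shuffled_copy(items, seed=42):
    """Return a shuffled copy of `items`, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    result = list(items)       # copy so the caller's list keeps its order
    rng.shuffle(result)
    return result

# Stand-in data shaped like pos_tweets_set
sample = [('tweet {}'.format(i), 'pos') for i in range(5)]
first = shuffled_copy(sample)
second = shuffled_copy(sample)
print(first == second)  # True: same seed, same order
```

Applied to the tutorial, this would look like `pos_tweets_set = shuffled_copy(pos_tweets_set)`, with the same call for `neg_tweets_set`.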
    

Create Train and Test Set

For this example, we create a small training and testing dataset:

  • Test set: 200 tweets (100 positive and 100 negative)
  • Training set: 400 tweets (200 positive and 200 negative)

Keep in mind that a larger training set generally leads to better classification accuracy.

# test set = 200 tweets (100 positive + 100 negative)
# train set = 400 tweets (200 positive + 200 negative)
test_set = pos_tweets_set[:100] + neg_tweets_set[:100]
train_set = pos_tweets_set[100:300] + neg_tweets_set[100:300]
        
print(len(test_set), len(train_set)) # Output: 200 400
        
# train classifier
from textblob.classifiers import NaiveBayesClassifier
classifier = NaiveBayesClassifier(train_set)
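
The slicing used for the split can be wrapped in a small function so that the set sizes are easy to change. This is only a sketch: `split_balanced` is a hypothetical helper, and the data below are stand-ins shaped like `pos_tweets_set` and `neg_tweets_set`.

```python
def split_balanced(pos_items, neg_items, n_test, n_train):
    """Take n_test, then n_train, items from each class without overlap."""
    needed = n_test + n_train
    if len(pos_items) < needed or len(neg_items) < needed:
        raise ValueError('not enough items for the requested split')
    test = pos_items[:n_test] + neg_items[:n_test]
    train = pos_items[n_test:needed] + neg_items[n_test:needed]
    return train, test

# Stand-in data: lists of (text, label) pairs
pos = [('pos tweet {}'.format(i), 'pos') for i in range(300)]
neg = [('neg tweet {}'.format(i), 'neg') for i in range(300)]

train, test = split_balanced(pos, neg, n_test=100, n_train=200)
print(len(test), len(train))  # 200 400
```

Keeping the test items out of the training slice matters: a classifier evaluated on tweets it was trained on would report an overly optimistic accuracy.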
    

Training the Classifier & Calculating Accuracy

# calculate accuracy
accuracy = classifier.accuracy(test_set)
print (accuracy) # Output: 0.715
        
# show the most informative features
# (the method prints its own output and returns None, so no print() is needed)
classifier.show_informative_features(10)
'''
Output:
        
Most Informative Features
           contains(not) = True              neg : pos    =      6.6 : 1.0
          contains(love) = True              pos : neg    =      6.3 : 1.0
           contains(day) = True              pos : neg    =      5.7 : 1.0
            contains(no) = True              neg : pos    =      5.4 : 1.0
            contains(na) = True              neg : pos    =      5.0 : 1.0
        contains(Thanks) = True              pos : neg    =      3.7 : 1.0
           contains(why) = True              neg : pos    =      3.7 : 1.0
         contains(happy) = True              pos : neg    =      3.7 : 1.0
         contains(never) = True              neg : pos    =      3.7 : 1.0
        contains(though) = True              neg : pos    =      3.7 : 1.0
'''
        
text = "It was a wonderful movie. I liked it very much."
print (classifier.classify(text)) # Output: pos
        
text = "I don't like movies having happy ending."
print (classifier.classify(text)) # Output: neg
        
text = "The script was predictable. However, it was a wonderful movie. I liked it very much."
blob = TextBlob(text, classifier=classifier)
        
print (blob) # Output: The script was predictable. However, it was a wonderful movie. I liked it very much.
print (blob.classify()) # Output: pos
        
for sentence in blob.sentences:
    print ("{} ({})".format(sentence, sentence.classify()))
'''
Output:
        
The script was predictable. (neg)
However, it was a wonderful movie. (pos)
I liked it very much. (pos)
'''
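
The accuracy reported earlier is the fraction of test items whose predicted label matches the true label. To make that concrete, here is a minimal sketch that computes the same kind of figure by hand. The `accuracy` function and the toy classifier are assumptions for illustration; they are not TextBlob's implementation.

```python
def accuracy(classify, labeled_items):
    """Fraction of (text, label) pairs for which classify(text) == label."""
    if not labeled_items:
        raise ValueError('labeled_items is empty')
    correct = sum(1 for text, label in labeled_items if classify(text) == label)
    return correct / len(labeled_items)

# Toy stand-in classifier (hypothetical): 'pos' if the text mentions "love"
def toy_classify(text):
    return 'pos' if 'love' in text.lower() else 'neg'

toy_test_set = [
    ('I love this', 'pos'),
    ('Love it', 'pos'),
    ('Not for me', 'neg'),
    ('So happy today', 'pos'),  # misclassified by the toy rule
]
print(accuracy(toy_classify, toy_test_set))  # 0.75 (3 of 4 correct)
```

With the trained classifier, `accuracy(classifier.classify, test_set)` should give a figure comparable to `classifier.accuracy(test_set)`.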