How to Analyze Twitter Sentiments with TextBlob in Python
This article demonstrates how to conduct Sentiment Analysis on Twitter data using Python and TextBlob.
TextBlob offers an API for various Natural Language Processing (NLP) tasks, including Part-of-Speech Tagging, Noun Phrase Extraction, Sentiment Analysis, Classification (such as Naive Bayes and Decision Tree), Language Translation and Detection, Spelling Correction, and more.
TextBlob is built on top of the Natural Language Toolkit (NLTK).
Sentiment Analysis involves evaluating the sentiment conveyed in a given text or document and categorizing it into specific classes, most commonly positive and negative. Additional categories, such as neutral, highly positive, or highly negative, can also be included.
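Beyond the two basic classes, a common convention is to bucket a numeric sentiment score into positive, negative, or neutral using a small threshold around zero. A minimal sketch of that idea; the `label_polarity` helper name and the 0.1 threshold are illustrative choices, not part of TextBlob:

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment class.

    The 0.1 threshold is an arbitrary choice for illustration,
    not a TextBlob default.
    """
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'

print(label_polarity(0.62))   # clearly positive score -> 'positive'
print(label_polarity(-0.5))   # clearly negative score -> 'negative'
print(label_polarity(0.05))   # close to zero -> 'neutral'
```

Tuning the threshold controls how wide the neutral band is; with a threshold of 0.0 every nonzero score falls into one of the two polar classes.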
Installing TextBlob
Run the following commands to install TextBlob and download the corpora it needs:
pip install -U textblob
python -m textblob.download_corpora
Simple TextBlob Sentiment Analysis Example
We will explore a basic example using TextBlob to perform Sentiment Analysis on a provided text. The sentiment property returns sentiment scores for the given text, which include two metrics: polarity and subjectivity.
The polarity score is a float value between -1.0 and 1.0. A negative value indicates a negative sentiment, while a positive value suggests that the text conveys a positive sentiment.
The subjectivity score is a float ranging from 0.0 to 1.0. A score of 0.0 means the text is very objective, whereas a score of 1.0 indicates that the text is highly subjective.
from textblob import TextBlob

text = TextBlob("It was a wonderful movie. I liked it very much.")
print(text.sentiment)
print('polarity: {}'.format(text.sentiment.polarity))
print('subjectivity: {}'.format(text.sentiment.subjectivity))
'''
Output:
Sentiment(polarity=0.62, subjectivity=0.6866666666666666)
polarity: 0.62
subjectivity: 0.686666666667
'''

text = TextBlob("I liked the acting of the lead actor but I didn't like the movie overall.")
print(text.sentiment)
'''
Output:
Sentiment(polarity=0.19999999999999998, subjectivity=0.26666666666666666)
'''

text = TextBlob("I liked the acting of the lead actor and I liked the movie overall.")
print(text.sentiment)
'''
Output:
Sentiment(polarity=0.3, subjectivity=0.4)
'''
Using NLTK’s Twitter Corpus
We use the twitter_samples corpus to train TextBlob's NaiveBayesClassifier. From this corpus, we create training and testing datasets that consist of a selection of positive and negative tweets.
After training the classifier, we assess its accuracy using the test dataset to evaluate its performance.
from nltk.corpus import twitter_samples

print(twitter_samples.fileids())
'''
Output:
['negative_tweets.json', 'positive_tweets.json', 'tweets.20150430-223406.json']
'''

pos_tweets = twitter_samples.strings('positive_tweets.json')
print(len(pos_tweets))  # Output: 5000

neg_tweets = twitter_samples.strings('negative_tweets.json')
print(len(neg_tweets))  # Output: 5000

#all_tweets = twitter_samples.strings('tweets.20150430-223406.json')
#print(len(all_tweets))  # Output: 20000

# build (tweet, label) pairs for positive tweets
pos_tweets_set = []
for tweet in pos_tweets:
    pos_tweets_set.append((tweet, 'pos'))

# build (tweet, label) pairs for negative tweets
neg_tweets_set = []
for tweet in neg_tweets:
    neg_tweets_set.append((tweet, 'neg'))

print(len(pos_tweets_set), len(neg_tweets_set))  # Output: (5000, 5000)

# randomize pos_tweets_set and neg_tweets_set;
# doing so will output a different accuracy result every time we run the program
from random import shuffle
shuffle(pos_tweets_set)
shuffle(neg_tweets_set)
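Because the tweet sets are shuffled, the measured accuracy varies from run to run. If you want reproducible results, you can seed Python's random module before shuffling; a small sketch of the idea (the seed value 42 and the toy list are arbitrary):

```python
import random

# first run: seed, then shuffle
tweets = ['tweet_a', 'tweet_b', 'tweet_c', 'tweet_d']
random.seed(42)          # fix the generator state for reproducibility
random.shuffle(tweets)
first_run = list(tweets)

# second run: same seed produces the same shuffle order
tweets = ['tweet_a', 'tweet_b', 'tweet_c', 'tweet_d']
random.seed(42)
random.shuffle(tweets)
second_run = list(tweets)

print(first_run == second_run)  # True: identical order on both runs
```

Seeding is useful while experimenting; remove the seed when you want to average accuracy over genuinely random train/test splits.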
Create Train and Test Set
For this example, we create a small training and testing dataset:
- Test set: 200 tweets (100 positive and 100 negative)
- Training set: 400 tweets (200 positive and 200 negative)
Keep in mind that a larger training dataset generally leads to better classification accuracy, so we keep the sets small here only to speed up training.
# test set = 200 tweets (100 positive + 100 negative)
# train set = 400 tweets (200 positive + 200 negative)
test_set = pos_tweets_set[:100] + neg_tweets_set[:100]
train_set = pos_tweets_set[100:300] + neg_tweets_set[100:300]

print(len(test_set), len(train_set))  # Output: (200, 400)

# train classifier
from textblob.classifiers import NaiveBayesClassifier
classifier = NaiveBayesClassifier(train_set)
Training the Classifier & Calculating Accuracy
# calculate accuracy
accuracy = classifier.accuracy(test_set)
print(accuracy)  # Output: 0.715

# show the most informative features
print(classifier.show_informative_features(10))
'''
Output:
Most Informative Features
    contains(not) = True        neg : pos = 6.6 : 1.0
   contains(love) = True        pos : neg = 6.3 : 1.0
    contains(day) = True        pos : neg = 5.7 : 1.0
     contains(no) = True        neg : pos = 5.4 : 1.0
     contains(na) = True        neg : pos = 5.0 : 1.0
 contains(Thanks) = True        pos : neg = 3.7 : 1.0
    contains(why) = True        neg : pos = 3.7 : 1.0
  contains(happy) = True        pos : neg = 3.7 : 1.0
  contains(never) = True        neg : pos = 3.7 : 1.0
 contains(though) = True        neg : pos = 3.7 : 1.0
'''

text = "It was a wonderful movie. I liked it very much."
print(classifier.classify(text))  # Output: pos

text = "I don't like movies having happy ending."
print(classifier.classify(text))  # Output: neg

text = "The script was predictable. However, it was a wonderful movie. I liked it very much."
blob = TextBlob(text, classifier=classifier)
print(blob)
# Output: The script was predictable. However, it was a wonderful movie. I liked it very much.
print(blob.classify())  # Output: pos

for sentence in blob.sentences:
    print("{} ({})".format(sentence, sentence.classify()))
'''
Output:
The script was predictable. (neg)
However, it was a wonderful movie. (pos)
I liked it very much. (pos)
'''
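Under the hood, classifier.accuracy simply measures the fraction of test items whose predicted label matches the true label. A minimal sketch of that computation, using a toy stand-in predict function rather than the trained classifier (the names `accuracy` and `toy_predict` are illustrative):

```python
def accuracy(predict, labeled_set):
    """Return the fraction of (text, label) pairs that predict labels correctly."""
    correct = sum(1 for text, label in labeled_set if predict(text) == label)
    return correct / len(labeled_set)

# A toy rule-based stand-in for classifier.classify, for illustration only.
def toy_predict(text):
    return 'pos' if 'good' in text else 'neg'

test_pairs = [('good movie', 'pos'),
              ('bad movie', 'neg'),
              ('good plot', 'neg')]

print(accuracy(toy_predict, test_pairs))  # 2 of 3 correct
```

The labeled_set here has the same (text, label) shape as the tweet sets built earlier, which is why classifier.accuracy(test_set) can be called directly on them.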