Analyzing Twitter Sentiment in Real Time with TextBlob
This article demonstrates how to conduct Sentiment Analysis on live Twitter data using Python and TextBlob.
In a previous article, I covered Sentiment Analysis on tweets using TextBlob and NLTK’s Twitter Corpus. This time, we fetch tweets with GetOldTweets-python and analyze their sentiment with TextBlob.
GetOldTweets-python allows you to:
- Retrieve tweets from any user
- Search for tweets containing specific text
- Find tweets from specific date ranges
- Locate tweets based on geographic location
- Filter tweets by language
- Search tweets by hashtags
- Filter tweets by the number of retweets
- And much more...
Additionally, GetOldTweets-python offers the capability to export tweets to a CSV file, enabling you to save tweets first and then process them later.
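If you export tweets first, you later need to read the text back in for analysis. Here is a minimal sketch using Python's `csv` module. It assumes the export is semicolon-delimited and has a `text` column; both are assumptions, so check the header row of your own file. The helper name `read_tweet_texts` is hypothetical.

```python
import csv
import io

# Assumptions: semicolon-delimited file with a "text" column.
# Adjust these to match the header row of your own export.
DELIMITER = ";"
TEXT_COLUMN = "text"


def read_tweet_texts(csv_file, delimiter=DELIMITER, text_column=TEXT_COLUMN):
    """Return the non-empty tweet texts from an open CSV file object."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    texts = []
    for row in reader:
        # Missing fields come back as None, so fall back to an empty string
        text = (row.get(text_column) or "").strip()
        if text:
            texts.append(text)
    return texts


# Usage with an in-memory file; for a real export you would pass
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "username;date;text\n"
    "alice;2023-01-01 10:00;Loving Python!\n"
    "bob;2023-01-01 11:00;\n"
)
print(read_tweet_texts(sample))  # ['Loving Python!']
```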
TextBlob offers an API for common Natural Language Processing (NLP) tasks such as Part-of-Speech Tagging, Noun Phrase Extraction, Sentiment Analysis, Classification (using Naive Bayes and Decision Trees), Language Translation and Detection (deprecated in recent releases), Spelling Correction, and more.
TextBlob is built on top of the Natural Language Toolkit (NLTK).
Sentiment Analysis examines a text or document and assigns it to a sentiment class. The simplest scheme uses two classes, positive and negative, but finer-grained schemes add classes such as neutral, highly positive, and highly negative.
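To make the five-class scheme concrete, here is a minimal sketch that buckets a numeric polarity score in the range [-1.0, 1.0] (the range TextBlob uses for polarity) into those classes. The function name `label_polarity` and the 0.5 cut-off for "highly" are hypothetical choices, not a standard.

```python
def label_polarity(polarity, strong=0.5):
    """Map a polarity score in [-1.0, 1.0] to one of five sentiment classes.

    `strong` is a hypothetical cut-off separating "highly" from plain classes.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be within [-1.0, 1.0]")
    if polarity >= strong:
        return "highly positive"
    if polarity > 0:
        return "positive"
    if polarity == 0:
        return "neutral"
    if polarity > -strong:
        return "negative"
    return "highly negative"


for score in (0.8, 0.2, 0.0, -0.3, -0.9):
    print(score, label_polarity(score))
```

Moving the cut-off changes how many texts land in the "highly" classes, so it is worth tuning on a sample of your own data.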
Installing TextBlob
Run the following commands to install TextBlob and download the corpora it needs:
```bash
pip install -U textblob
python -m textblob.download_corpora
```
Simple TextBlob Sentiment Analysis Example
Let's look at a basic TextBlob example that performs Sentiment Analysis on a given text. The sentiment property provides two scores for the text: Polarity and Subjectivity.
The polarity score is a float within the range [-1.0, 1.0], where a negative value indicates negative text and a positive value indicates positive text.
The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.
```python
from textblob import TextBlob

text = TextBlob("It was a wonderful movie. I liked it very much.")

print(text.sentiment)
print('polarity: {}'.format(text.sentiment.polarity))
print('subjectivity: {}'.format(text.sentiment.subjectivity))

'''
Output:
Sentiment(polarity=0.62, subjectivity=0.6866666666666666)
polarity: 0.62
subjectivity: 0.6866666666666666
'''

text = TextBlob("I liked the acting of the lead actor but I didn't like the movie overall.")
print(text.sentiment)

'''
Output:
Sentiment(polarity=0.19999999999999998, subjectivity=0.26666666666666666)
'''

text = TextBlob("I liked the acting of the lead actor and I liked the movie overall.")
print(text.sentiment)

'''
Output:
Sentiment(polarity=0.3, subjectivity=0.4)
'''
```
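TextBlob's sentiment result is a namedtuple of the form `Sentiment(polarity, subjectivity)`. To show how the two ranges can be read together, here is a minimal sketch that runs without TextBlob: it defines its own stand-in namedtuple with the same field names. The `describe` helper and its 0.5 subjectivity midpoint are hypothetical, chosen only for illustration.

```python
from collections import namedtuple

# Stand-in with the same fields as TextBlob's result, so this runs on its own
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def describe(sentiment):
    """Return a short reading of a polarity/subjectivity pair."""
    polarity, subjectivity = sentiment
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity out of range [-1.0, 1.0]")
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity out of range [0.0, 1.0]")
    tone = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    # 0.5 is an arbitrary midpoint between "very objective" and "very subjective"
    stance = "subjective" if subjectivity >= 0.5 else "objective"
    return "{}, {}".format(tone, stance)


print(describe(Sentiment(0.62, 0.6866666666666666)))  # positive, subjective
print(describe(Sentiment(0.3, 0.4)))                  # positive, objective
```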
Using GetOldTweets-python to fetch Tweets
- Clone the GetOldTweets-python repository
- Navigate to the cloned repository's directory
- Run the Main.py file, which includes the example code
```bash
python Main.py
```
Note:
At the time of writing this article, the GetOldTweets-python repository does not support adding a language filter to search queries. However, there is a pull request that adds this functionality, though it has not yet been merged into the main branch. Hopefully, it will be merged soon.
Until then, you can use a fork of GetOldTweets-python that adds language search support.
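Because language support depends on which version of the package you have, one option is to apply the filter only when the criteria object actually has a `setLang` method. This is a minimal sketch of that feature check; `with_language` is a hypothetical helper, and the two small classes are stand-ins for demonstration, not the real `TweetCriteria`.

```python
def with_language(criteria, lang):
    """Apply a language filter if the criteria object supports it.

    Returns (criteria, applied), where `applied` says whether setLang was used.
    """
    set_lang = getattr(criteria, "setLang", None)
    if callable(set_lang):
        return set_lang(lang), True
    return criteria, False


# Stand-ins used only for demonstration
class CriteriaWithoutLang:
    pass


class CriteriaWithLang:
    def setLang(self, lang):
        self.lang = lang
        return self  # setters return the object, which is what allows chaining


_, applied = with_language(CriteriaWithoutLang(), "en")
print(applied)  # False
criteria, applied = with_language(CriteriaWithLang(), "en")
print(applied, criteria.lang)  # True en
```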
Searching Tweets for our own Search Term
Inside the cloned GetOldTweets-python repository folder, import the "got" package. The repository has separate packages for Python 2 ("got") and Python 3 ("got3"), so pick one based on the running Python version.
```python
import sys

# "got" is the Python 2 package, "got3" the Python 3 package
if sys.version_info[0] < 3:
    import got
else:
    import got3 as got
```
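An alternative to checking the version number is to try the imports in order and keep the first one that succeeds. Here is a minimal sketch using `importlib`; the helper name `import_first` is hypothetical, and the demo uses standard-library modules so it runs anywhere.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that is available."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(module_names))


# In the GetOldTweets-python folder this could be: got = import_first("got3", "got")
mod = import_first("module_that_does_not_exist", "json")
print(mod.__name__)  # json
```

Trying "got3" first matters: on Python 3 the Python 2 package may fail with an error other than `ImportError`, which this loop does not catch.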
Let’s search for up to 15 tweets containing the term "PythonProgramming" posted between January 1, 2023, and January 2, 2023.
```python
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('PythonProgramming').setSince("2023-01-01").setUntil("2023-01-02").setMaxTweets(15)

# You can use "setLang" only if the package supports language-based search queries
# tweetCriteria = got.manager.TweetCriteria().setQuerySearch('PythonProgramming').setSince("2023-01-01").setUntil("2023-01-02").setMaxTweets(15).setLang('en')

# Fetch the tweets once and reuse the result
tweets = got.manager.TweetManager.getTweets(tweetCriteria)

# Get the first fetched tweet
tweet = tweets[0]

# Print result
print(tweet.username)  # Output: CodingMaster
print(tweet.text)      # Output: Python is a versatile language used for web development, data science, and more! #PythonProgramming
print(tweet.retweets)  # Output: 5
print(tweet.mentions)  # Output: (empty, the tweet has no mentions)
print(tweet.hashtags)  # Output: #PythonProgramming

# Print all tweets
for tweet in tweets:
    print(tweet.text)

'''
Output:
Python is a versatile language used for web development, data science, and more! #PythonProgramming
JavaScript is great for front-end development. #PythonProgramming #WebDev
Loving the new features in Python 3.9! #PythonProgramming
I prefer Python over Java for data science. #PythonProgramming #DataScience
Learning Python is a must for aspiring data scientists. #PythonProgramming #MachineLearning
Excited about the upcoming Python conference! #PythonProgramming #TechEvents
Why Python is the best language for beginners? #PythonProgramming
Top Python libraries for data analysis. #PythonProgramming #DataScience
Can't wait to try out the new Python framework. #PythonProgramming #Programming
Which is your favorite Python IDE? #PythonProgramming
'''
```
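The `hashtags` and `mentions` fields printed above appear to be space-separated strings. If you work from raw tweet text instead (for example, tweets loaded from a CSV export), a sketch like the one below can pull them out with regular expressions. It is illustrative only and not necessarily how GetOldTweets-python computes those fields; the helper names are hypothetical.

```python
import re

# A "#" or "@" followed by one or more word characters
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def extract_hashtags(text):
    """Return the hashtags in `text` as one space-separated string."""
    return " ".join(HASHTAG_RE.findall(text))


def extract_mentions(text):
    """Return the @mentions in `text` as one space-separated string."""
    return " ".join(MENTION_RE.findall(text))


text = "Top Python libraries for data analysis. @someone #PythonProgramming #DataScience"
print(extract_hashtags(text))  # #PythonProgramming #DataScience
print(extract_mentions(text))  # @someone
```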
Clean Tweets
Let’s write a function to clean tweets. It uses regular expressions to remove the old-style "RT" retweet prefix, URL links, mentions, and punctuation, and it strips the "#" sign from hashtags while keeping the hashtag word itself.
```python
import re  # regular expressions
import string


def clean_tweet(tweet):
    '''
    Remove unnecessary elements from the tweet: the old-style "RT" prefix,
    URL links, the "#" of hashtags, mentions, and punctuation
    '''
    # Remove old style retweet text "RT"
    tweet = re.sub(r'^RT[\s]+', '', tweet)

    # Remove hyperlinks (only the URL itself, not the rest of the line)
    tweet = re.sub(r'https?://\S+', '', tweet)

    # Remove the hashtag sign "#" but keep the hashtag word
    tweet = re.sub(r'#', '', tweet)

    # Remove mentions (usernames may contain underscores)
    tweet = re.sub(r'@\w+', '', tweet)

    # Replace punctuation with spaces; re.escape keeps characters such as "]" and "\" literal
    tweet = re.sub('[' + re.escape(string.punctuation) + ']+', ' ', tweet)

    # Collapse repeated whitespace left behind by the removals
    tweet = re.sub(r'\s+', ' ', tweet).strip()

    return tweet


# Testing clean_tweet function
sample_tweet = "Python is a versatile language used for web development, data science, and more! #PythonProgramming"
print(clean_tweet(sample_tweet))

'''
Output:
Python is a versatile language used for web development data science and more PythonProgramming
'''
```
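As an alternative to a regex character class for punctuation, Python's `str.translate` can map every punctuation character to a space in one pass. Here is a minimal sketch of a cleaner built that way; the name `clean_tweet_translate` is hypothetical, and it follows the same steps as the function above.

```python
import re
import string

# Translation table that maps every ASCII punctuation character to a space
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet_translate(tweet):
    """Variant of clean_tweet that strips punctuation with str.translate."""
    tweet = re.sub(r"^RT\s+", "", tweet)        # old-style retweet prefix
    tweet = re.sub(r"https?://\S+", "", tweet)  # URLs, up to the next whitespace
    tweet = re.sub(r"@\w+", "", tweet)          # @mentions
    tweet = tweet.replace("#", "")              # keep the hashtag word, drop "#"
    tweet = tweet.translate(PUNCT_TO_SPACE)     # remaining punctuation -> spaces
    return " ".join(tweet.split())              # collapse repeated whitespace


print(clean_tweet_translate("RT @dev_guy: New #Python release! https://example.com/notes Great work."))
# New Python release Great work
```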
Get Sentiment of the Tweet
We pass the cleaned tweet text to the TextBlob class, which creates a TextBlob object whose sentiment property holds the polarity and subjectivity of the text. We classify the tweet by polarity: greater than zero is positive, less than zero is negative, and exactly zero is neutral.
```python
from textblob import TextBlob


def get_tweet_sentiment(tweet):
    '''
    Get sentiment value of the tweet text
    It can be either positive, negative, or neutral
    '''
    # Create TextBlob object of the cleaned tweet text
    blob = TextBlob(clean_tweet(tweet))

    # Classify by polarity: > 0 positive, < 0 negative, 0 neutral
    polarity = blob.sentiment.polarity
    if polarity > 0:
        return 'positive'
    elif polarity < 0:
        return 'negative'
    else:
        return 'neutral'


# Testing tweet sentiment
sample_tweet = "Python is a versatile language used for web development, data science, and more! #PythonProgramming"
print(get_tweet_sentiment(sample_tweet))  # Output: positive
```
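Treating only an exact zero as neutral means a tweet with a tiny polarity such as 0.01 counts as positive. A common variation is a small neutral band around zero. This minimal sketch works on the polarity float alone; the function name and the 0.05 band in the usage line are hypothetical values for illustration.

```python
def polarity_to_label(polarity, neutral_band=0.0):
    """Classify a polarity score, treating |polarity| <= neutral_band as neutral.

    With neutral_band=0.0 this matches the rule above: > 0 positive,
    < 0 negative, exactly 0 neutral.
    """
    if neutral_band < 0:
        raise ValueError("neutral_band must be non-negative")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


print(polarity_to_label(0.03))                     # positive
print(polarity_to_label(0.03, neutral_band=0.05))  # neutral
```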
Process Tweets
We create a new function that processes tweets and returns a list of dictionaries, each holding a tweet's text and its sentiment value.
```python
def get_processed_tweets(tweets):
    '''
    Get list of processed tweets containing the tweet text and its sentiment value
    '''
    processed_tweets = []

    for tweet in tweets:
        tweet_dict = {
            'text': tweet.text,
            'sentiment': get_tweet_sentiment(tweet.text),
        }

        # A tweet that has retweets may appear more than once in the results,
        # so append only its first copy
        if tweet.retweets > 0 and tweet_dict in processed_tweets:
            continue
        processed_tweets.append(tweet_dict)

    return processed_tweets


# Getting tweets with sentiment value
# setLang('en') requires a version with language support (see the note above)
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('PythonProgramming').setSince("2023-01-01").setUntil("2023-01-02").setMaxTweets(10).setLang('en')
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
tweets_with_sentiment = get_processed_tweets(tweets)

for item in tweets_with_sentiment:
    print(item)

'''
Output:
{'text': 'Python is a versatile language used for web development, data science, and more! #PythonProgramming', 'sentiment': 'positive'}
{'text': 'Loving the new features in Python 3.9! #PythonProgramming', 'sentiment': 'positive'}
{'text': 'I prefer Python over Java for data science. #PythonProgramming #DataScience', 'sentiment': 'positive'}
{'text': 'Learning Python is a must for aspiring data scientists. #PythonProgramming #MachineLearning', 'sentiment': 'positive'}
{'text': 'Excited about the upcoming Python conference! #PythonProgramming #TechEvents', 'sentiment': 'positive'}
{'text': 'Why Python is the best language for beginners? #PythonProgramming', 'sentiment': 'positive'}
{'text': 'Top Python libraries for data analysis. #PythonProgramming #DataScience', 'sentiment': 'positive'}
{'text': "Can't wait to try out the new Python framework. #PythonProgramming #Programming", 'sentiment': 'positive'}
{'text': 'Which is your favorite Python IDE? #PythonProgramming', 'sentiment': 'neutral'}
'''
```
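The duplicate check above uses `in` on a list, which scans the whole list for every tweet. For larger result sets, a set of already-seen texts gives the same first-occurrence behavior with fast lookups. This is a minimal sketch operating on the list of dictionaries; the helper name `dedupe_by_text` is hypothetical, and unlike the code above it drops every repeated text, not only those with retweets.

```python
def dedupe_by_text(tweets):
    """Keep the first occurrence of each tweet text, preserving order.

    `tweets` is a list of dicts with a 'text' key, like the output of
    get_processed_tweets. Set membership checks are O(1) on average.
    """
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet["text"] not in seen:
            seen.add(tweet["text"])
            unique.append(tweet)
    return unique


items = [
    {"text": "Loving Python!", "sentiment": "positive"},
    {"text": "Loving Python!", "sentiment": "positive"},
    {"text": "Which IDE?", "sentiment": "neutral"},
]
print(dedupe_by_text(items))
```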
Get Percentage of Positive, Negative, and Neutral Tweets
We previously obtained the sentiment value of each tweet. Now, let’s calculate the percentage and count of positive, negative, and neutral tweets.
Here, we fetch up to 1000 tweets and process them.
```python
# setLang('en') requires a version with language support (see the note above)
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('PythonProgramming').setSince("2023-01-01").setUntil("2023-01-02").setMaxTweets(1000).setLang('en')
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
tweets_with_sentiment = get_processed_tweets(tweets)

positive_tweets = [tweet for tweet in tweets_with_sentiment if tweet['sentiment'] == 'positive']
negative_tweets = [tweet for tweet in tweets_with_sentiment if tweet['sentiment'] == 'negative']
neutral_tweets = [tweet for tweet in tweets_with_sentiment if tweet['sentiment'] == 'neutral']

# Avoid division by zero when no tweets were fetched
total = len(tweets_with_sentiment) or 1

positive_percent = 100 * len(positive_tweets) / total
negative_percent = 100 * len(negative_tweets) / total
neutral_percent = 100 * len(neutral_tweets) / total

print('Positive Tweets | Count: {}, Percent: {} %'.format(len(positive_tweets), positive_percent))
print('Negative Tweets | Count: {}, Percent: {} %'.format(len(negative_tweets), negative_percent))
print('Neutral Tweets | Count: {}, Percent: {} %'.format(len(neutral_tweets), neutral_percent))

'''
Output (Python 3):
Positive Tweets | Count: 680, Percent: 68.0 %
Negative Tweets | Count: 50, Percent: 5.0 %
Neutral Tweets | Count: 270, Percent: 27.0 %
'''
```
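The same counts and percentages can be computed in a single pass with `collections.Counter` instead of building three filtered lists. This is a minimal sketch over the list of dictionaries returned by `get_processed_tweets`; the helper name `sentiment_summary` is hypothetical, and the sample data is made up for the demo.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_summary(tweets_with_sentiment):
    """Return {label: (count, percent)} for each sentiment label.

    Every percent is 0.0 when the list is empty, instead of raising
    ZeroDivisionError.
    """
    counts = Counter(tweet["sentiment"] for tweet in tweets_with_sentiment)
    total = len(tweets_with_sentiment)
    return {
        label: (counts[label], 100.0 * counts[label] / total if total else 0.0)
        for label in LABELS
    }


sample = [{"sentiment": "positive"}] * 3 + [{"sentiment": "neutral"}]
for label, (count, percent) in sentiment_summary(sample).items():
    print("{} Tweets | Count: {}, Percent: {:.1f} %".format(label.capitalize(), count, percent))
```

A `Counter` returns 0 for labels that never occur, so labels with no tweets still appear in the summary.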