An Introduction to Social Sentiment and Analyzing Tweets
Sentiment Analysis, also known as opinion mining, is a powerful tool you can use to build smarter products. It’s a natural language processing algorithm that gives you a general idea about the positive, neutral, and negative sentiment of texts. Sentiment analysis is often used to understand the opinion or attitude in tweets, status updates, movie/music/television reviews, chats, emails, comments, and more. Social media monitoring apps and companies all rely on sentiment analysis and machine learning to assist them in gaining insights about mentions, brands, and products.
For example, sentiment analysis can be used to better understand how your customers on Twitter perceive your brand online in real time. Instead of going through hundreds, or maybe thousands, of tweets by hand, you can easily group positive, neutral, and negative tweets together and sort them by their confidence values. This can save you countless hours and lead to actionable insights.
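As a minimal sketch of this kind of grouping — using a toy word-list scorer rather than a real sentiment model, with made-up tweets and lexicon purely for illustration:

```python
# Toy sentiment scorer: a real system would use a trained model,
# but a tiny word list is enough to show the grouping and sorting.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "down", "broken"}

def score(tweet):
    """Return a crude sentiment score: positive minus negative word count."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [
    "I love this brand, great support",
    "site is down again, broken checkout",
    "ordered a new phone today",
]

# Group tweets by the sign of their score, then sort within each group
# so the strongest opinions come first.
scored = [(t, score(t)) for t in tweets]
positive = sorted([s for s in scored if s[1] > 0], key=lambda s: -s[1])
negative = sorted([s for s in scored if s[1] < 0], key=lambda s: s[1])
neutral = [s for s in scored if s[1] == 0]
```

A production system would replace `score` with a call to a real sentiment model, but the grouping and confidence-based sorting work the same way.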
Additionally, you can use another natural language processing algorithm called Latent Dirichlet Allocation (LDA). This powerful algorithm can extract a mixture of topics from a given set of documents. LDA doesn’t give you the topic name, but it gives you a very good idea of what the topic is. Each topic is represented by a bag of words, which tells you what the topic is about. As an example, if the bag of words is: “sun, nasa, earth, moon”, you can infer the topic is probably related to space.
You can use both of these powerful algorithms to build a tool that analyzes tweets in real time. This tool gives you just a glimpse of what can be built using only two machine learning algorithms and a scraper. Learn more about Sentiment Analysis with this handy Algorithmia Guide.
Using NLP from Algorithmia to Build an App For Analyzing Tweets on Demand
Analyze Tweets is a minimal demo from Algorithmia that searches Twitter by keyword and analyzes tweets for sentiment and LDA topics. Currently it searches up to 500 tweets that are no more than 7 days old, and it caches each query for up to 24 hours.
The demo initializes with a random keyword as soon as this web page loads. The randomly selected keyword is one of: algorithmia, microsoft, amazon, google, facebook, and reddit.
You can check out the demo below:
How we built it
Before we can start, you need to have an Algorithmia account. You can create one on our website.
The first thing we did was click the Add Algorithm button under the user tab, at the top right of the page.
We enabled the Special Permissions, so that our algorithm can call other algorithms and have access to the internet. We also selected Python as our preferred language.
Next, we picked the algorithms we were going to use. We came up with a recipe that uses three different algorithms in the following steps:
- It first checks if the query has been cached before and the cache isn’t older than 24 hours.
- If it has the appropriate cache, it returns the cached query.
- If it can’t find an appropriate cached copy, it continues to run the analysis.
- It retrieves up to 500 tweets by using the user-provided keyword and the twitter/RetrieveTweetsWithKeyword algorithm.
- It runs the nlp/SocialSentimentAnalysis algorithm to get sentiment scores for all tweets.
- It creates two new groups of tweets that are respectively the top 20% most positive and top 20% most negative tweets.
- We do this by sorting the list of tweets based on their overall sentiment provided by the previously mentioned algorithm.
- We then take the top 20% and bottom 20% to get the most positive and negative tweets.
- We also remove any tweets in both groups that have a sentiment of 0 (neutral). This is because the Twitter retriever doesn’t guarantee to return 500 tweets each time. For more information, click here.
- These two new groups (positive and negative) of tweets are fed into the nlp/LDA algorithm to extract positive and negative topics.
- All relevant information is first cached, then is returned.
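The middle steps above — sorting by overall sentiment, taking the top and bottom 20%, and dropping neutral tweets — can be sketched like this, assuming each tweet has already been given an overall sentiment score (the tweets and scores below are made up):

```python
def split_extremes(scored_tweets, fraction=0.2):
    """Take (tweet, score) pairs and return the top and bottom
    `fraction` of tweets by score, excluding neutral (score == 0)."""
    ranked = sorted(scored_tweets, key=lambda pair: pair[1])
    k = max(1, int(len(ranked) * fraction))
    most_negative = [p for p in ranked[:k] if p[1] != 0]
    most_positive = [p for p in ranked[-k:] if p[1] != 0]
    return most_positive, most_negative

# Hypothetical tweets with overall sentiment scores in [-1, 1].
scored = [("t1", 0.9), ("t2", -0.8), ("t3", 0.1),
          ("t4", 0.0), ("t5", -0.2), ("t6", 0.5),
          ("t7", 0.0), ("t8", -0.6), ("t9", 0.3), ("t10", 0.7)]

pos, neg = split_extremes(scored)
```

In the real recipe, `pos` and `neg` would then each be fed into nlp/LDA to extract the positive and negative topics.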
Based on our recipe above, we decided to wrap this app into a micro-service and call it nlp/AnalyzeTweets. You can check out the source code here. You can copy and paste the code into your own algorithm if you’re feeling lazy. The only part you would need to change is line 28, from:

cache_uri = "data://demo/AnalyzeTweets/" + str(query_hash) + ".json"

to:

cache_uri = "data://yourUserName/AnalyzeTweets/" + str(query_hash) + ".json"
The last step is to compile and publish the new algorithm.
And… Congrats! By using different algorithms as building blocks, you now have a working micro-service on the Algorithmia platform!
After writing the algorithm, we built a client-side (JS + CSS + HTML) demo that calls our new algorithm with input from the user. In our client-side app, we used d3.js for rendering the pie chart and histogram. We also used DataTables for rendering interactive tabular data. SweetAlert was used for the stylish popup messages. You can find a standalone version of our Analyze Tweets app here.
A real world case study
While we were building the demo app and writing this blog post (on Jan 28th), GitHub went down and was completely inaccessible. Being the dataphiles we are, we immediately ran an analysis on the keyword “github” using our new app.
As you might expect, developers on Twitter were not positive about GitHub being down. Nearly half of all tweets relating to GitHub at the time were negative, but to get a better idea of the types of reactions, we can look at the topics generated with LDA. We can see that Topic 4 probably represents a group of people who were more calm about the situation. The famous raging unicorn makes an appearance in multiple topics, and we can also see some anger in the other topics. And of course, what analysis of developer tweets would be complete without people complaining about other people complaining? We can see this in Topic 2 with words like “whine”. See the full analysis during the downtime below: