Twitter isn’t just my favorite way to waste time when my boss isn’t looking––it’s also a powerful data source for understanding what your customers are saying about you. Getting access to Twitter data for analysis is easy with the Algorithmia platform, and this post will walk you through how to do just that.
Twitter’s API and Authentication Process
In order to get up and running with programmatically accessing Twitter data, you’ll need to create what Twitter calls an “application” under your account. To start, head to https://apps.twitter.com/ and click the Create New App button.
Note: we recommend creating a new, separate Twitter account to utilize for this project just in case your API keys get compromised.
The setup process asks you for some basic information about your application. For Website, you can put https://google.com or any other valid URL.
Once you’ve successfully created your Twitter app, head over to the Keys and Access Tokens tab. You’ll want to grab 4 values:
- Consumer Key (API Key)
- Consumer Secret (API Secret)
- Access Token
- Access Token Secret
You may need to generate your Access Token and Access Token Secret from scratch using the button on the lower half of the screen. Keep these credentials safe! You won’t be able to use the Twitter API without them.
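One way to keep those four values out of your source code is to read them from environment variables. Here's a minimal sketch; the variable names (TWITTER_CONSUMER_KEY and friends) and the helper name are hypothetical, so pick whatever fits your setup.

```python
import os

# Hypothetical environment variable names, one per Twitter credential.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials(environ=os.environ):
    """Return the four Twitter credentials, failing loudly if any is missing."""
    missing = [name for name in ENV_NAMES.values() if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for key, name in ENV_NAMES.items()}
```

Failing early on a missing value is deliberate: it beats discovering a blank secret halfway through a batch of API calls.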
Working With the Algorithmia Client
Once you have your Twitter API credentials locked down, it’s easy to get up and running with Algorithmia in your language of choice. We support all major programming languages, but Python is my choice for data-focused tasks (and anything else, frankly). Installing the Algorithmia client in Python is as easy as:
pip install algorithmia
For information on how to get the algorithmia client set up in other languages, check out our developer center.
With the client set up, calling Algorithmia algorithms takes just a few lines of code. Our syntax will look something like this:
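A minimal sketch of that call pattern, assuming the official Algorithmia Python client (client, algo, pipe, result). The helper name call_algorithm is ours, and the algorithm path you pass in should be copied from the algorithm's own page rather than guessed.

```python
def call_algorithm(api_key, algo_path, payload):
    """Send `payload` to an Algorithmia algorithm and return its result."""
    # Imported lazily so this module still loads where the client isn't installed.
    import Algorithmia

    client = Algorithmia.client(api_key)   # authenticate with your Algorithmia API key
    algo = client.algo(algo_path)          # e.g. "owner/AlgorithmName/version"
    return algo.pipe(payload).result       # run the algorithm and unwrap the response
```

Every algorithm in this post follows the same three steps: create a client, point it at an algorithm, and pipe in a JSON-style input.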
Twitter API Rate Limiting
The last thing to know before getting started is how Twitter handles rate limiting. All the information you’ll need is in Twitter’s official API documentation on rate limiting, but we’ll cover some highlights here.
Each different GET request (we won’t be working with POST or PUT) has a unique rate limit (the full list is in Twitter’s documentation). We’ll be using four different GET requests: tweets by keyword, tweets by user, followers, and friends. Here are the rate limits for each:
- Retrieve Tweets by Keyword: 450 calls per 15 mins
- Retrieve Tweets by User: 1500 calls per 15 mins
- Retrieve Twitter Followers: 15 calls per 15 mins
- Retrieve Twitter Friends: 15 calls per 15 mins
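To make those numbers concrete, here is a small sketch that turns each limit into the minimum spacing between calls if you want to spread them evenly across the 15-minute window. The dictionary keys and function name are our own labels, not anything from Twitter's API.

```python
WINDOW_SECONDS = 15 * 60  # Twitter's rate-limit window

# Calls allowed per window, taken from the list above.
RATE_LIMITS = {
    "tweets_by_keyword": 450,
    "tweets_by_user": 1500,
    "followers": 15,
    "friends": 15,
}


def min_seconds_between_calls(request_type):
    """Seconds to wait between calls to stay under the limit when pacing evenly."""
    return WINDOW_SECONDS / RATE_LIMITS[request_type]


for name in RATE_LIMITS:
    print(name, round(min_seconds_between_calls(name), 2), "seconds")
```

Keyword searches work out to one call every 2 seconds, while followers and friends allow only one call per minute, which is why those two need the most care.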
Twitter can sometimes also limit the number of results that a GET request will return, so we’ve built in a few extra parameters that will help you avoid that. Let’s get started!
Getting Tweets by Keyword
Getting Tweets by searching with a keyword is a popular way to judge sentiment around a person, idea, or business. Our Retrieve Tweets With Keyword implementation at Algorithmia lets you search for a keyword and set the number of tweets you want to get back (rate limiting aside).
The Retrieve Tweets With Keyword algorithm returns complete Tweet objects with additional parameters like number of retweets and favorites. You might want to package the returned JSON into a Pandas dataframe or some other sort of more analyzable format.
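As a rough sketch of that packaging step, the helpers below build an input for the keyword search and flatten the returned Tweet objects into plain rows. Treat the input keys (query, numTweets, auth) as assumptions to check against the algorithm's page, and the Tweet fields (text, retweet_count, favorite_count) as assumptions borrowed from Twitter's standard Tweet object.

```python
def build_keyword_input(query, num_tweets, auth):
    """Assemble the algorithm input. Key names here are assumptions."""
    return {"query": query, "numTweets": num_tweets, "auth": auth}


def tweets_to_rows(tweets):
    """Keep only the fields we want to analyze, one flat dict per tweet."""
    rows = []
    for tweet in tweets:
        rows.append({
            "text": tweet.get("text", ""),
            "retweets": tweet.get("retweet_count", 0),    # assumed field name
            "favorites": tweet.get("favorite_count", 0),  # assumed field name
        })
    return rows


# Tiny hand-made sample standing in for a real response.
sample = [{"text": "Go Hawks!", "retweet_count": 3, "favorite_count": 10}]
print(tweets_to_rows(sample))
```

A list of flat dictionaries like this can be handed straight to pandas.DataFrame(rows) if you have Pandas installed.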
Getting Tweets by Username
The Retrieve Tweets With User algorithm lets you grab the text of tweets created by an individual user. The parameters are identical to Retrieve Tweets With Keyword, but instead of a keyword query you pass a username.
This implementation returns a list of tweets (just text) to make analysis and manipulation simpler.
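Because the result is just a list of strings, simple analysis takes only a few lines. As an illustrative sketch (the function name is ours), here is one way to count the hashtags a user reaches for most often:

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweet_texts, n=5):
    """Return the n most common hashtags (lowercased) across a list of tweet texts."""
    counts = Counter(
        tag.lower()
        for text in tweet_texts
        for tag in HASHTAG.findall(text)
    )
    return counts.most_common(n)


# Hand-made sample standing in for the algorithm's output.
texts = ["Go #Seahawks!", "#seahawks win again #NFL", "no tags here"]
print(top_hashtags(texts))
```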
Retrieve Twitter Followers lets you get the followers of any public Twitter user. But it’s a bit more complex: since some of the users that you might be interested in analyzing will have tons of followers (some celebrities have more than 100 million!), the Twitter API rate limiting won’t let you access the full list at once.
To avoid that issue, Retrieve Twitter Followers takes two extra parameters: page and count. The page parameter specifies which page of followers you’d like to access, and count caps the number of followers returned. To work within rate limiting constraints, consider grabbing a few pages at a time and keeping count to a few hundred.
You’ll get back a list of follower usernames that can easily be passed into other algorithms for further analysis.
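Here is a sketch of the "few pages at a time" approach. The fetch_page argument is a hypothetical callable that would wrap the actual Retrieve Twitter Followers call and return a list of usernames; whether pages start at 0 or 1 is also an assumption, so it is exposed as first_page.

```python
def fetch_followers(fetch_page, max_pages=3, count=200, first_page=0):
    """Collect followers page by page, stopping early when a page comes back empty.

    fetch_page(page, count) is a hypothetical wrapper around the algorithm call.
    """
    followers = []
    for page in range(first_page, first_page + max_pages):
        batch = fetch_page(page, count)
        if not batch:
            break  # an empty page means there are no more followers to read
        followers.extend(batch[:count])
    return followers


# Fake pages standing in for real responses.
pages = {0: ["ann", "bob"], 1: ["cat"], 2: []}
print(fetch_followers(lambda page, count: pages.get(page, []), max_pages=5, count=2))
```

With followers limited to 15 calls per 15 minutes, keeping max_pages small leaves headroom for other requests in the same window.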
As many, many people in this Quora question failed to notice, Twitter does have a concept of friends (hint: don’t answer Quora questions that you don’t know the answer to). Friends are people that you follow, while followers are people that follow you. You can get a user’s friends with Retrieve Twitter Friends. Since it’s rare for users to follow that many people, there is no count or page parameter.
The algorithm returns a list of friend usernames.
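Once you have both lists for a user, comparing friends and followers is plain set arithmetic. A small sketch, with function and key names of our own choosing:

```python
def split_relationships(friends, followers):
    """Compare who a user follows (friends) with who follows them (followers)."""
    friends, followers = set(friends), set(followers)
    return {
        "mutual": sorted(friends & followers),              # follow each other
        "not_following_back": sorted(friends - followers),  # user follows them only
        "fans": sorted(followers - friends),                # they follow the user only
    }


# Hand-made usernames standing in for real results.
print(split_relationships(["ann", "bob"], ["bob", "cat"]))
```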
Next Steps and Analysis
Getting any type of Twitter data, whether it’s the tweets themselves, followers, or friends, is almost always just the beginning. Twitter data is a wonderful medium for different types of analysis, and a great example is the sentiment analysis demo we created. In addition to sentiment analysis, there are a number of other interesting types of algorithms that you can apply to your Twitter data:
- Use LDA (generative topic model extractor) to split your tweets into different topics
- Automatically extract tags from your tweets with AutoTag
- Detect and remove inappropriate words with Profanity Detection
- Correct spelling issues with Spelling Correction