We’re excited to announce the winners of the first-ever Algorithmia Shorties contest. Inspired by NaNoGenMo, the Algorithmia Shorties were designed to help programmers of all skill levels learn more about natural language processing. We challenged developers to get started with NLP by creating computer-generated works of short story fiction using only algorithms. Entries were judged by their originality, readability, and creative use of the Algorithmia platform.
Grand Prize Winner: Big Dummy’s Leaves of Spam
Our winner of the Algorithmia Shorties is Big Dummy’s Leaves of Spam by Skwak. We loved this entry because it creatively mashed up the classic poem Leaves of Grass by Walt Whitman, and Big Dummy’s Guide to the Internet, a sort of layperson’s guide to the internet, with “select works” from her Gmail Spam folder. The result is a highly readable, and often hilarious poem about the philosophy of life, the Internet, humanity, and scams.
Skwak used Sentence Split, Generate Paragraph From Trigram, and Generate Trigram Frequencies from Algorithmia. In addition, she created an Express app and hosted it on Heroku. Her Github repo can be found here.
Honorable Mention: n+7
Our first honorable mention is n+7 by fabianekc. It was inspired by a class in Mathematics & Literature he took in 2001, where he learned about the “n+7 method”: you replace each noun in a text with the noun seven places after it in a dictionary. For example, “a man meets a woman” is transformed to “a mandrake meets a wonder” using the Langenscheidt dictionary. The n+7 entry features the first four sections of Flatland by E. Abbott translated as a corpus. What we loved about this entry is that fabianekc created an algorithm on Algorithmia to accomplish the n+7 method. The algorithm takes a link to a text file, a pre-processed text file, or plain text, and replaces each noun with the noun in the dictionary seven places away. You can also change the dictionary, the offset (how many places after the original noun to use), and the start and end points of the replacement. Check out the Github repo here.
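As a rough illustration, here's a minimal Python sketch of the n+7 idea. The word list and noun set below are tiny stand-ins (a real run would use a full dictionary and a part-of-speech tagger to find nouns), but the mechanics are the same: look up each noun in a sorted dictionary and step forward by the offset.

```python
import re

def n_plus_7(text, dictionary, nouns, offset=7):
    """Replace each known noun with the word `offset` places later
    in an alphabetically sorted dictionary (wrapping at the end)."""
    words = sorted(dictionary)
    index = {w: i for i, w in enumerate(words)}

    def shift(match):
        word = match.group(0)
        if word.lower() in nouns and word.lower() in index:
            replacement = words[(index[word.lower()] + offset) % len(words)]
            # Preserve the capitalization of the original word
            return replacement.capitalize() if word[0].isupper() else replacement
        return word

    return re.sub(r"[A-Za-z]+", shift, text)

# Toy dictionary and noun list -- a real run would use a full wordlist
toy_dict = ["man", "manacle", "manage", "management", "manager", "mandarin",
            "mandate", "mandrake", "woman", "womanhood", "womanish",
            "womanize", "womanly", "womb", "wombat", "wonder"]
toy_nouns = {"man", "woman"}

print(n_plus_7("a man meets a woman", toy_dict, toy_nouns))
```

With this toy dictionary the blog's example comes out as expected: "a mandrake meets a wonder".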
Honorable Mention: Arrested Development Episode
Our second honorable mention goes to a computer-generated Arrested Development episode, created by peap and jennaplusplus. The duo generated trigrams using the scripts from the first three seasons of Arrested Development for all characters that had more than 10 lines overall. This created an eye-popping 71 trigram files! To create the faux episode script, they started the episode off with the Narrator speaking, and then randomly selected which characters would speak based on the size of their trigram file, ensuring no character spoke twice in a row, which resulted in every character having 1-5 lines every time they spoke. Check out their Github repo here.
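The speaker-selection logic can be sketched in a few lines of Python. The character names and trigram-file sizes below are hypothetical; the point is the size-weighted random choice that never repeats the previous speaker.

```python
import random

# Hypothetical trigram-file sizes standing in for the real 71 files
trigram_sizes = {"Narrator": 900, "Michael": 800, "Gob": 450, "Buster": 200}

def next_speaker(sizes, previous):
    """Pick the next speaker weighted by trigram-file size,
    excluding whoever spoke last so no one speaks twice in a row."""
    candidates = [c for c in sizes if c != previous]
    weights = [sizes[c] for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def generate_speaker_order(sizes, turns=10):
    order = ["Narrator"]  # the episode opens with the Narrator
    while len(order) < turns:
        order.append(next_speaker(sizes, order[-1]))
    return order

script = generate_speaker_order(trigram_sizes)
```

Each selected speaker would then get 1-5 lines generated from their own trigram file.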
How The Algorithmia Shorties Were Generated
For algorithmically-generated short stories, users start by finding a corpus of text in the public domain through Feedbooks or Project Gutenberg. Other interesting corpora include TV scripts, software license agreements, personal journals, public speeches, and Wikipedia articles. Users then generate trigrams, the collection of all three-word sequences in the corpus. With trigrams generated, the last step is to reassemble them into sentences using Random Text From Trigrams or Generate Paragraph From Trigrams.
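The whole pipeline is small enough to sketch in plain Python. This toy version skips the Algorithmia API calls and builds the trigram table and the random-walk reassembly directly:

```python
import random
from collections import defaultdict

def build_trigrams(corpus):
    """Map each pair of consecutive words to every word that follows it."""
    words = corpus.split()
    trigrams = defaultdict(list)
    for i in range(len(words) - 2):
        trigrams[(words[i], words[i + 1])].append(words[i + 2])
    return trigrams

def generate(trigrams, seed_pair, length=20):
    """Random-walk the trigram table to reassemble new text."""
    w1, w2 = seed_pair
    output = [w1, w2]
    for _ in range(length):
        followers = trigrams.get((w1, w2))
        if not followers:  # dead end: no trigram starts with this pair
            break
        w1, w2 = w2, random.choice(followers)
        output.append(w2)
    return " ".join(output)

corpus = "the cat sat on the mat and the cat ran to the door"
tri = build_trigrams(corpus)
print(generate(tri, ("the", "cat")))
```

With a real corpus of thousands of sentences, the same walk produces surprisingly story-like output, since every three-word window is one the original author actually wrote.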
Want to learn more? For a detailed walkthrough, check out our trigram tutorial here.
Sentiment Analysis, also known as opinion mining, is a powerful tool you can use to build smarter products. It’s a natural language processing algorithm that gives you a general idea about the positive, neutral, and negative sentiment of texts. Sentiment analysis is often used to understand the opinion or attitude in tweets, status updates, movie/music/television reviews, chats, emails, comments, and more. Social media monitoring apps and companies all rely on sentiment analysis and machine learning to assist them in gaining insights about mentions, brands, and products.
Want to learn more about Sentiment Analysis? Read the Algorithmia Guide to Sentiment Analysis.
For example, sentiment analysis can be used to better understand how your customers perceive your brand or product on Twitter in real-time. Instead of going through hundreds, or maybe thousands, of tweets by hand, you can easily group positive, neutral, and negative tweets together, and sort them based on their confidence values. This will save you countless hours and lead to actionable insights.
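A sketch of that grouping step, using hypothetical (tweet, label, confidence) records in place of a real sentiment API's output:

```python
# Hypothetical records shaped like (tweet, label, confidence)
tweets = [
    ("Love the new update!", "positive", 0.94),
    ("It keeps crashing on launch", "negative", 0.88),
    ("Just installed it", "neutral", 0.71),
    ("Worst release yet", "negative", 0.97),
]

# Group by label, then sort each group by confidence (most certain first)
groups = {"positive": [], "neutral": [], "negative": []}
for text, label, confidence in tweets:
    groups[label].append((confidence, text))
for label in groups:
    groups[label].sort(reverse=True)

for label, items in groups.items():
    print(label, items)
```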
Sentiment Analysis Algorithms On Algorithmia
Anyone who has worked with sentiment analysis can tell you it’s a very hard problem. It’s not hard to find trained models for classifying very specific domains, but there currently isn’t an all-purpose sentiment analyzer in the wild. In general, sentiment analysis becomes even harder when the volume of text increases, due to the complex relations between words and phrases.
On the Algorithmia marketplace, the most popular sentiment analysis algorithm is nlp/SentimentAnalysis. It's based on the popular Stanford CoreNLP library, and it does a good job of analyzing large bodies of text. However, we've observed that it tends to return overly negative sentiment on short bodies of text, and decided it needed some improvement.
We found that the Stanford CoreNLP library was originally intended for building NLP pipelines quickly and simply, rather than polishing individual features, and it lacks documentation on how to retrain the model. Additionally, the library doesn't provide confidence values, but instead returns an integer between 0 and 4.
A Better Alternative
We decided to test a few alternative open-source libraries that might outperform the current algorithm on social media sentiment analysis. We came across an interesting, well-performing library called Vader Sentiment Analysis, based on a paper published by SocialAI at Georgia Tech. The new algorithm performed exceptionally well, so we added it to the marketplace as nlp/SocialSentimentAnalysis, designed to analyze social media texts.
We ran benchmarks on both nlp/SentimentAnalysis and nlp/SocialSentimentAnalysis to compare them. We used an open dataset from Crowdflower's Data for Everyone initiative, selecting the Apple Computers Twitter sentiment dataset because its tweets cover a wide array of topics. Real-life data is almost never homogeneous, since it almost always contains noise, so this dataset helped us better understand how the algorithms perform when faced with real data.
We removed tweets that had no sentiment, then filtered out any tweet without 100% annotator confidence, so the remaining tweets were labeled by consensus. This decreased the dataset from ~4,000 tweets to ~1,800 tweets.
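The filtering step looks roughly like this in Python (the rows below are made up; the real dataset has Crowdflower's own column names):

```python
# Hypothetical crowdsourced rows shaped like (tweet_id, label, confidence)
rows = [
    (1, "positive", 1.0),
    (2, "negative", 0.66),    # annotators disagreed -- dropped
    (3, "no_sentiment", 1.0), # no sentiment expressed -- dropped
    (4, "negative", 1.0),
]

# Keep only tweets with a sentiment label and unanimous annotator agreement
labeled = [(tid, label) for tid, label, conf in rows
           if label != "no_sentiment" and conf == 1.0]

print(labeled)
```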
Running Time Comparison
The first comparison we made was the running time for each algorithm. The new nlp/SocialSentimentAnalysis algorithm runs up to 12 times faster than the nlp/SentimentAnalysis algorithm.
Accuracy Comparison
As you can see in the bar chart below, the new social sentiment analysis algorithm performs 15% better in overall accuracy.
We can also see how well it performs in accurately predicting each specific label. We used the One-Versus-All method to calculate the accuracy for every individual label. The new algorithm outperformed the old one in every given label.
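One-versus-all accuracy is simple to compute: binarize both the true and predicted labels against one target label, then score the match rate. A small Python sketch with made-up labels:

```python
def one_vs_all_accuracy(y_true, y_pred, label):
    """Treat `label` as the positive class and everything else as negative,
    then count how often the binarized prediction matches the truth."""
    hits = sum((t == label) == (p == label) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Invented labels for illustration
y_true = ["pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neu", "neu", "neg", "neg"]

for label in ("pos", "neu", "neg"):
    print(label, one_vs_all_accuracy(y_true, y_pred, label))
```

Repeating this per label gives the per-sentiment accuracies compared in the chart.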
When To Use Social Sentiment Analysis
The new algorithm works well with social media texts, as well as texts with a similar structure and nature (e.g. status updates, chat messages, comments). The algorithm can still give you sentiments for bigger texts such as reviews or articles, but it will probably not be as accurate as it is with social media texts.
An example application would be monitoring social media for how people react to a change in your product, such as when Foursquare split their app in two (Swarm and Foursquare), or when Apple releases an iOS update. You could monitor the overall sentiment of a certain hashtag or account mention, and chart how your customers' sentiment changes over time.
As another example, you could monitor your products through social media 24/7 and receive alerts when the sentiment around your product or service changes significantly in a short amount of time. This would act as an early warning system, helping you take quick, appropriate action before a problem gets bigger. A brand like Comcast or Time Warner could keep tabs on customer satisfaction through social media and proactively respond to customers when there is a service interruption.
Understanding Social Sentiment Analysis
The new algorithm returns three individual sentiments: positive, neutral, and negative. Additionally, it returns one overall (i.e. compound) sentiment. Each individual sentiment is scored between 0 and 1 according to its intensity. The compound sentiment of the text is given between -1 and 1, from absolute negative to absolute positive, respectively.
Based on your use case and application, you may want to only use a specific individual sentiment (i.e. wanting to see only negative tweets, ordered by intensity), or the compound, overall sentiment (i.e. understanding general consumer feelings and opinions). Having both types of sentiment gives you the freedom to build applications exactly as you need to.
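For instance, given results shaped the way the algorithm's output is described above (individual scores in [0, 1] plus a compound score in [-1, 1]; the texts and numbers below are invented), the two use cases look like:

```python
# Hypothetical per-tweet results in the described output shape
results = [
    {"text": "This update is great!",   "positive": 0.78, "neutral": 0.22,
     "negative": 0.00, "compound": 0.66},
    {"text": "App crashes constantly.", "positive": 0.00, "neutral": 0.31,
     "negative": 0.69, "compound": -0.58},
    {"text": "Installed the update.",   "positive": 0.00, "neutral": 1.00,
     "negative": 0.00, "compound": 0.00},
]

# Use case 1: only negative tweets, ordered by intensity
negatives = sorted((r for r in results if r["negative"] > 0),
                   key=lambda r: r["negative"], reverse=True)

# Use case 2: overall mood from the average compound score
overall = sum(r["compound"] for r in results) / len(results)

print([r["text"] for r in negatives], overall)
```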
The new nlp/SocialSentimentAnalysis algorithm is definitely faster and better at classifying the sentiment of social media texts. Its variety of sentiment types (individual and compound) lets you build different kinds of applications, whereas the old algorithm only provides an overall sentiment with five discrete values, and is better reserved for larger bodies of text.
Did you build an awesome app that uses nlp/SocialSentimentAnalysis? Let us know on Twitter @algorithmia!
Bonus: Check out the source code for running the benchmarks yourself @github!
Join Algorithmia CEO Diego Oppenheimer at Data Day 2016 in Austin, Texas on Saturday, January 16th at 2pm in Conference Room #301.
During this 60-minute session, learn how Big Data is being transformed by artificial intelligence through algorithmic advances in computer vision, natural language processing, and machine learning. This talk will cover how algorithms are a crucial part of the next Big Data revolution, and how the Algorithm Economy is opening up new opportunities for startups and large companies alike.
Data Day Texas 2016 is held at The AT&T Conference Center at The University of Texas.
It’s been an exciting and incredibly productive year here at Algorithmia. As we kick off 2016, I want to look back at everything we accomplished in 2015.
First, I want to thank our community of more than 15,000 algorithm and application developers for their support – it has been an amazing experience. We’re honored by everyone who signed up, used us in their applications, and added algorithms to the platform. We’re truly thankful for this unique community.
These are the major milestones we crossed in 2015:
Algorithmia leaves private beta and publicly launches, introducing a community built around the algorithm economy, where state-of-the-art algorithms are always live, and accessible by everyone. The algorithm marketplace launches with over 800 algorithms, 3,500 users, and an API for leveraging the building blocks of human intelligence in your apps.
How to Build Your Own Google
Algorithms on Algorithmia are like Legos, where they can be mixed and matched to explore the web algorithmically. In this demo, we use Algorithmia to implement PageRank, the algorithm Google was originally based on, to crawl a site, retrieve it as a graph, analyze the connectivity of the graph, and then analyze each node for its contents.
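PageRank itself is compact enough to sketch directly. This is a plain-Python version of the iterative algorithm over a toy link graph, not the actual Algorithmia implementation:

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Iterative PageRank over a dict of node -> list of outgoing links."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for node, links in graph.items():
            if links:
                # Each page passes its damped rank evenly to its out-links
                share = damping * rank[node] / len(links)
                for target in links:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank across all pages
                for n in nodes:
                    new_rank[n] += damping * rank[node] / len(nodes)
        rank = new_rank
    return rank

# Toy link graph standing in for a crawled site
site = {"home": ["about", "blog"], "about": ["home"], "blog": ["home", "about"]}
ranks = pagerank(site)
print(ranks)
```

In the demo, the graph comes from the crawler instead of being hand-written, but the connectivity analysis is the same idea.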
Using Artificial Intelligence to Detect Nudity
We combine artificial intelligence with University research to teach an algorithm to determine if there is nudity in an image. The result is our Nudity Detection algorithm, which uses a combination of nose, face, and skin color detection algorithms to identify if there are people in an image, and if any of them are nude.
Don’t believe us? Try out the demo at isitnude.com.
Navigate Product Hunt Like a Pro
We build a Chrome Extension that uses FP-Growth and Keyword Set Similarity algorithms to surface related products on Product Hunt from users who’ve upvoted the product you’re browsing (i.e. collaborative filtering). Get the Chrome Extension here, and start discovering better products on Product Hunt today.
AWS Lambda Partnership
Algorithmia teams up with AWS Lambda, enabling developers to build intelligent, serverless apps in minutes by leveraging our built-in Node.js blueprint. In this demo, we show you how to quickly make a serverless photo app to create digital art in less than 300 lines of code using AWS Lambda and Algorithmia.
The #1 Big Data Startup
Supporting the Innovators of Tomorrow
Algorithmia joins more than 600 student hackers at DubHacks, the largest collegiate hackathon in the Pacific Northwest, as sponsor and participant to help teams build projects, and create solutions to real-world problems.
Content Recommendations Made Simple
We launch Algorithmia Recommends, a free content recommendation plugin that helps any WordPress or Drupal blog or website increase engagement and retention. Algorithmia Recommends is built on the Breadth First Site Map web crawler algorithm and the natural language processing algorithms Keywords For Document Set and Keyword Set Similarity, which find and categorize all the pages on your website to help your users find the content most relevant to their interests.
Free eBook Published
Algorithmia launches its first eBook, Five Algorithms Every Web Developer Can Use and Understand, which teaches you how to harness the power of algorithms so you can make every app a smart app. In this short primer, we cover PageRank, Language Detection, Nudity Detection, Sentiment Analysis, and TF-IDF, and how you can implement them today.
GeekWire: Algorithmia Top Seattle Startup
Algorithmia joins GeekWire’s prestigious, annual list of the 10 most promising early-stage startups in the Seattle area. Our business model gets translated onto a giant six-foot by six-foot cocktail napkin, which is unveiled during GeekWire Gala at the Museum of History & Industry (MOHAI).
One of the 10 Coolest Big Data Startups
Algorithmia caps off an incredible year with another award, this time CRN names us one of the coolest big data startups.
This could not have been achieved without our amazing team: Anthony, Besir, John, Jonathan, Kenny, Liz, Matt, Patrick, and Zeynep, as well as our awesome 2015 interns, Ahmad and Mark.
We’re just getting started, and 2016 is shaping up to be even bigger as we add more algorithms and use cases.
We’re looking forward to another incredible year, and we’re excited to have you along for the journey.
– Diego M. Oppenheimer, CEO
The Algorithmia Shorties contest is designed to help programmers of all skill levels get started with Natural Language Processing tools and concepts. We are big fans of NaNoGenMo here at Algorithmia, even producing our own NaNoGenMo entry this year, so we thought we’d replicate the fun by creating this generative short story competition!
We’ll be giving away $300 USD for the top generative short story entry!
Additionally there will be two $100 Honorable Mention prizes for outstanding entries. We’ll also highlight the winners and some of our favorite entries on the Algorithmia blog.
We’re pretty fast and loose with what constitutes a short story. You can define what the “story” part of your project is, whether that means your story is completely original, a modified copy of another book, a collection of tweets woven into a story, or just a nonsensical collection of words! The only requirements are that your story is primarily in English and no more than 7,500 words.
Each story will be evaluated with the following rubric:
- Originality
- Readability
- Creative use of the Algorithmia API
We’ll read through all the entries and pick the top 20. The top 20 stories will be sent to two Seattle school teachers for some old-school red-ink grading before the final winner selection.
The contest runs from December 9th to January 9th. Your submission must be entered before midnight PST on January 9th. Winners will be announced on January 13th.