Winners of the Algorithmia Shorties Contest Announced

We’re excited to announce the winners of the first-ever Algorithmia Shorties contest. Inspired by NaNoGenMo, the Algorithmia Shorties were designed to help programmers of all skill levels learn more about natural language processing. We challenged developers to get started with NLP by creating computer-generated works of short story fiction using only algorithms. Entries were judged by their originality, readability, and creative use of the Algorithmia platform.

Grand Prize Winner: Big Dummy’s Leaves of Spam


Our winner of the Algorithmia Shorties is Big Dummy’s Leaves of Spam by Skwak. We loved this entry because it creatively mashed up the classic poem Leaves of Grass by Walt Whitman and Big Dummy’s Guide to the Internet, a layperson’s guide to the internet, with “select works” from her Gmail Spam folder. The result is a highly readable and often hilarious poem about the philosophy of life, the Internet, humanity, and scams.

Skwak used Sentence Split, Generate Paragraph From Trigram, and Generate Trigram Frequencies from Algorithmia. In addition, she created an Express app and hosted it on Heroku. Her GitHub repo can be found here.

Read Big Dummy’s Leaves of Spam here. 

Honorable Mention: n+7

Our first honorable mention is n+7 by fabianekc. It was inspired by a class in Mathematics & Literature he took in 2001, where he learned about the “n+7 method”: you replace each noun in a text with the noun that appears seven places after it in a dictionary. For example, “a man meets a woman” is transformed to “a mandrake meets a wonder” using the Langenscheidt dictionary. The n+7 entry uses the first four sections of Flatland by E. Abbott as its corpus. What we loved about this entry was that fabianekc created an algorithm on Algorithmia to carry out the n+7 method. The algorithm takes either a link to a text file for processing, a pre-processed text file, or plain text, and replaces each noun with the noun in the dictionary seven places away. You can also change the dictionary, the offset (how many places after the original noun to use), and where in the text the replacement starts and ends. Check out the GitHub repo here.

Read n+7 here.
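To make the n+7 idea concrete, here is a minimal Python sketch. It is not fabianekc's algorithm: the ten-word noun list is a toy stand-in for a real dictionary, a word counts as a noun only if it appears in that list (a real implementation would need part-of-speech tagging), and wrapping around at the end of the dictionary is an assumption of this sketch.

```python
import bisect

# Toy noun dictionary, kept sorted; a real run would use a full dictionary.
NOUNS = sorted(["apple", "bear", "cat", "dog", "eel",
                "fox", "goat", "hen", "ibis", "jay"])

def n_plus_k(text, nouns=NOUNS, offset=7):
    """Replace every word found in `nouns` with the entry `offset` places later."""
    out = []
    for word in text.split():
        index = bisect.bisect_left(nouns, word)
        if index < len(nouns) and nouns[index] == word:
            # Wrap around at the end of the dictionary (an assumption of this sketch).
            out.append(nouns[(index + offset) % len(nouns)])
        else:
            out.append(word)
    return " ".join(out)

print(n_plus_k("the cat chased the dog"))  # prints "the jay chased the apple"
```

Changing `offset` gives the other variants of the method, such as n+1 or n+3.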

Honorable Mention: Arrested Development Episode

Our second honorable mention goes to a computer-generated Arrested Development episode, created by peap and jennaplusplus. The duo generated trigrams from the scripts of the first three seasons of Arrested Development for every character with more than 10 lines overall. This created an eye-popping 71 trigram files! To create the faux episode script, they started the episode off with the Narrator speaking, and then randomly selected which characters would speak based on the size of their trigram file, ensuring no character spoke twice in a row. Each character was given 1-5 lines every time they spoke. Check out their GitHub repo here.

Read Arrested Development Episode here.
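The speaker-selection step described above can be sketched roughly as follows. This is only one reading of the description, not the authors' code: it assumes "based on the size of their trigram file" means a draw weighted in proportion to file size, and the character names and sizes in the usage lines are made up.

```python
import random

def pick_speakers(trigram_sizes, turns, first="Narrator", rng=random):
    """Return a list of (speaker, line_count) turns.

    Speakers are drawn with probability proportional to the size of their
    trigram file, never the same speaker twice in a row, 1-5 lines per turn.
    """
    script = [(first, rng.randint(1, 5))]
    while len(script) < turns:
        previous = script[-1][0]
        # Exclude whoever spoke last so nobody speaks twice in a row.
        candidates = [name for name in trigram_sizes if name != previous]
        weights = [trigram_sizes[name] for name in candidates]
        speaker = rng.choices(candidates, weights=weights, k=1)[0]
        script.append((speaker, rng.randint(1, 5)))
    return script

# Hypothetical trigram file sizes, for illustration only.
sizes = {"Narrator": 900, "Michael": 1200, "Gob": 700, "Lucille": 500}
for speaker, lines in pick_speakers(sizes, turns=6):
    print(speaker, lines)
```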

How The Algorithmia Shorties Were Generated

For algorithmically-generated short stories, users start by finding a corpus of text available in the public domain through Feedbooks or Project Gutenberg. Other interesting corpora users could use are TV scripts, software license agreements, personal journals, public speeches, or Wikipedia articles. Users then generate trigrams, which are a collection of all the three-word sequences in the corpus. With trigrams generated, the last step is to reassemble the trigrams into sentences using Random Text From Trigrams, or Generate Paragraph From Trigrams.

Want to learn more? For a detailed walkthrough, check out our trigram tutorial here.
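For readers who want to see the idea without the API, here is a small plain-Python sketch of building trigrams and reassembling them into text. It illustrates the general technique only; it does not reproduce how the Algorithmia trigram algorithms are implemented, and the function names are ours.

```python
import random
from collections import defaultdict

def build_trigrams(text):
    """Map each pair of consecutive words to the words seen right after it."""
    words = text.split()
    model = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        model[(first, second)].append(third)
    return model

def generate(model, length, rng=random):
    """Walk the model, picking a random follower for the last two words."""
    pair = rng.choice(sorted(model))
    output = list(pair)
    while len(output) < length:
        followers = model.get(tuple(output[-2:]))
        if not followers:
            break  # dead end: this pair only appeared at the end of the corpus
        output.append(rng.choice(followers))
    return " ".join(output)

corpus = "the cat sat on the mat and the cat ran off"
model = build_trigrams(corpus)
print(generate(model, length=8))
```

Because a follower that appears more often in the corpus is stored more often in the list, common three-word sequences are picked more frequently, which is what makes the output sound like its source.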


Benchmarking Sentiment Analysis Algorithms

Sentiment Analysis, also known as opinion mining, is a powerful tool you can use to build smarter products. It’s a natural language processing algorithm that gives you a general idea about the positive, neutral, and negative sentiment of texts. Sentiment analysis is often used to understand the opinion or attitude in tweets, status updates, movie/music/television reviews, chats, emails, comments, and more. Social media monitoring apps and companies all rely on sentiment analysis and machine learning to assist them in gaining insights about mentions, brands, and products.

Want to learn more about Sentiment Analysis? Read the Algorithmia Guide to Sentiment Analysis.

For example, sentiment analysis can be used to better understand how your customers perceive your brand or product on Twitter in real-time. Instead of going through hundreds, or maybe thousands, of tweets by hand, you can easily group positive, neutral, and negative tweets together, and sort them based on their confidence values. This will save you countless hours and lead to actionable insights.

Sentiment Analysis Algorithms On Algorithmia

Anyone who has worked with sentiment analysis can tell you it’s a very hard problem. It’s not hard to find trained models for classifying very specific domains, but there currently isn’t an all-purpose sentiment analyzer in the wild. In general, sentiment analysis becomes even harder when the volume of text increases, due to the complex relations between words and phrases.

On the Algorithmia marketplace, the most popular sentiment analysis algorithm is nlp/SentimentAnalysis. It’s an algorithm based on the popular and well known NLP library called Stanford CoreNLP, and it does a good job of analyzing large bodies of text. However, we’ve observed that the algorithm tends to return overly negative sentiment on short bodies of text, and decided that it needed some improvement.

We found that the Stanford CoreNLP library was originally intended for building NLP pipelines in a fast and simple fashion. It doesn’t focus much on individual features, and it lacks documentation about how to retrain the model. Additionally, the library doesn’t provide confidence values, but rather returns an integer between 0 and 4.

A Better Alternative

We decided to test a few alternative open-source libraries that we hoped would outperform the current algorithm at social media sentiment analysis. We came across an interesting and well-performing library called VADER Sentiment Analysis. The library is based on a paper published by SocialAI at Georgia Tech. This new algorithm performed exceptionally well, and we have added it to the marketplace. It’s called nlp/SocialSentimentAnalysis, and is designed to analyze social media texts.

We ran benchmarks on both nlp/SentimentAnalysis and nlp/SocialSentimentAnalysis to compare them to each other. We used an open dataset from CrowdFlower’s Data for Everyone initiative. The Apple Computers Twitter sentiment dataset was selected because the tweets covered a wide array of topics. Real-life data is almost never homogeneous, since it almost always has noise in it. Using this dataset helped us better understand how the algorithms performed when faced with real data.

We removed tweets that had no sentiment, and then filtered out anything that didn’t have 100% confidence, so that the remaining tweets were grouped and labeled by consensus. This decreased the size of the dataset from ~4000 tweets to ~1800 tweets.
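The filtering step can be sketched in a few lines of Python. The field names (`sentiment`, `confidence`) and the `not_relevant` label are assumptions made for illustration; check the actual column names in the dataset you download, and see the benchmark source code for what was really run.

```python
def filter_tweets(rows, no_sentiment_label="not_relevant"):
    """Keep rows with a real sentiment label and full annotator agreement."""
    return [
        row for row in rows
        if row["sentiment"] != no_sentiment_label
        and float(row["confidence"]) == 1.0
    ]

# Hypothetical rows, shaped like records read with csv.DictReader.
rows = [
    {"text": "love the new phone", "sentiment": "positive", "confidence": "1.0"},
    {"text": "store opens at 9", "sentiment": "not_relevant", "confidence": "1.0"},
    {"text": "battery is meh", "sentiment": "negative", "confidence": "0.67"},
]
print(len(filter_tweets(rows)))  # prints 1
```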

Running Time Comparison

The first comparison we made was the running time for each algorithm. The new nlp/SocialSentimentAnalysis algorithm runs up to 12 times faster than the nlp/SentimentAnalysis algorithm.


Accuracy Comparison

As you can see in the bar chart below, the new social sentiment analysis algorithm performs 15% better in overall accuracy.


We can also see how well it performs in accurately predicting each specific label. We used the One-Versus-All method to calculate the accuracy for every individual label. The new algorithm outperformed the old one in every given label.
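As a rough illustration of the One-Versus-All idea, the sketch below treats each label in turn as a yes/no question ("is this tweet positive, or not?") and measures how often the prediction agrees with the true answer. It shows the general method under our own assumptions, not necessarily the exact calculation used in the benchmark, and the labels are made-up sample data.

```python
def overall_accuracy(actual, predicted):
    """Fraction of items whose predicted label matches the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def one_vs_all_accuracy(actual, predicted, label):
    """Accuracy of the binary question 'is this item `label` or not?'."""
    hits = sum((a == label) == (p == label) for a, p in zip(actual, predicted))
    return hits / len(actual)

# Made-up sample labels, for illustration only.
actual = ["pos", "neg", "neu", "neg", "pos"]
predicted = ["pos", "neg", "neg", "neg", "neu"]

print(overall_accuracy(actual, predicted))            # prints 0.6
for label in ("pos", "neu", "neg"):
    print(label, one_vs_all_accuracy(actual, predicted, label))
```

Note that a per-label score can be higher than the overall accuracy, because correctly saying "not this label" also counts as a hit.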


When To Use Social Sentiment Analysis

The new algorithm works well with social media texts, as well as texts that inherently have a similar structure and nature (e.g. status updates, chat messages, comments, etc.). The algorithm can still give you sentiments for bigger texts such as reviews or articles, but it will probably not be as accurate as it is with social media texts.

An example application would be to monitor social media for how people are reacting to a change to your product, such as when Foursquare split their app in two (Swarm and Foursquare), or when Apple releases an iOS update. You could monitor the overall sentiment of a certain hashtag or account mentions, and draw a line chart that shows how your customers’ sentiment changes over time.

As another example, you could monitor your products through social media 24/7, and receive alerts when the sentiment around your product or service changes significantly in a short amount of time. This would act as an early alert system to help you take quick and appropriate action before a problem gets even bigger. An example would be a brand like Comcast or Time Warner wanting to keep tabs on customer satisfaction through social media, and proactively respond to customers when there is a service interruption.

Understanding Social Sentiment Analysis

The new algorithm returns three individual sentiments: positive, neutral, and negative. Additionally, it returns one general overall (i.e. compound) sentiment. Each individual sentiment is scored between 0 and 1 according to their intensity. The compound sentiment of the text is given between -1 and 1, which is between absolute negative and absolute positive, respectively.

Based on your use case and application, you may want to use only a specific individual sentiment (e.g. to see only negative tweets, ordered by intensity), or the compound, overall sentiment (e.g. to understand general consumer feelings and opinions). Having both types of sentiment gives you the freedom to build applications exactly as you need to.
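Here is a small sketch of both uses. The dictionary keys (`positive`, `neutral`, `negative`, `compound`) follow the description above, but the exact response format of the algorithm is an assumption here, and the scores are invented sample values rather than real algorithm output.

```python
def most_negative(results, threshold=0.5):
    """Return results whose negative score is at least `threshold`, strongest first."""
    flagged = [r for r in results if r["negative"] >= threshold]
    return sorted(flagged, key=lambda r: r["negative"], reverse=True)

def average_compound(results):
    """Mean compound score: below 0 leans negative, above 0 leans positive."""
    return sum(r["compound"] for r in results) / len(results)

# Invented scores in the shape described above, for illustration only.
results = [
    {"text": "love it", "positive": 0.8, "neutral": 0.2,
     "negative": 0.0, "compound": 0.5},
    {"text": "worst update ever", "positive": 0.0, "neutral": 0.3,
     "negative": 0.7, "compound": -0.75},
    {"text": "it crashed again, awful", "positive": 0.0, "neutral": 0.4,
     "negative": 0.6, "compound": -0.5},
]

for r in most_negative(results):
    print(r["negative"], r["text"])
print(average_compound(results))  # prints -0.25
```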


The new and improved nlp/SocialSentimentAnalysis algorithm is definitely faster, and better at classifying the sentiments of social media texts. It allows you to build different kinds of applications due to its variety of sentiment types (individual or compound), whereas the old one only provides an overall sentiment with five discrete values, and is better reserved for larger bodies of text.

Did you build an awesome app that uses nlp/SocialSentimentAnalysis? Let us know on Twitter @algorithmia!

Bonus: Check out the source code for running the benchmarks yourself @github!

Data Day 2016: Algorithm Marketplaces and the New “Algorithm Economy”


Join Algorithmia CEO Diego Oppenheimer at Data Day 2016 in Austin, Texas on Saturday, January 16th at 2pm in Conference Room #301.

During this 60-minute session, learn how Big Data is being transformed by artificial intelligence through algorithmic advances in computer vision, natural language processing, and machine learning. This talk will cover how algorithms are a crucial part of the next Big Data revolution, and how the Algorithm Economy is opening up new opportunities for startups and large companies.

Data Day Texas 2016 is held at The AT&T Conference Center at The University of Texas.

A Year In Review: A Letter From Algorithmia CEO Diego Oppenheimer

It’s been an exciting and incredibly productive year here at Algorithmia. As we kick off 2016, I want to look back at everything we accomplished in 2015.

First, I want to thank our community of more than 15,000 algorithm and application developers for their support. It has been an amazing experience. We’re honored by everybody who signed up, used us in their applications, and added algorithms to the platform. We’re truly thankful for this unique community.

These are the major milestones we crossed in 2015:

Algorithmia Algorithms


Algorithmia Launches

Algorithmia leaves private beta and publicly launches, introducing a community built around the algorithm economy, where state-of-the-art algorithms are always live, and accessible by everyone. The algorithm marketplace launches with over 800 algorithms, 3,500 users, and an API for leveraging the building blocks of human intelligence in your apps.



How to Build Your Own Google

Algorithms on Algorithmia are like Legos, where they can be mixed and matched to explore the web algorithmically. In this demo, we use Algorithmia to implement PageRank, the algorithm Google was originally based on, to crawl a site, retrieve it as a graph, analyze the connectivity of the graph, and then analyze each node for its contents.



Using Artificial Intelligence to Detect Nudity

We combine artificial intelligence with university research to teach an algorithm to determine if there is nudity in an image. The result is our Nudity Detection algorithm, which uses a combination of nose, face, and skin color detection algorithms to identify if there are people in an image, and if any of them are nude.

Don’t believe us? Try out the demo at



Navigate Product Hunt Like a Pro

We build a Chrome Extension that uses FP-Growth and Keyword Set Similarity algorithms to surface related products on Product Hunt from users who’ve upvoted the product you’re browsing (i.e. collaborative filtering). Get the Chrome Extension here, and start discovering better products on Product Hunt today.



AWS Lambda Partnership

Algorithmia teams up with AWS Lambda, enabling developers to build intelligent, serverless apps in minutes by leveraging our built-in Node.js blueprint. In this demo, we show you how to quickly make a serverless photo app to create digital art in less than 300 lines of code using AWS Lambda and Algorithmia.

The #1 Big Data Startup

A panel of judges from Strata + Hadoop World select Algorithmia as the winner of the Startup Showcase based on the team, technology, and innovation from a pool of the 12 leading big data startups.



Supporting the Innovators of Tomorrow

Algorithmia joins more than 600 student hackers at DubHacks, the largest collegiate hackathon in the Pacific Northwest, as sponsor and participant to help teams build projects, and create solutions to real-world problems.

Content Recommendations Made Simple

We launch Algorithmia Recommends, a free content recommendation plugin for any WordPress or Drupal blog, or website to increase their engagement and retention. Algorithmia Recommends is built on the Breadth First Site Map web crawler algorithm, and powerful natural language processing algorithms Keywords For Document Set and Keyword Set Similarity to find and categorize all the pages on your website in order to help your users find content that’s most relevant to their interests.



Free eBook Published

Algorithmia launches its first eBook, Five Algorithms Every Web Developer Can Use and Understand, which teaches you how to harness the power of algorithms so you can make every app a smart app. In this short primer, we cover PageRank, Language Detection, Nudity Detection, Sentiment Analysis, and TF-IDF, and how you could implement them today.



GeekWire: Algorithmia Top Seattle Startup

Algorithmia joins GeekWire’s prestigious, annual list of the 10 most promising early-stage startups in the Seattle area. Our business model gets translated onto a giant six-foot by six-foot cocktail napkin, which is unveiled during GeekWire Gala at the Museum of History & Industry (MOHAI).

One of the 10 Coolest Big Data Startups

Algorithmia caps off an incredible year with another award, this time from CRN, which names us one of the coolest big data startups.


On top of those highlights, we also overhauled the Algorithmia UI, introducing a new experience that helps users explore algorithms, discover solutions, and get started faster. Additionally, we improved our documentation to make using Algorithmia a snap, and added clients for CLI, Java, JavaScript, Python, and Scala.

This could not have been achieved without our amazing team, Anthony, Besir, John, Jonathan, Kenny, Liz, Matt, Patrick and Zeynep, as well as our awesome 2015 interns, Ahmad and Mark.

We’re just getting started, and 2016 is shaping up to be even bigger as we add more algorithms and use cases.

We’re looking forward to another incredible year, and we’re excited to have you along for the journey.

– Diego M. Oppenheimer, CEO

The Algorithmia Shorties Contest

Generating short story fiction with algorithms

The Algorithmia Shorties contest is designed to help programmers of all skill levels get started with Natural Language Processing tools and concepts. We are big fans of NaNoGenMo here at Algorithmia, even producing our own NaNoGenMo entry this year, so we thought we’d replicate the fun by creating this generative short story competition!

The Prizes

We’ll be giving away $300 USD for the top generative short story entry!

Additionally there will be two $100 Honorable Mention prizes for outstanding entries. We’ll also highlight the winners and some of our favorite entries on the Algorithmia blog.

The Rules

We’re pretty fast and loose with what constitutes a short story. You can define what the “story” part of your project is, whether that means your story is completely original, a modified copy of another book, a collection of tweets woven into a story, or just a non-nonsensical collection of words! The minimum requirements are that your story is primarily in English and no more than 7,500 words.

Each story will be evaluated with the following rubric:

  • Originality
  • Readability
  • Creative use of the Algorithmia API

We’ll read through all the entries and grab the top 20. The top 20 stories will be sent to two Seattle school teachers for some old-school red ink grading before the final winner selection.

The contest runs from December 9th to January 9th. Your submission must be entered before midnight PST on January 9th. Winners will be announced on January 13th.

How to generate a short story

Step One: Find a Corpus


NaNoGenMo + Text Analysis with Algorithmia’s Natural Language Processing algorithms

We’ve just wrapped up November, which means aspiring writers all over the world are frantically typing away in an attempt to finish an entire novel in one month as part of National Novel Writing Month, also known as NaNoWriMo. Each November, participants aim to write 50,000 words on a 30 day deadline–a difficult feat for any writer! NaNoWriMo has been around for quite a long time, but for the last couple of years programmers and digital artists have been participating in a cheeky alternative: NaNoGenMo, or National Novel Generation Month.

Internet artist Darius Kazemi started NaNoGenMo after tweeting the idea in 2013:


This November is the third organized installment of NaNoGenMo and the community keeps growing every year as more and more programmers & artists become interested in the strange intersection of code, language processing, and literature. And because the event is primarily driven by developers, submissions are posted on a Github repo as Issues so that participants can comment on one another’s ideas and help each other create some of the most unique and sometimes nonsensical novels written in November.

In the NaNoGenMo world, “novel” is pretty loosely defined. According to the rules,

“The “novel” is defined however you want. It could be 50,000 repetitions of the word “meow”. It could literally grab a random novel from Project Gutenberg. It doesn’t matter, as long as it’s 50k+ words.”

(And of course, someone did make that 50,000 word “meow” book in 2014!)

Novel generation can be much more complicated than it appears from the outside. Some books integrate with social media by pulling text from Twitter to generate dialogue, others go down a recursive rabbit hole, and some even generate graphic novels.

Algorithmia is home to a wide variety of algorithms that are a perfect fit for NaNoGenMo. Even though I don’t have any background in natural language processing or computational linguistics, I found it easy to combine algorithms that not only helped me generate my novel, but also gave me insights into the texts I used as a basis.

I chose the texts I wanted to work with based on two things: availability in the public domain and an interesting author demographic. While there are tons of NaNoGenMo books out there that are based on other texts, I wanted to find a really unique set of texts to base my novel on. I also developed an interest in 19th century American literature after reading Uncle Tom’s Cabin when I was 12. Luckily for me, Project Gutenberg is home to many novels and autobiographies that fit this intersection of interests!

First step: compile a corpus of texts. I chose to go with two sets of 7 books to compare. The first set was composed primarily of slave and emancipation narratives from Black female authors. While digging around in these texts, I realized that books as seemingly disparate as Little Women were published at the same time. Somehow I had never really thought about how such drastically different worlds were being portrayed in what we now think of as classic American literature, so I decided it would be interesting to compare them. The second set of texts are all from white female authors and were published around the mid-19th century.

Set one:

  • 1861 – Incidents in the Life of a Slave Girl by Harriet Jacobs
  • 1868 to 1888 (published in serial form) – Trial and Triumph by Frances Ellen Watkins Harper
  • 1868 to 1888 (published in serial form) – Sowing and Reaping: A Temperance Story by Frances Ellen Watkins Harper
  • 1868 to 1888 (published in serial form) – Minnie’s Sacrifice by Frances Ellen Watkins Harper
  • 1868 – Behind the Scenes by Elizabeth Keckley
  • 1891 – From the Darkness Cometh the Light, or, Struggles for Freedom by Lucy Delaney
  • 1892 – Iola Leroy, or Shadows Uplifted by Frances Ellen Watkins Harper

Set two:

  • 1845 – Woman in the Nineteenth Century by Margaret Fuller
  • 1852 – Uncle Tom’s Cabin by Harriet Beecher Stowe
  • 1854 – The Lamplighter by Maria S. Cummins
  • 1854 – Ruth Hall: A Domestic Tale of the Present Time by Fanny Fern (pen name of Sara Payson Willis)
  • 1860 – Rutledge by Miriam Coles Harris
  • 1868 – Little Women by Louisa May Alcott
  • 1869 to 1870 (published in serial form) – An Old Fashioned Girl by Louisa May Alcott
  • 1872 – What Katy Did by Susan Coolidge

Before I started generating my own novel based on these texts, I rolled up my sleeves and got to work on analyzing them. The Algorithmia platform is already full of many text analysis algorithms, so instead of getting lost in learning natural language processing from scratch, it was as simple as choosing an algorithm, passing in my texts, and comparing the results.

Haven’t read any of the books? Don’t worry! The first algorithm I ran on the texts was Summarizer. This algorithm is pretty straightforward to use–input text, get back key sentences and ranked keywords. Read the summaries of Set One and Set Two if you need a literary refresher!

Using the AutoTag algorithm, I set out to discover if there would be a difference in the topics we’d find between the two author demographics. The AutoTag algorithm uses a variant of Latent Dirichlet allocation and returns a set of keywords that represent the topics in the text. I then took each of the topics returned by the algorithm and classified them into various categories or themes to see if we could find some common threads.


I had suspected that the second set of books would have more domestic themes, and I was mostly unsurprised that there were no autotagged keywords about race or slavery in that set. Interestingly, specific names were fairly frequent as keywords in both sets, averaging 4.8 out of 8 topics for Set One and 5.7 out of 8 topics for Set Two.

While this algorithm gives us some interesting insights into our texts, it can’t tell us everything and sometimes it can even trick you. For example, I grew suspicious of Sowing & Reaping when the AutoTag algorithm returned that one of the topics was “romaine”. I suspected that this book did not in fact focus on a type of lettuce as a main topic. Since I hadn’t read this specific book, I looked it up–turned out to be the last name of a main character!

After running the AutoTag algorithm on my data sets, I decided to check out Sentiment Analysis. This algorithm uses text analysis, natural language processing, and computational linguistics to identify subjective information in text. It’s also known as opinion mining. The algorithm I used returns a rating of Very Negative, Negative, Neutral, Positive or Very Positive.

Here’s the breakdown of sentiment by book:

| Set One Books | Sentiment | Set Two Books | Sentiment |
| --- | --- | --- | --- |
| Incidents in the Life of a Slave Girl | Negative | Woman in the Nineteenth Century | Negative |
| Trial and Triumph | Negative | Uncle Tom’s Cabin | Negative |
| From the Darkness Cometh the Light | Negative | The Lamplighter | Negative |
| Sowing and Reaping | Negative | Ruth Hall | Neutral |
| Minnie’s Sacrifice | Negative | Rutledge | Very Negative |
| Behind the Scenes | Positive | Little Women | Negative |
| Iola Leroy | Negative | What Katy Did | Negative |

Unsurprisingly, 12 out of 14 of the books I analyzed were Negative or Very Negative. Rough times in the 19th century!

Next, I decided it might be interesting to see what popped up with Profanity Detection. While getting the data into the algorithm and writing the results back to a file was easy, it turns out that profanity detection requires a lot of double checking by hand. I knew that some words that came up were not really profane back then; words like “queer”, “pussy”, and “muff” were innocent in the context of these 19th century texts.

Interestingly, the frequency of racial profanity of the two data sets ended up being relatively similar:



Of course, running the algorithm doesn’t give you the full picture, since in our second set of data about 95% of the racial profanity came from one book: Uncle Tom’s Cabin. This is unsurprising since it’s the only work in our second set of books that was written by an abolitionist. However, we still don’t quite get the full picture about profanity in these books: many of the words used were not considered slurs back then, and additionally, within dialogue this kind of language takes on a different dimension. What we can learn from an algorithm such as Profanity Detection is that there is a very stark difference in who these books focused on as main characters and what kind of world they lived in. Four of the seven books written by white authors had zero instances of these words.

Now that you’ve read through all this and seen the results from all these different algorithms, you might be thinking that you don’t know how to do natural language processing, so maybe this is something you’ll put on a project list and try out later. The most amazing part of this project that I haven’t told you yet is this: every single one of the scripts that I wrote to do NLP and text analysis was under 30 lines of code.

Check out the script I wrote for running the AutoTag algorithm:

import Algorithmia
import os
import json

client = Algorithmia.client('my_api_key')
algo = client.algo('nlp/AutoTag/0.1.4')

rootdir = './clean_books/set_one/'
output_file = 'set_one_autotag_results.txt'
results = ''

# Run AutoTag on every book in the directory and collect the results.
for subdir, dirs, files in os.walk(rootdir):
    for filename in files:
        with open(os.path.join(subdir, filename), 'r') as content_file:
            text = content_file.read()
        print("Autotagging " + filename)
        results += filename + "\n\n"
        results += json.dumps(algo.pipe(text))
        results += "\n\n"

# Write all of the results to a single file.
with open(output_file, 'w') as f:
    f.write(results)

print("Done!")

After analyzing all my books, it was time to generate my novel to complete NaNoGenMo. This was so easy compared to the text analysis! Once again, with just simple API calls, I generated trigram models based on each set of books. I then made book previews based on each trigram model just to see if you could hear a difference in the books generated on these different demographics.

The book preview from Set One:

Dem young uns vil kill you dead than to see you. Well, you would be less unhappy marriages if labor were more women in the midst of her nice pudding, as there are no enemies to good old aunt, and confirm themselves in woods and gloomy clouds hung like graceful draperies. Talk about the streets of the ballot in his land, that those who have fitted their children?

Belle, and I live in such dingy, humble quarters. said Mrs. Underhill, from my own sorrow-darkened home, I did, that he had asked them. Do you remember the incident so well were given to Frederick Douglass contributed $200, besides lecturing for us. The President added: Man is a fair specimen of her negro blood in his friendship, but they may be an old woman entered her home with me? If the vessel had been. Reader, I felt humiliated enough.

The book preview from Set Two:

aw! Yes, said Miss Skinlin she hasn’t the first heir to the female figure. The waves dance bright and happy when I forgot to learn, before which she told me to read and study. My Uncle, with a commanding, What are you better than Kintuck.

It was useless to ask one last word I ran down a corridor as dark and narrow streets or the other.

No Oh, Earth! And no one interfered, and it was. but then strangers came so by letting out all fear and distress and doubts of the damned, as well as bodies. What word Can we not get. I don’t resent the sarcasm, and unsettled most of my observing her to rise. Fortunately the gate swinging in the recesses, chrysanthemums and Christmas roses bloomed as freshly as in her voice, what everybody finds in the streets so, for the best thing was insufferably disgusting and loathsome to me. I said a thing as leisure there.

The most interesting difference I found in the text generated from these different data sets was that the text from Set Two sounded much more formal. The first set of books, the ones written by Black authors, tended to have much more dialogue written in such a way as to let the reader hear the accents and dialects of the time. These words became part of the model to generate text, so as you can see in the first sentence of the Set One preview, the algorithm generated text that still makes a lot of sense even with words that are intended to showcase an accent.

In the end, I decided to create a trigram model based on both sets of text and use that to generate my full length novel. I didn’t have to write any fancy code; I merely made another API call to the Generate Trigram Frequencies algorithm, this time passing in the entirety of my data set. Then, to generate my novel, I wrote a quick script that calls into another algorithm: Generate Paragraph From Trigram. This algorithm uses the trained trigram model to generate paragraphs of text. Since NaNoGenMo requires the book to be at least 50,000 words, I simply wrote a loop that calls the Generate Paragraph algorithm until the total word count of the book reaches the goal:

import Algorithmia
import os
import re
from random import randint

client = Algorithmia.client('my_api_key')
text_from_trigram = client.algo('/lizmrush/GenerateParagraphFromTrigram')
trigrams_file = "data://.algo/ngram/GenerateTrigramFrequencies/temp/all-trigrams.txt"

book_title = 'full_book.txt'
book = ''
book_word_length = 50000

# Keep adding paragraphs until the book reaches the target word count.
while len(re.findall(r'\w+', book)) < book_word_length:
    print("Generating new paragraph...")
    params = [trigrams_file, "xxBeGiN142xx", "xxEnD142xx", (randint(1,9))]
    new_paragraph = text_from_trigram.pipe(params)
    book += new_paragraph
    book += '\n\n'
    print("Updated word count:")
    print(len(re.findall(r'\w+', book)))

with open(book_title, 'w') as f:
    f.write(book)

print("Done!")
print("Your book is now complete. Give " + book_title + " a read now!")

Even with extra new lines for readability, the code I needed to generate an entire novel with Algorithmia was still under 30 lines! And I ended up generating a really unique, interesting novel without getting lost in the highly technical parts of natural language generation. Now, the text isn’t perfect: sometimes the sentences don’t quite sound right and there isn’t really any sort of story arc, but for such simple code I think it’s pretty good! My favorite part about using 19th century texts as the data set was that sometimes you can’t tell if the generated text is hard to read because it’s generated and doesn’t make much sense or because it sounds so old-timey. My book includes the following gems that just might pass as human-written text:

I have said of human life when I saw the Ohio river, that you shall work.

Still, falsehood may be hearing you. She only ‘spects something. Them curls may make a noise you shall not.

You can read the book, or rather, attempt to read the book, online or you can download it from the repo.

It’s mindblowingly fast and simple to get the power of these algorithms into your hands once they are behind a simple API call. You can see all the other scripts I wrote in the GitHub repo for this project. If you browse around, you’ll see that each script is nearly identical. The only real changes I had to make were replacing the algorithm I was calling and what I named the files to write results to! The Algorithmia platform is an incredibly powerful tool. Instead of spending days, weeks, months learning how to code my own natural language processing and text analysis algorithms, I could just pop my data into a variety of algorithms with simple API calls. No sweat, just results.
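For readers curious about what a trigram model does under the hood, here is a minimal local sketch of the general technique. It is not the code behind the Generate Trigram Frequencies or Generate Paragraph From Trigram algorithms; the function names, the tiny corpus, and the restart-on-dead-end behavior are all assumptions made for illustration. It also reuses the same word-count check as the script above.

```python
import random
import re
from collections import defaultdict


def build_trigram_model(text):
    """Map each pair of consecutive words to the words seen right after it."""
    words = text.split()
    model = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        model[(first, second)].append(third)
    return model


def generate_words(model, target_words, seed=None):
    """Walk the model, restarting from a random pair when a dead end is hit."""
    rng = random.Random(seed)
    pairs = sorted(model)  # sorted so a fixed seed gives repeatable output
    first, second = rng.choice(pairs)
    output = [first, second]
    while len(output) < target_words:
        followers = model.get((first, second))
        if not followers:
            # No known continuation: jump to a fresh starting pair
            first, second = rng.choice(pairs)
            output.extend([first, second])
            continue
        third = rng.choice(followers)
        output.append(third)
        first, second = second, third
    return " ".join(output[:target_words])


def word_count(text):
    """Count words the same way the novel script does."""
    return len(re.findall(r"\w+", text))


# Hypothetical toy corpus; a real model would be trained on full books.
corpus = "the cat sat on the mat and the cat ran to the mat and the dog sat on the rug"
model = build_trigram_model(corpus)
sample = generate_words(model, target_words=12, seed=1)
print(sample)
print(word_count(sample))
```

Because each next word depends only on the two words before it, the output tends to read locally like the source texts while drifting globally, which matches the lack of a story arc in the generated novel.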

A behind the scenes look at the making of our GeekWire ‘Seattle 10’ napkin

We mentioned previously that GeekWire and a panel of Seattle’s top startup leaders selected Algorithmia as one of the 10 most promising startups in Seattle – an incredible honor to say the least. As part of this award, we were asked to translate our business onto a giant six-foot by six-foot cocktail napkin, which will be unveiled tonight to the public at the annual GeekWire Gala at the Museum of History & Industry (MOHAI).

We’re pretty excited with our napkin, and wanted to offer a sneak peek to readers. The napkin was designed by our developer evangelist, Liz Rush, who also happens to enjoy cartooning in her spare time. 

The concept started as a sketch with the different types of personas (users) that benefit from Algorithmia, and a mission statement: “Algorithmia: Bringing together organizations, academics, researchers, hackers, and engineers to unlock the power of algorithms in an accessible, open marketplace.”


We wanted to include the Algorithmia binary tree logo, punch up the mission statement, and move the design more toward a printed circuit board look:


Getting better! Something was missing, though… one of us had “a ridiculous idea to write the text the way you’d call an algorithm in Python.”


Thankfully, somebody knows some Python around here. Our CTO Kenny Daniel straightened things out:


There we go. The final napkin turned out great. We’re most excited about how we communicate the idea that Algorithmia enables developers to create tomorrow’s smart applications today in a first-of-its-kind marketplace for algorithms, which unlocks the building blocks of human intelligence, and provides access to world class scientific research and artificial intelligence in five lines of code or less.



Click the video below for a time-lapse version of the cocktail napkin design process!

Oh, and we even created the algorithm Purpose, which takes an array of your users as a string, and returns the purpose for your organization. Try it for yourself here.

Get Started Building Intelligent, Serverless Apps Using AWS Lambda and Algorithmia

In this walkthrough, we’ll show you how to quickly make a serverless photo app that creates digital art pieces in less than 300 lines of code using AWS Lambda and Algorithmia. We’ll be using the Quadtree Art Generator algorithm to create our art, and push the new image to our S3 bucket automatically:


AWS Lambda is great because you can run code without provisioning or managing servers. Similarly, Algorithmia lets you tap into the power of the algorithm economy with just a single API call. Together, they let you build and deploy serverless apps within minutes.

Ready? Okay, let’s get started.

Step 1: Create Accounts

You’ll need a free AWS account, as well as an Algorithmia account. We provide you with 10,000 credits to get started, which will be more than enough for this demo and beyond.

Step 2: Create Your S3 Bucket

Now we need to create your S3 bucket for this project.

Start by selecting S3 from the AWS dashboard.


Then select “Create Bucket” from the Actions menu. Give your bucket a unique name (remember: only lowercase names, and no spaces), and then select a region where you want this bucket hosted.


Once your bucket is created, you want to create two folders: input, and output. The input folder is what Lambda will watch for new images. The images will get processed and then returned to the output folder.

Step 3: IAM Role Configuration

Before we can create a Lambda function, we need to first make an IAM execution role. IAM stands for “Identity and Access Management,” and is an AWS service that helps you control access.

First, go to the AWS IAM Roles page, and select “Create Role.”


Under “Select Role Type,” find and select AWS Lambda. Search for AWSLambdaExecute, and select that.

If you need it, find the complete AWS documentation for this step here.

Step 4: Create the Lambda Function

Create a Lambda function by going to the Services menu in the AWS console and selecting Lambda from the list.


Hit the blue “Get Started Now” button. On the “Select Blueprint” page, scroll to the bottom and hit “Skip.”

Give your function a name, description, and set the runtime to Node.js.

Now, copy our SDK from this Gist into the Lambda function code box below.


Replace ‘YOUR_API_KEY_HERE’ with your Algorithmia API key in the Gist. Your API key can be found on the dashboard of your Algorithmia account.


Let’s walk through the top half of the Gist so we can understand how this works. We first define which Algorithmia algorithm we want to use. In this case we’re using the Quadtree Art Generator:

var algo = "algo://besirkurtulmus/quadtree_art/0.1.x";

Then we grab the new image from our S3 bucket:

var s3 = new AWS.S3();
var bucket = event.Records[0].s3.bucket.name;
var key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));
var params = {Bucket: bucket, Key: key};
var signedUrl = s3.getSignedUrl('getObject', params);

We process the image, turning it into quadtree art, and upload the image back to our bucket.

var client = algorithmia(apiKey);
client.algo(algo).pipe(signedUrl).then(function(output) {
    if(output.error) {
        // The algorithm returned an error
        console.log("Error: " + output.error.message);
        // We call context.succeed to avoid Lambda retries, for more information see: 
        context.succeed("Error: " + output.error.message);
    } else {
        // Upload the result image to the bucket
        var outputKey = 'output/'+key.substring(key.lastIndexOf('/') + 1);
        var params = {Bucket: bucket, Key: outputKey, Body: output.get()};
        s3.upload(params, function(err, data) {
            if (err) {
                console.log("Error uploading data: ", err);
            } else {
                console.log("Successfully uploaded data to bucket");
                context.succeed("Finished processing");
            }
        });
    }
});

Got that? Okay, great.

When you’re ready, select the IAM role you created in Step 3. To be on the safe side, set the Timeout to the maximum (5 min), and hit “Next.”

Step 5: Configure Event Sources

Once your function is created, we need to set up the event for Lambda to respond to. Start by clicking the “Event Sources” tab on your function’s detail page. Then select “Add event source.”

Select S3 from the event source type drop-down. Select the bucket you created in Step 2. The event type you want to select is “Object Created (All).” We also want to add a prefix to tell Lambda to watch for new images here. In this case, we’ll use the prefix “input/”. Hit submit and you’re done.

Congrats, your AWS Lambda + Algorithmia function is ready to go. Lambda will now listen for new events in your S3 bucket, and automatically pass those images to Algorithmia where they will get processed by the Quadtree Art Generator, and then added back to S3 in the /output folder.
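To make the event flow more concrete, here is a rough sketch of how an S3 notification for a file under input/ maps to a destination under output/. The Gist itself is Node.js; this is the same key handling expressed in Python purely for illustration. The function name `output_location`, the bucket name, and the trimmed-down sample event are hypothetical, and a real notification carries many more fields.

```python
from urllib.parse import unquote_plus


def output_location(event):
    """Return (bucket, input_key, output_key) for the first record in an S3 event."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # S3 URL-encodes object keys in notifications and uses "+" for spaces
    input_key = unquote_plus(s3_info["object"]["key"])
    # Keep only the file name and place it under the output/ prefix
    file_name = input_key.rsplit("/", 1)[-1]
    return bucket, input_key, "output/" + file_name


# Hypothetical, trimmed-down event for illustration only
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-quadtree-bucket"},
                "object": {"key": "input/my+photo.jpg"},
            }
        }
    ]
}

print(output_location(sample_event))
```

Writing results under a different prefix than the one Lambda watches matters: if the function wrote back into input/, each processed image would trigger the function again.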

Test this out by logging into your S3 bucket, and navigating to the input folder, and uploading an image. Then, navigate to the output folder, where you’ll have your own piece of digital quadtree art!

Here are our founders, Diego Oppenheimer and Kenny Daniel, before:


…and after quadtree art generation:


What’s Next

You now have a working prototype that uses AWS Lambda and Algorithmia. You could use this same workflow to easily detect and crop photos using Smart Thumbnail, transcribe videos using speech to text, or check images for nudity. Learn more about how to leverage Algorithmia and Lambda here.

In a follow-up guide, we’ll teach you how to create a simple Android photo sharing app for uploading photos to S3, where Lambda will pick them up and turn them into digital art for others to enjoy.

DubHacks Spotlight: intuiti0n Helps Find Seminal Research Papers in Any Field


Algorithmia was on hand at the second-annual DubHacks hackathon last month, the largest collegiate hackathon in the Pacific Northwest. Over 600 student developers and designers flocked to the University of Washington campus in Seattle to form teams, build projects, and create solutions to real-world problems.

intuiti0n wanted to make the literature review process easier by building a service that finds important research papers across all fields of study. The team consisted of Nirawit Jittipairoj, Alex Thompson, and Bryant Wong.

We spoke to Bryant Wong from the team, a senior at the University of Washington with a triple major (!) in mathematics, statistics, and computer science, about their intuiti0n hack.

What was the problem you were trying to solve?

“Two of the members of our team have been involved with academic research, which has the goal of trying to push the limits of human knowledge. However, in order to push the limits of human knowledge, you need to know exactly what is in that field, which you do with a literature review. However, literature reviews are kind of a Catch-22 – you need to read the most important papers in a field, but because you don’t know what’s in the field, you don’t know what papers to read. As a result, literature reviews are often spent just hunting for papers that appear relevant, and then discarding most of them as they are often only tangentially related to your field. This makes the whole process tedious and extremely inefficient.”

How did you solve this problem?

“We devised an app that centered around extracting data from papers, and used them to generate topics to make targeted searches to find (other) papers. We were taking the abstract and title from a paper, running an NLP algorithm called Latent Dirichlet Allocation (LDA) on it to generate topics, then running those topics through Google Scholar and parsing the results with Beautiful Soup. The user could set a threshold for the number of papers they would like returned so that the algorithm does not run indefinitely. Our heuristic for judging the importance of a paper was not so good, as we used the number of papers that had cited this paper. Obviously this is not a good metric as there are many irrelevant papers that are cited, but we did not have a better concrete heuristic to judge by.”
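The ranking step Bryant describes can be pictured with a short sketch. This is not the team’s code: the function name `top_cited`, the `cited_by` field, and the placeholder papers are all made up for illustration. It only shows the citation-count heuristic combined with a user-set cap on how many papers come back.

```python
def top_cited(papers, limit):
    """Return up to `limit` papers, most-cited first."""
    ranked = sorted(papers, key=lambda paper: paper["cited_by"], reverse=True)
    return ranked[:limit]


# Placeholder results standing in for parsed search hits (not real papers)
papers = [
    {"title": "Paper A", "cited_by": 120},
    {"title": "Paper B", "cited_by": 450},
    {"title": "Paper C", "cited_by": 15},
]

for paper in top_cited(papers, limit=2):
    print(paper["title"], paper["cited_by"])
```

As the quote points out, raw citation counts say nothing about relevance to the topic, so a heavily cited but unrelated paper would still rank first under this heuristic.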

How did you utilize Algorithmia in your project?

“We used Algorithmia as the backbone for our machine learning and topic generation, as we ran our data through one of the LDA algorithms available on Algorithmia to generate topics. This provided several advantages for us over implementing the algorithm ourselves: 

1) not having to implement a complicated algorithm

2) not having a powerful enough server to run the algorithm (as our local machines were not particularly powerful)

3) simple integration in our Python scripts. 

This was a no-brainer decision and allowed us to have a half-functioning product by the end of DubHacks.”