
How the Algorithm Economy and Containers are Changing the Way We Build and Deploy Apps Today

The algorithm economy creates a new value chain

In the age of Big Data, algorithms give companies a competitive advantage. Today’s most important technology companies all have algorithmic intelligence built into the core of their product: Google Search, Facebook News Feed, Amazon’s and Netflix’s recommendation engines.

“Data is inherently dumb,” Peter Sondergaard, senior vice president at Gartner and global head of Research, said in The Internet of Things Will Give Rise To The Algorithm Economy. “It doesn’t actually do anything unless you know how to use it.”

Google, Facebook, Amazon, Netflix and others have built both the systems needed to acquire a mountain of data (e.g. search history, engagement metrics, purchase history), and the algorithms responsible for extracting actionable insights from that data. As a result, these companies are using algorithms to create value and impact millions of people a day.

“Algorithms are where the real value lies,” Sondergaard said. “Algorithms define action.”

Many technology companies have done a good job of capturing data, but have come up short on doing anything valuable with it. Thankfully, two fundamental shifts happening in technology right now are democratizing algorithmic intelligence, and changing the way we build and deploy smart apps today:

  1. The Algorithm Economy
  2. Containers

The confluence of the algorithm economy and containers creates a new value chain, where algorithms as a service can be discovered and made accessible to all developers through a simple REST API. Algorithms as containerized microservices ensure both interoperability and portability, allowing code to be written in any programming language and then exposed through a common API.
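
In practice, invoking a marketplace algorithm takes just a few lines. Here's a minimal sketch using the Algorithmia Python client (the API key and input format shown are illustrative):

import Algorithmia

client = Algorithmia.client('YOUR_API_KEY')

# The algorithm runs as a containerized microservice on Algorithmia's
# infrastructure; there are no servers to provision or manage on our end.
algo = client.algo('nlp/SocialSentimentAnalysis')
response = algo.pipe({"sentence": "Containers make algorithms portable!"})
print(response.result)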

By containerizing algorithms, we ensure that code is always “on,” and always available, as well as being able to auto-scale to meet the needs of the application, without ever having to configure, manage, or maintain servers and infrastructure. Containerized algorithms shorten the time for any development team to go from concept, to prototype, to production-ready app.

Running algorithms in containers as microservices is a strategy for companies looking to discover actionable insights in their data. This structure makes software development more agile and efficient. It reduces the infrastructure needed, and abstracts an application’s various functions into microservices to make the entire system more resilient.


The Algorithm Economy

Algorithm marketplaces and containers create microservices
The “algorithm economy” is a term coined by Gartner to describe the next wave of innovation, where developers can produce, distribute, and commercialize their code. The algorithm economy is not about buying and selling complete apps, but rather functional, easy-to-integrate algorithms that enable developers to build smarter apps more quickly and cheaply than before.

Algorithms are the building blocks of any application. They provide the business logic needed to turn inputs into useful outputs. Similar to Lego blocks, algorithms can be stacked together in new and novel ways to manipulate data, extract key insights, and solve problems efficiently. The upshot is that these same algorithms are flexible, and easily reused and reconfigured to provide value in a variety of circumstances.

For example, we created a microservice at Algorithmia called Analyze Tweets, which searches Twitter for a keyword, determining the sentiment and LDA topics for each tweet that matches the search term. This microservice stacks our Retrieve Tweets With Keywords algorithm with our Social Sentiment Analysis and LDA algorithms to create a simple, plug-and-play utility.

The three underlying algorithms could just as easily be restacked to create a new use case. For instance, you could create an Analyze Hacker News microservice that uses the Scrape Hacker News and URL2Text algorithms to extract the text of the top HN posts. Then, you’d simply pass the text of each post to the Social Sentiment Analysis and LDA algorithms to determine the sentiment and topics of all the top posts on HN.
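
Here's a rough sketch of what that restacking might look like with the Algorithmia Python client. The algorithm paths and input formats below are hypothetical stand-ins for the algorithms named above, so check the marketplace for the exact names and schemas:

import Algorithmia

client = Algorithmia.client('YOUR_API_KEY')

# Hypothetical paths modeled on the algorithms mentioned above.
urls = client.algo('web/ScrapeHackerNews').pipe(10).result        # top 10 HN posts
texts = [client.algo('util/Url2Text').pipe(url).result for url in urls]

# Feed the extracted text to the same sentiment and topic algorithms
# used by Analyze Tweets.
sentiments = client.algo('nlp/SocialSentimentAnalysis').pipe(texts).result
topics = client.algo('nlp/LDA').pipe({"docsList": texts}).result
print(sentiments, topics)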

The algorithm economy also allows for the commercialization of world class research that historically would have been published, but largely under-utilized. In the algorithm economy, this research is turned into functional, running code, and made available for others to use. The ability to produce, distribute, and discover algorithms fosters a community around algorithm development, where creators can interact with the app developers putting their research to work.

Algorithm marketplaces function as the global meeting place for researchers, engineers, and organizations to come together to make tomorrow’s apps today.


Containers

Putting algorithms in containers enables the algorithm economy

Containers are changing how developers build and deploy distributed applications. A container is a form of lightweight virtualization that holds all the application logic and runs as an isolated process, with all the dependencies, libraries, and configuration files bundled into a single package that can be deployed to the cloud.

“Instead of making an application or a service the endpoint of a build, you’re building containers that wrap applications, services, and all their dependencies,” Simon Bisson at InfoWorld said in How Containers Change Everything. “Any time you make a change, you build a new container; and you test and deploy that container as a whole, not as an individual element.”

Containers create a reliable environment where software can run when moved from one environment to another, allowing developers to write code once, and run it in any environment with predictable results — all without having to provision servers or manage infrastructure.

This is a shot across the bow for large, monolithic code bases. “[Monoliths are] being replaced by microservices architectures, which decompose large applications – with all the functionality built-in – into smaller, purpose-driven services that communicate with each other through common REST APIs,” Lucas Carlson from InfoWorld said in 4 Ways Docker Fundamentally Changes Application Development.

The hallmark of microservice architectures is that the various functions of an app are unbundled into a series of decentralized modules, each organized around a specific business capability.

Martin Fowler, the co-author of the Agile Manifesto, describes microservices as “an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.”

By decoupling services from a monolith, each microservice becomes independently deployable, and acts as a smart endpoint of the API. “There is a bare minimum of centralized management of these services,” Fowler said in Microservices: A Definition of this New Architectural Term, “which may be written in different programming languages and use different data storage technologies.”

Similar to the algorithm economy, containers are like Legos for cloud-based application development. “This changes cloud development practices,” Carlson said, “by putting larger-scale architectures like those used at Facebook and Twitter within the reach of smaller development teams.”


tl;dr

  • The algorithm economy and containers are changing the way developers build and ship code.
  • The algorithm economy allows for the building blocks of algorithmic intelligence to be made accessible, and discoverable through marketplaces and communities.
  • Containerizing algorithms enables them to be packaged as microservices, making them accessible via an API, and hosted on scalable, serverless infrastructure in the cloud.

HackPoly Spotlight: Helping Hand Uses Facial Recognition To Automate Tasks

Students at the HackPoly hackathon

We joined over 500 student hackers at the annual HackPoly hackathon at Cal Poly Pomona last weekend to see what these up-and-coming technologists could develop in just 24 hours. Student developers, designers, and hardware enthusiasts came from all over Southern California to form teams and build innovative products that solve real-world problems using a variety of tools, including Algorithmia.

John Pham, Josh Bither, Kevin Dinh, and Elijah Marchese worked together as team Helping Hand, creating a platform that gives users the ability to remotely automate tasks, such as opening a door using facial recognition software. Check out the great promo video they made about their project:

We chatted with John Pham to get a closer look at what they built:

How did you build this hack, and what technologies did you use?
This hack uses a Raspberry Pi at its core, responsible for most of the computation. We used a master/slave setup between the Pi and an Arduino Uno, and utilized Algorithmia’s, Microsoft’s, and Clarifai’s APIs to create a facial recognition technique for a multitude of potential applications. We repurposed a Logitech webcam to provide a live feed to the Pi, and programmed the Arduino to activate a servo. We created a SQL database, filled with recognized faces, which the user can personalize. Notifications and logs can be viewed from the Android app, and by extension the Pebble watch.

What’s next for the Helping Hand team?
We all hope to continue refining Helping Hand and propelling the project forward to reach its full potential. As of now, we plan to improve the software aspect of Helping Hand, and then direct our attention to bettering the hardware. We fully intend to release our product to the public. Affordable and user-friendly, Helping Hand will provide the security and peace of mind we all want for our communities and our families.

For John, this was his fourth hackathon working on a completely new platform. Kevin also had prior hackathon experience, but was especially proud of all they were able to accomplish in just 24 hours. According to Kevin, the event was really stressful at the start, because the team hadn’t worked together before, but “it got better as it went along, and we got more acclimated to cooperating.”

HackPoly was Josh’s first hackathon, but like a seasoned hacker, he said “My favorite part of HackPoly were the very early morning hours, from 12 – 3 am, where trying to code coherently becomes nearly impossible. I got about 3-4 hours of sleep for the whole event.”

It was also Elijah’s first hackathon and he described the experience as “a wild rollercoaster ride.” Following the event, Elijah went on to explain that the hackathon was a high pressure, highly competitive endeavor: “There was a lot of stress trying to learn so much in so little time. Despite only having three hours of sleep, Helping Hand turned out great and the satisfaction of completing the project made me feel fulfilled.”

As the winners of the “Best Use of Algorithmia API” prize, we sent the team Cloudbit Starter Kits to help them continue on their path of hardware and Internet of Things hacking. We can’t wait to see what else the team builds and what happens next with Helping Hand!


Analyzing Tweets Using Social Sentiment Analysis

An Introduction to Social Sentiment and Analyzing Tweets

Sentiment analysis, also known as opinion mining, is a powerful tool you can use to build smarter products. It’s a natural language processing technique that gives you a general idea of the positive, neutral, and negative sentiment in a text. Sentiment analysis is often used to understand the opinion or attitude in tweets, status updates, movie/music/television reviews, chats, emails, comments, and more. Social media monitoring apps and companies all rely on sentiment analysis and machine learning to gain insights about mentions, brands, and products.

For example, sentiment analysis can be used to better understand how your customers on Twitter perceive your brand online in real time. Instead of going through hundreds, or maybe thousands, of tweets by hand, you can easily group positive, neutral, and negative tweets together, and sort them based on their confidence values. This can save you countless hours and lead to actionable insights.

Additionally, you can use another natural language processing algorithm called Latent Dirichlet Allocation (LDA). This powerful algorithm extracts a mixture of topics from a given set of documents. LDA doesn’t give you a topic name, but it represents each topic as a bag of words that gives you a very good idea of what the topic is about. For example, if the bag of words is “sun, nasa, earth, moon,” you can infer that the topic is probably related to space.
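
To make the idea concrete, here's a tiny, self-contained illustration of LDA using the open-source gensim library (not the marketplace algorithm itself). With a real corpus, the extracted word bags become far more coherent:

from gensim import corpora, models

docs = [
    "nasa launches a new mission to the moon",
    "the sun the earth and the moon align",
    "tech stocks rally after strong earnings",
    "investors cheer earnings from tech giants",
]
texts = [doc.split() for doc in docs]

# Build a vocabulary and a bag-of-words corpus, then fit a 2-topic model.
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)

# Each topic comes back as a weighted bag of words.
for topic_id, words in lda.print_topics():
    print(topic_id, words)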

You can use both of these powerful algorithms to build a tool that analyzes tweets in real time. This tool gives you just a glimpse of what can be built using two machine learning algorithms and a scraper. Learn more about sentiment analysis with this handy Algorithmia Guide.

Using NLP from Algorithmia to Build an App For Analyzing Tweets on Demand

Analyze Tweets is a minimal demo from Algorithmia that searches Twitter by keyword and analyzes tweets for sentiment and LDA topics. It searches up to 500 tweets that are no older than 7 days, and caches each query for up to 24 hours.

The demo initializes with a random keyword as soon as the page loads. The randomly selected keywords are: algorithmia, microsoft, amazon, google, facebook, and reddit.

You can check out the demo at demos.algorithmia.com/analyze-tweets/

How we built it

Before you start, you need an Algorithmia account. You can create one on our website.

The first thing we did was click the Add Algorithm button, located under the user tab at the top right of the page.


We enabled the Special Permissions, so that our algorithm can call other algorithms and have access to the internet. We also selected Python as our preferred language.


Next, we picked the algorithms we were going to use, and came up with a recipe that uses three different algorithms in the following steps (a rough code sketch follows the list):

  1. It first checks if the query has been cached before and the cache isn’t older than 24 hours.
    1. If it has the appropriate cache, it returns the cached query.
    2. If it can’t find an appropriate cached copy, it continues to run the analysis.
      1. It retrieves up to 500 tweets using the user-provided keyword and the twitter/RetrieveTweetsWithKeyword algorithm.
      2. It runs the nlp/SocialSentimentAnalysis algorithm to get sentiment scores for all tweets.
      3. It creates two new groups of tweets that are respectively the top 20% most positive and top 20% most negative tweets.
        1. We do this by sorting the list of tweets based on their overall sentiment provided by the previously mentioned algorithm.
        2. We then take the top 20% and bottom 20% to get the most positive and negative tweets.
        3. We also remove any tweets in both groups that have a sentiment of 0 (neutral), since the Twitter retriever doesn’t guarantee 500 tweets for every query.
      4. These two new groups (positive and negative) of tweets are fed into the nlp/LDA algorithm to extract positive and negative topics.
      5. All relevant information is first cached, then is returned.
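
Here's a condensed sketch of the recipe above, assuming the Algorithmia Python client; the field names and input/output formats are simplified for illustration, so refer to the linked source code for the real thing:

import hashlib
import json
import time

import Algorithmia

client = Algorithmia.client('YOUR_API_KEY')

def analyze_tweets(query):
    # Step 1: check for a cached result that is less than 24 hours old.
    query_hash = hashlib.md5(query.encode('utf-8')).hexdigest()
    cache_uri = "data://yourUserName/AnalyzeTweets/" + str(query_hash) + ".json"
    cache_file = client.file(cache_uri)
    if cache_file.exists():
        cached = json.loads(cache_file.getString())
        if time.time() - cached["timestamp"] < 24 * 60 * 60:
            return cached
    # Step 2: retrieve up to 500 tweets for the keyword.
    tweets = client.algo('twitter/RetrieveTweetsWithKeyword').pipe(query).result
    # Step 3: score every tweet for sentiment ("compound" and "text" are
    # illustrative field names).
    scored = client.algo('nlp/SocialSentimentAnalysis').pipe(tweets).result
    # Step 4: sort by overall sentiment, take the top and bottom 20%,
    # and drop neutral (0) tweets from both groups.
    scored = sorted(scored, key=lambda t: t["compound"])
    cutoff = max(1, len(scored) // 5)
    negative = [t for t in scored[:cutoff] if t["compound"] != 0]
    positive = [t for t in scored[-cutoff:] if t["compound"] != 0]
    # Step 5: extract topics for each group with LDA.
    result = {
        "timestamp": time.time(),
        "positiveTopics": client.algo('nlp/LDA').pipe(
            {"docsList": [t["text"] for t in positive]}).result,
        "negativeTopics": client.algo('nlp/LDA').pipe(
            {"docsList": [t["text"] for t in negative]}).result,
    }
    # Step 6: cache the result, then return it.
    cache_file.put(json.dumps(result))
    return result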

Based on the recipe above, we wrapped this app into a microservice and called it nlp/AnalyzeTweets. You can check out the source code here. You can copy and paste the code into your own algorithm if you’re feeling lazy. The only part you’d need to change is line 28, from:

cache_uri = "data://demo/AnalyzeTweets/" + str(query_hash) + ".json"

to:

cache_uri = "data://yourUserName/AnalyzeTweets/" + str(query_hash) + ".json"

The last step is to compile and publish the new algorithm.


And… congrats! By using different algorithms as building blocks, you now have a working microservice on the Algorithmia platform!

Notes

After writing the algorithm, we built a client-side (JS + CSS + HTML) demo that calls our new algorithm with input from the user. In our client-side app, we used d3.js for rendering the pie chart and histogram. We also used DataTables for rendering interactive tabular data. SweetAlert was used for the stylish popup messages. You can find a standalone version of our Analyze Tweets app here.

A real-world case study


While we were building the demo app and writing this blog post (on January 28th), GitHub went down and was completely inaccessible. Being the dataphiles we are, we immediately ran an analysis on the keyword “github” using our new app.

As you might expect, developers on Twitter were not positive about GitHub being down. Nearly half of all tweets relating to GitHub at the time were negative. To get a better idea of the types of reactions, we can look at the topics generated with LDA. Topic 4 probably represents a group of people who were calmer about the situation. The famous raging unicorn makes an appearance in multiple topics, and we can also see some anger in the other topics. And of course, what analysis of developer tweets would be complete without people complaining about other people complaining? We can see this in Topic 2, with words like “whine.” See the full analysis from the downtime below:

[Figure: full analysis of tweets for “github” during the outage]

Winners of the Algorithmia Shorties Contest Announced

We’re excited to announce the winners of the first-ever Algorithmia Shorties contest. Inspired by NaNoGenMo, the Algorithmia Shorties were designed to help programmers of all skill levels learn more about natural language processing. We challenged developers to get started with NLP by creating computer-generated works of short story fiction using only algorithms. Entries were judged by their originality, readability, and creative use of the Algorithmia platform.

Grand Prize Winner: Big Dummy’s Leaves of Spam


Our winner of the Algorithmia Shorties is Big Dummy’s Leaves of Spam by Skwak. We loved this entry because it creatively mashed up the classic poem Leaves of Grass by Walt Whitman, and Big Dummy’s Guide to the Internet, a sort of layperson’s guide to the internet, with “select works” from her Gmail Spam folder. The result is a highly readable, and often hilarious poem about the philosophy of life, the Internet, humanity, and scams.

Skwak used Sentence Split, Generate Paragraph From Trigram, and Generate Trigram Frequencies from Algorithmia. In addition, she created an Express app and hosted it on Heroku. Her GitHub repo can be found here.

Read Big Dummy’s Leaves of Spam here. 

Honorable Mention: n+7

Our first honorable mention is n+7 by fabianekc. It was inspired by a class in Mathematics & Literature he took in 2001. He learned about the “n+7 method,” which is where you replace each noun in a text with the noun in a dictionary seven places after the original. For example, “a man meets a woman” is transformed to “a mandrake meets a wonder” using the Langenscheidt dictionary. The n+7 entry features the first four sections of Flatland by E. Abbott translated as a corpus. What we loved about this entry was that fabianekc created an algorithm on Algorithmia to accomplish the n+7 method. The algorithm takes either a link to a text file for processing, a pre-processed text file, or plain text, and replaces each noun with the noun in the dictionary seven places away. You can also change the dictionary, the offset (how many places after the original noun to use), and the start and end of the text to begin the replacement. Check out the Github repo here.

Read n+7 here.

Honorable Mention: Arrested Development Episode

Our second honorable mention goes to a computer-generated Arrested Development episode, created by peap and jennaplusplus. The duo generated trigrams using the scripts from the first three seasons of Arrested Development for all characters that had more than 10 lines overall. This created an eye-popping 71 trigram files! To create the faux episode script, they started the episode off with the Narrator speaking, and then randomly selected which characters would speak based on the size of their trigram file, ensuring no character spoke twice in a row, which resulted in every character having 1-5 lines every time they spoke. Check out their Github repo here.

Read Arrested Development Episode here.

How The Algorithmia Shorties Were Generated

To algorithmically generate short stories, users start by finding a corpus of text in the public domain through Feedbooks or Project Gutenberg. Other interesting corpora include TV scripts, software license agreements, personal journals, public speeches, and Wikipedia articles. Users then generate trigrams, a collection of all the three-word sequences in the corpus. With the trigrams generated, the last step is to reassemble them into sentences using Random Text From Trigrams or Generate Paragraph From Trigrams.
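
The underlying idea is simple enough to sketch in a few lines of plain Python. This toy version (not the marketplace algorithms themselves) builds a trigram table from a corpus and walks it to generate new text:

import random
from collections import defaultdict

def build_trigrams(corpus):
    # Map each word pair to every word that follows it in the corpus.
    words = corpus.split()
    trigrams = defaultdict(list)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        trigrams[(w1, w2)].append(w3)
    return trigrams

def generate(trigrams, length=30):
    # Start from a random pair, then repeatedly pick a plausible next word.
    pair = random.choice(list(trigrams))
    output = list(pair)
    for _ in range(length):
        followers = trigrams.get(pair)
        if not followers:
            break
        nxt = random.choice(followers)
        output.append(nxt)
        pair = (pair[1], nxt)
    return " ".join(output)

corpus = open("leaves_of_grass.txt").read()  # any public-domain text file
print(generate(build_trigrams(corpus)))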

Want to learn more? For a detailed walkthrough, check out our trigram tutorial here.

 

Benchmarking Sentiment Analysis Algorithms

Sentiment analysis, also known as opinion mining, is a powerful tool you can use to build smarter products. It’s a natural language processing technique that gives you a general idea of the positive, neutral, and negative sentiment in a text. Sentiment analysis is often used to understand the opinion or attitude in tweets, status updates, movie/music/television reviews, chats, emails, comments, and more. Social media monitoring apps and companies all rely on sentiment analysis and machine learning to gain insights about mentions, brands, and products.

Want to learn more about Sentiment Analysis? Read the Algorithmia Guide to Sentiment Analysis.

For example, sentiment analysis can be used to better understand how your customers perceive your brand or product on Twitter in real time. Instead of going through hundreds, or maybe thousands, of tweets by hand, you can easily group positive, neutral, and negative tweets together, and sort them based on their confidence values. This can save you countless hours and lead to actionable insights.

Sentiment Analysis Algorithms On Algorithmia

Anyone who has worked with sentiment analysis can tell you it’s a very hard problem. It’s not hard to find trained models for classifying very specific domains, but there currently isn’t an all-purpose sentiment analyzer in the wild. In general, sentiment analysis becomes even harder when the volume of text increases, due to the complex relations between words and phrases.

On the Algorithmia marketplace, the most popular sentiment analysis algorithm is nlp/SentimentAnalysis. It’s based on Stanford CoreNLP, a popular and well-known NLP library, and it does a good job of analyzing large bodies of text. However, we’ve observed that the algorithm tends to return overly negative sentiment on short bodies of text, and decided it needed some improvement.

We found that the Stanford CoreNLP library was originally intended for building NLP pipelines in a fast and simple fashion, doesn’t focus much on individual features, and lacks documentation on how to retrain the model. Additionally, the library doesn’t provide confidence values, but rather returns an integer between 0 and 4.

A Better Alternative

We decided to test a few alternative open-source libraries that we hoped would outperform the current algorithm at social media sentiment analysis. We came across an interesting and well-performing library called Vader Sentiment Analysis, based on a paper published by SocialAI at Georgia Tech. This new algorithm performed exceptionally well, and we’ve added it to the marketplace as nlp/SocialSentimentAnalysis, designed to analyze social media texts.

We ran benchmarks on both nlp/SentimentAnalysis and nlp/SocialSentimentAnalysis to compare them to each other. We used an open dataset from CrowdFlower’s Data for Everyone initiative, selecting the Apple Computers Twitter sentiment dataset because the tweets cover a wide array of topics. Real-life data is almost never homogeneous, since it almost always has noise in it. Using this dataset helped us better understand how the algorithms performed when faced with real data.

We removed tweets that had no sentiment, then filtered out anything without 100% labeler confidence, so the remaining tweets were grouped and labeled by consensus. This decreased the size of the dataset from ~4,000 tweets to ~1,800 tweets.
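
Here's a sketch of that filtering step using pandas; the column names are illustrative, so check them against the actual CSV export:

import pandas as pd

df = pd.read_csv("apple_twitter_sentiment.csv")
df = df[df["sentiment"] != "not_relevant"]      # drop tweets with no sentiment
df = df[df["sentiment:confidence"] == 1.0]      # keep only unanimous labels
print(len(df))  # roughly 1,800 of the original ~4,000 tweets remain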

Running Time Comparison

The first comparison we made was the running time for each algorithm. The new nlp/SocialSentimentAnalysis algorithm runs up to 12 times faster than the nlp/SentimentAnalysis algorithm.

[Chart: running time comparison of nlp/SocialSentimentAnalysis vs. nlp/SentimentAnalysis]

Accuracy Comparison

As you can see in the bar chart below, the new social sentiment analysis algorithm performs 15% better in overall accuracy.

[Chart: overall accuracy comparison]

We can also see how well each algorithm predicts each specific label. We used the one-versus-all method to calculate the accuracy for every individual label; the new algorithm outperformed the old one on every label (a short sketch of the method follows the chart).

[Chart: per-label, one-versus-all accuracy comparison]
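
A short sketch of the one-versus-all calculation with toy data: each label in turn is treated as the positive class, and everything else as negative:

y_true = ["positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "negative", "neutral"]

def one_vs_all_accuracy(y_true, y_pred, label):
    # A prediction counts as correct for this label when both sides agree
    # on whether the example belongs to the label or not.
    hits = sum((t == label) == (p == label) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

for label in ("positive", "neutral", "negative"):
    print(label, one_vs_all_accuracy(y_true, y_pred, label))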

When To Use Social Sentiment Analysis

The new algorithm works well with social media texts, as well as texts that have a similar structure and nature (e.g. status updates, chat messages, comments). The algorithm can still give you sentiments for bigger texts such as reviews or articles, but it will probably not be as accurate as it is with social media texts.

An example application would be monitoring social media for reactions to a change in your product, such as when Foursquare split its app into Swarm and Foursquare, or when Apple releases an iOS update. You could monitor the overall sentiment of a certain hashtag or account mention, and visualize a line chart that shows how your customers’ sentiment changes over time.

As another example, you could monitor your products through social media 24/7, and receive alerts when the sentiment around your product or service changes significantly in a short amount of time. This would act as an early warning system, helping you take quick, appropriate action before a problem gets even bigger. A brand like Comcast or Time Warner could keep tabs on customer satisfaction through social media, and proactively respond to customers when there is a service interruption.

Understanding Social Sentiment Analysis

The new algorithm returns three individual sentiments: positive, neutral, and negative. Additionally, it returns one general overall (i.e. compound) sentiment. Each individual sentiment is scored between 0 and 1 according to its intensity. The compound sentiment is scored between -1 and 1, from absolute negative to absolute positive, respectively.

Based on your use case and application, you may want to use only a specific individual sentiment (e.g. seeing only negative tweets, ordered by intensity), or the compound, overall sentiment (e.g. understanding general consumer feelings and opinions). Having both types of sentiment gives you the freedom to build applications exactly as you need to.
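
For instance, here's a minimal sketch of pulling out only the negative tweets and ordering them by intensity, assuming the field names described above (check the algorithm's documentation for the exact input and output schema):

import Algorithmia

client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('nlp/SocialSentimentAnalysis')

tweets = ["This update is amazing!", "Worst release ever.", "It's fine, I guess."]
# "compound" and "negative" follow the description above, but treat the
# exact field names and result shape as illustrative.
scored = [algo.pipe({"sentence": t}).result for t in tweets]

negatives = [s for s in scored if s["compound"] < 0]
negatives.sort(key=lambda s: s["negative"], reverse=True)
for s in negatives:
    print(s)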

Conclusion

The new and improved nlp/SocialSentimentAnalysis algorithm is definitely faster, and better at classifying the sentiment of social media texts. Its variety of sentiment types (individual and compound) lets you build different kinds of applications, whereas the old algorithm only provides an overall sentiment with five discrete values, and is better reserved for larger bodies of text.

Did you build an awesome app that uses nlp/SocialSentimentAnalysis? Let us know on Twitter @algorithmia!

Bonus: Check out the source code for running the benchmarks yourself on GitHub!