Algorithmia Blog - Deploying AI at scale

Tracking Economic Development with Open Data and Predictive Algorithms

The world runs on data, but all too often, acquiring fresh data, analyzing it, and deploying a live analysis or model is a challenge. We here at Algorithmia spend most of our waking hours thinking about how to help you with the analysis and deployment, and our friends at Socrata spend their time thinking about how to make data more available. So we combined forces to analyze municipal building permit data (using various time series algorithms) as a reasonable proxy for economic development. Thanks to Socrata, this information is now available (and automatically updated) for many cities, with no need to navigate any bureaucracy. We pulled in data for a range of cities, including:

  • Santa Rosa, CA
  • Fort Worth, TX
  • Boston, MA
  • Edmonton, Alberta, Canada
  • Los Angeles, CA
  • New York City, NY

We will show that it is fairly easy to relate building permit data to local (seasonal) weather patterns and economic conditions. When evaluating a given area, it is useful to know how much development has been going on, but also how much of it is explainable by other factors, such as seasonal variation, macroeconomic indicators, commodity prices, etc. This can help us answer questions like: “How is this city likely to be affected by a recession?” or “How will real estate development fare if oil prices drop?”

Tools Used

  • Simple Moving Average
  • Linear Detrend
  • Autocorrelation
  • Remove Seasonality
  • Outlier Detection
  • Forecast

Try it out here:

Source Data

To retrieve the data, we use an algorithm that accesses Socrata’s API directly; it can be found here.

Once we’ve retrieved the data, we aggregate it by total number of permits issued per month and plot the result. To make the graph clearer, it sometimes helps to smooth or denoise the data. There are a number of ways to do this, with moving averages being the most popular. The most intuitive is the Simple Moving Average, which replaces each point by the average of itself and some number of the preceding points (the default is 3).
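As a rough sketch of what that smoothing step does (assuming the default window of 3 and plain Python lists – not the marketplace implementation):

def simple_moving_average(series, window=3):
    # Replace each point with the mean of itself and the points before it.
    # Early points with fewer predecessors are averaged over what exists.
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / float(len(chunk)))
    return smoothed

# e.g. monthly permit counts
print(simple_moving_average([120, 95, 140, 160, 180, 90]))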

[chart: simple moving average]

A simple first inspection of the data reveals trends and large peaks and dips. Fort Worth, TX, lends itself to this analysis: it shows steady growth from the beginning of the record up to a peak right before the subprime crisis, with a fairly rapid fall-off to a plateau.

Seasonality

As much as we can learn from simple inspection of the plot, there are factors making this difficult, especially when it comes to inferring economic activity. For instance, outdoor construction tends to take place during nicer weather, especially in cold places far from the equator. This often makes the data harder to interpret – a seasonal spike doesn’t say nearly as much about underlying economic activity as a non-seasonal one, so we need a way to take this into account.

Edmonton, Alberta, is a particularly clear example of this.

[chart: seasonality]

To address seasonality, we first need to remove the linear trends via Linear Detrend, which fits a line to the data and then subtracts it, resulting in a time series of differentials from the removed trend. Without detrending, the seasonality analysis that follows will not work.
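The idea in miniature (a sketch using numpy’s polyfit as a stand-in for the Linear Detrend algorithm):

import numpy as np

def linear_detrend(series):
    # Fit a degree-1 polynomial (a line) and subtract it,
    # leaving differentials around the removed trend.
    x = np.arange(len(series))
    slope, intercept = np.polyfit(x, series, 1)
    return np.asarray(series, dtype=float) - (slope * x + intercept)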

Permit issuance in Edmonton is clearly seasonal, even to the naked eye, but seasonality can be detected even in much noisier data using autocorrelation. Very roughly speaking, you can identify seasonality by the relative strength and definition of peaks in the autocorrelation plot of the time series, which you can calculate using our Autocorrelation algorithm. The autocorrelation of a time series is a rough measure of how similar a time series is to a lagged version of itself. If a series is highly seasonal according to some period of length k, the series will look similar to itself about every k steps between peaks and troughs.
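Roughly, the autocorrelation at lag k is the correlation of the series with a copy of itself shifted by k steps. A bare-bones sketch (the marketplace Autocorrelation algorithm is the real tool here):

import numpy as np

def autocorrelation(series, max_lag):
    # Correlate the mean-centered series with lagged copies of itself.
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = (x ** 2).sum()
    return [(x[:-k] * x[k:]).sum() / denom for k in range(1, max_lag + 1)]

# For a series with a 12-month season, expect peaks near lags 12, 24, 36, ...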

We expect seasonality to be strongest farthest from the equator. Sure enough, it is clearest in more northern cities such as NYC and Edmonton, and least clear in cities like Los Angeles – you can check this yourself with the interactive panel above.

It can help to suppress this seasonal influence so the effects of other factors become more clear. In Algorithmia this can be done using an algorithm called Remove Seasonality, which, by default, detects and removes the strongest seasonal period. This smooths out seasonal variation; what remains is, in a sense, the unexpected part of the data.
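One way to picture what it removes (a sketch that assumes the period is already known to be 12 months, rather than detected automatically as the marketplace algorithm does):

import numpy as np

def remove_seasonality(detrended, period=12):
    # Average all points sharing a phase of the cycle (e.g. all Januaries),
    # then subtract that expected seasonal component from each point.
    x = np.asarray(detrended, dtype=float)
    phase_means = np.array([x[p::period].mean() for p in range(period)])
    return x - phase_means[np.arange(len(x)) % period]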

[chart: removed seasonality]

Note that each transformed time point should be interpreted as the actual value at that point in time minus the expected seasonal component of the data. It is a differential rather than an absolute value, and thus can be negative.

Anomaly Detection

Once we’ve accounted for linear trends and seasonality, we’re left with the most mysterious part of the data, where the more subtle signals hide. In many cases visual inspection and some detective work will be useful here, but in other cases it’s useful to automate the process of detecting subtle behavior. For instance, if you have an automated process that ingests, analyzes, and reports on incoming data, you may want to be alerted if something changes drastically, and take action based on that information. Defining what constitutes “abnormal” can be a hard problem, but by removing the larger trends, we are left with a clearer picture of these abnormalities.

The city of Santa Rosa, CA provides an interesting example. One can see from inspection the sudden spike in permits issued in 2013, but it can also be detected automatically using an Outlier Detection algorithm. This algorithm works by setting all non-outliers to zero and leaving outliers unchanged, where an outlier is defined as any data point that falls more than two standard deviations from the mean of the series. We don’t have an explanation for this particular outlier – it may just be chance or a change in reporting – but it does indicate that the spike may be worth looking into more carefully.
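The two-standard-deviation rule is simple enough to sketch (details of the marketplace algorithm may differ):

import numpy as np

def outliers(series, num_sigma=2.0):
    # Zero out everything within num_sigma standard deviations of the
    # mean; leave the outliers untouched.
    x = np.asarray(series, dtype=float)
    return np.where(np.abs(x - x.mean()) > num_sigma * x.std(), x, 0.0)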

[chart: outlier detection]

Forecasting

At this point it would be nice to tell you about our shiny crystal ball that can predict the future (and return it via an API call). Unfortunately, we’re not there quite yet, but we CAN help you see what observed trends will look like extrapolated into the future. Specifically, the Forecast algorithm takes your time series and projects observed linear and seasonal trends out a given number of time steps into the future. It does this by calculating, for each future point, its value according to extrapolation of the detected linear trend, then adding to that the differential corresponding to the contribution of each seasonal component to that point. In the demo, Forecast is set to predict using data from the 3 strongest seasonal components.
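In miniature, the extrapolation works like this (a sketch with a single seasonal component; the actual Forecast algorithm in the demo combines the 3 strongest):

import numpy as np

def forecast(series, phase_means, period, steps):
    # Extrapolate the fitted linear trend, then add the expected seasonal
    # differential (the per-phase means from the seasonality sketch above)
    # for each future point's phase in the cycle.
    x = np.arange(len(series))
    slope, intercept = np.polyfit(x, series, 1)
    future = np.arange(len(series), len(series) + steps)
    return slope * future + intercept + np.asarray(phase_means)[future % period]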

Judging from the results of the Forecast algorithm, we expect a steady increase in construction activity for Edmonton, as well as Los Angeles, Boston, and New York. Santa Rosa looks to maintain the status quo, and Fort Worth looks to have a slight downward trend.

[chart: forecast]

All together now!

The takeaway from this example is not the individual pieces – most of these techniques are available in a number of places. The secret here is that the whole thing is simple, composable, and fully automated. Once one person writes a data connector in Algorithmia (which is likely to happen for interesting public data, like that provided by Socrata), you don’t have to burn precious time writing and working the kinks out of your own. On the other hand, if something gets done better elsewhere, the compositional nature of Algorithmia allows you to swap out pieces of a pipeline seamlessly. When someone comes up with a better outlier detector, upgrading a pipeline like the one above can be literally just a few keystrokes away.

Finally, the analyses we’ve shown here are based on live data, changing all the time as new data flows in from Socrata’s pipeline to each individual city’s building permit system; Algorithmia offers an easy way to deploy and host that live analysis so the underlying data and interpretation are always up to date.

Offloading Non-core Feature Development with Algorithmia

Today’s blog post is brought to you by one of our customers, DeansList Software. Thank you, Matt, for sharing your experience with our community.

At DeansList, one of the things that we do really well is build custom report cards that students take home (or get e-mailed) to their parents. We build them from the ground up with every school – taking instructions on everything from design to data points to the placement of charts. They often go home every week and, in addition to keeping parents up-to-speed, they include them in the messaging, programs and structures that engage their children every day.

The Challenge


Ugly, but functional.

Not surprisingly, with the promise of customization come some truly unique requests. Lots of our schools include fake paychecks linked to a school’s token economy – a tool to teach financial literacy. Additionally, many schools we work with serve large populations of families whose primary language is not English (often Spanish). Whenever possible we translate our reports into a second language on the back, so non-English-speaking parents aren’t missing any important details. The challenge we faced was translating integers into Spanish words for the second line on the check (e.g. $430 into cuatrocientos treinta dólares). We wrote a script to translate numbers to English words, but no one on our team had the language expertise to do the same in Spanish. It admittedly wasn’t a huge priority, so we kept the words in English – ugly but functional.

Around the time we were tackling this, we came across Algorithmia and its Bounty feature. It seemed like a longshot – but what’s the harm in asking? So we posted a request for someone to write an algorithm to Translate Integers to Spanish Words.

A Bounty Fulfilled

To be honest, after I created the bounty, I forgot about it until March 2nd, when I got the “Your bounty has been completed” e-mail. A user named Javier had uploaded Cardinales. I logged right in, threw some tests into the web console and, within diez minutos, was hitting the API successfully via Postman.

Implementation


The Algorithmia-driven solution!

Algorithmia’s libraries plug right into any platform, and we considered using their JS on the front-end. However, for a few reasons, wrapping an endpoint of our internal API around a call to Cardinales made the most sense. First, this feature will likely get deployed again in many more schools – so abstracting it gives us more flexibility to change things around as needed. Secondly, school firewalls can be incredibly restrictive, so having the browser request the data via our server keeps all our calls in the whitelist and eliminates troubleshooting hard-to-pinpoint issues.
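As a rough illustration, a wrapper endpoint along these lines could look like the following (a sketch only – the route is hypothetical, Flask stands in for our actual stack, and the Cardinales endpoint path is an assumption; check the algorithm’s page for the real one):

import json
import urllib2

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/internal/spanish-number/<int:n>')
def spanish_number(n):
    # Proxy the number through Cardinales so the browser only ever
    # talks to our own server (endpoint path assumed for illustration).
    req = urllib2.Request('https://api.algorithmia.com/api/Javier/Cardinales',
                          json.dumps(n),
                          {'Content-Type': 'application/json'})
    req.add_header('Authorization', '<insert_auth_token>')
    words = json.loads(urllib2.urlopen(req).read())['result']
    return jsonify({'number': n, 'words': words})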

Safe and Secure

Since we work with student data, we have to be super careful about how and with whom we connect. Our privacy policy strictly prohibits sharing student data with third-party software providers, and a feature like this certainly wouldn’t warrant an exemption. Also, we definitely wouldn’t consider bringing in any sort of compiled code for a small function like this.

What’s awesome about Algorithmia is that it’s not inside our system, and we don’t need to provide Cardinales any kind of student data for it to work. We just send it a number – 354 – and it comes back to us with the translation: trescientos cincuenta y cuatro. Everything happens via cURL, and there’s no trace of Algorithmia or Cardinales code on our servers.

Considerations for Next Time
Or, How to Write a Better Bounty Spec…

Javier went above and beyond and included things in the solution that we didn’t ask for, like translations in both the masculine and the feminine, and proper translations for negative numbers. Still, now that it’s up and running, I realize there are a bunch of things we could have thought more about when we wrote the spec.

  • Multiple items in a single request – Right now, we send one number at a time and receive one translation per request. This means there’s a lot of overhead if we print 100 students’ reports at once. Next time around, I’ll include the ability to send an array of inputs and receive key-value pairs as a return (see the sketch after this list).
  • Error codes – Cardinales handles bad inputs gracefully. Sending “abcd” returns an empty string. However, more elaborate algorithms might require more detailed error reporting. It’s definitely something we’ll keep in mind going forward.
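For instance, the batched input and output might look something like this (purely hypothetical – this is the spec we’d write next time, not how Cardinales works today):

# Hypothetical batched request: one call translates many numbers.
request_body = [430, 354, 78]

# Hypothetical response: key-value pairs, one entry per input.
expected_response = {
    "430": "cuatrocientos treinta",
    "354": "trescientos cincuenta y cuatro",
    "78": "setenta y ocho",
}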

What Next?

As more schools build savvy data teams, we’re always looking for ways to help them integrate their efforts with our own. We have a public API, but I could also see Algorithmia as an easy, cost-effective way for them to contribute their own highly-specific code without having to think about setting up their own infrastructure.

So far, leveraging Algorithmia for a non-core feature like this has been an easy, awesome experience. Our engineers were stoked to plug this in, and so far it works perfectly. I’ve never met Javier, but I can read his code (it’s elegant), he’s contributed meaningfully to our platform, and I appreciate his work. Gracias!

Matt Robins is the co-founder of DeansList, a platform that manages non-academic student data. DeansList’s platform puts behavior data to work, driving actionable reporting for students, teachers, administrators, and parents. For more information on DeansList, or to ask Matt questions about his Algorithmia implementation, e-mail him at matt@deanslistsoftware.com.

Algorithmia Security Bounty Program

For us here at Algorithmia, protecting the privacy and security of our users’ information is a top priority. After some time in development, we are happy to announce that, starting today, we will be recognizing security researchers for their efforts through a bug bounty program.

A bug bounty program is common practice amongst leading companies to improve the security and experience of their products. This type of program provides an incentive for security researchers to responsibly disclose vulnerabilities and bugs, and allows for internal security teams to respond adequately in the best interest of their users.

All vulnerabilities should be reported via security@algorithmia.com. GPG key available below [1].

Guidelines

We require that all researchers:

  • Make every effort to avoid privacy violations, degradation of user experience, disruption to production systems, and destruction of data during security testing;
  • Use the designated communication channels to report vulnerability information to us; and
  • Keep information about any vulnerabilities you’ve discovered confidential between yourself and Algorithmia until we’ve had 90 days to resolve the issue.

If you follow these guidelines when reporting an issue to us, we commit to:

  • Not instituting a civil legal action against you and not supporting a criminal investigation;
  • Working with you to understand and resolve the issue quickly (confirming the report within 72 hours of submission);
  • Recognizing your contribution on our site, if you are the first to report the issue and we make a code or configuration change based on the issue.

Scope:

Any component developed by us under Algorithmia.com is fair game for this bounty program, except individual algorithms created by our users.

Out of Scope:

Any services hosted by third-party providers are excluded from scope.

In the interest of the safety of our users, staff, the Internet at large, and you as the security researcher, the following test types are excluded from scope and not eligible for a reward:

  • Findings from physical testing such as office access (e.g. open doors, tailgating)
  • Findings derived primarily from social engineering (e.g. phishing)
  • Findings from applications or systems not listed in the ‘Targets’ section
  • Functional, UI and UX bugs and spelling mistakes
  • Network level Denial of Service (DoS/DDoS) vulnerabilities

Things we do not want to see:

Personally identifiable information of users (PII) that you may have found during your research.

[1]

-----BEGIN PGP PUBLIC KEY BLOCK-----

Version: GnuPG v1

mQENBFU8HlYBCACm2TJ4/kaP10G/dpfeLlzWCsJ1zrbehUg/G3mzzN2vSXKNCK0KY5ZRmSHlMAoF9ZINpSlQCafDHIbVqqQnRZ6/VrG9hLc0avn9o8vgtkMfKFDVYozr1yz7GiGDqANaXS4B2xzJncZi3WFaslYOGFx2ctDEgBFGKilhoVaejA9EjcubMKvOVmtLkBTvTz0WTGasBxP0689httGBn/5P2yz+HT7ei3FiDa842ekgw1Ak699AVMrb7CtHpiyS8kTW/KGfrsmwEqWLKC46X/VjAdtKB994RIFN1BMJbza0i16N3rM/8+kVU8yDRm1S5w9gopQHUETVsTrOtoK40SHpYIRhABEBAAG0L0FsZ29yaXRobWlhIFNlY3VyaXR5IDxzZWN1cml0eUBhbGdvcml0aG1pYS5jb20+iQE+BBMBAgAoBQJVPB5WAhsDBQkFo5qABgsJCAcDAgYVCAIJCgsEFgIDAQIeAQIXgAAKCRAxFOOGF4jSHdT6B/4wKb+fPYdoOnBjLnz1NgHlEXoP4462Hn5YgYGTi3l9spimnkATYuIVCK1Q80iFJgV8V8TpC2e4XnsdeG7yFPSfHUg3dMikokz3bIfLsAuMj3dML7cp4nRVHlrr28zlC+HRgr9dp7wn4BOfD2qv3q3lknwscygHOCc4pzD7LmB36pzL0BA8M0p31xp+oVMH6zD2OrLws8bUvouxLTzXqtmQFOEt072CS85rXKKYVrDs4/pXgAEtpCcr6xJRcnDKKsdNFoYJDlaUTzD+mzobNYXb45rzB2JsNssQEH2SB6zV9ZLBfQGUe0iF35hNVe3ZceQLuNRbjSR2M913OLEmAUXvuQENBFU8HlYBCADThh+5DXgb6BOHi2TeO58QemL18mzqffRhlyrND+zYvLxwES2c4OoEWfXCGU7vPvtC5x855r8vx+ERVUxx7o+2DzSJU3AQ56YOqDaF55KcJtlfRIT2Xww+9Dl7bO9vL+piFmEMnvNPMUjy7mZm5Zmj3RN+0oewWVgzDrcRMz87SzXRYA2q0K3SlLIiaHD3KtD0lQILIvS9pvkcoNUcxV8pkbYRlQQPhIyxW+9kte+aru8kAfJwqhvrieo6OKscUJWYfX/ORms4iKVCcebfw0R5UmLBRsKKyQ2eIVU0q1UP0F+28YEFbLQlDo25PSD/N3gKnD1thkxqK/33UjDGVLyxABEBAAGJASUEGAECAA8FAlU8HlYCGwwFCQWjmoAACgkQMRTjhheI0h0XFwf7BkgEjHOE5YV8dqFVOC0K3RG8/Ppg63e+KdKO/9cx0gi/LB5O7jVA5m5XeQn2NPVzfr6NcUHjxP15aBu0zOD3ZN7tMNmsAGYMO8QPZIXjgvlWFTYklbxrSI5QEL914ZELdEfIHy1UYO2AoPUAkQCsNOvDhY6URHktLJtFYBUOzFKlDLDkX5Wv+hahG3OZ6XmPN40yeKfug/uO56eMtDDKmm2caQDs3awRUPDJ/EDgPi4Mtx55lHUcZ3bbzGMhIdVS2U8jCiox3IpMm1IogqOek3EBHRkvyeWZafbkqAX7YQwqS6SvjUtIvU/nHWVPdx9KpWggcgkEVs5GTw4VY6pFtA===EMqo
-----END PGP PUBLIC KEY BLOCK-----

 

How Machines See the Web: Exploring the Web Algorithmically

In 1996, Larry Page and Sergey Brin began BackRub, a research project about a new kind of search engine. The concept was a link analysis algorithm that measured relative importance within a set. Based on this one algorithm, the company Google was created, and the PageRank index became one of the most famous algorithmic concepts in history.

Knowing how important it is to be indexed for the right thing, we here at Algorithmia were inspired when one of our users added an implementation of Page Rank to the marketplace (note that Google moved beyond the original algorithm more than a decade ago). We realized that by combining various algorithms available in the Algorithmia API with the Page Rank algorithm, we could get an idea of how search engines view a site or domain. So we’ve essentially integrated every intermediate step a crawler goes through when examining a site (using the algorithms already available in our marketplace), which lets us understand how machines see the web.

Understanding how pages are linked to each other on a site gives us a snapshot of the connectedness of a domain, and a glimpse into how important a search engine might consider each individual page. Additionally, a machine generated summary and topic tags give us an even better picture of how a web crawler might classify your domain.

Modern web crawlers use sophisticated algorithms to follow links across the entire internet, but even limiting our application to pages on just one domain gives us significant insight into how any site might look to one of these crawlers.

Building a site explorer

We broke up the task of exploring a site into three steps:

  • Crawl the site and retrieve it as a graph
  • Analyze the connectivity of the graph
  • Analyze each node for its content

We found algorithms for each part already in the Algorithmia API, which let us build out the pipeline quickly:

  • GetLinks (retrieves all the URLs at the given URL)
  • PageRank (simple implementation of the Backrub algorithm)
  • Url2text (converts documents and strips tags to return the text from any URL)
  • Summarizer
  • AutoTag (uses Latent Dirichlet Allocation from Mallet to produce topic tags)

Here is the result (click on a node for more info):

The individual steps

We built this app using three core technologies: AngularJS, D3.js, and the Algorithmia API.

The first thing we needed to do was crawl the supplied domain. We allow the user to determine the number of pages to crawl (limited in our demo to a max of 120 – at that point your browser will really start hurting). For each page crawled, we retrieve every single link on that page and plot it on the graph. Then, once we have reached the max pages to crawl, we apply PageRank to the result.

The pipeline boils down to three steps – get the links, iterate over the site, and apply Page Rank.
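Here is a condensed sketch of those three steps in Python (the demo itself is AngularJS; the 'web/GetLinks' and 'graph/PageRank' endpoint paths, and the edge-list input to PageRank, are assumptions for illustration):

import json
import urllib2

API_BASE = 'https://api.algorithmia.com/api/'

def call(algo_path, data, token='<insert_auth_token>'):
    # POST JSON-encoded input to an algorithm and return its 'result' field.
    req = urllib2.Request(API_BASE + algo_path, json.dumps(data),
                          {'Content-Type': 'application/json'})
    req.add_header('Authorization', token)
    return json.loads(urllib2.urlopen(req).read())['result']

def crawl(start_url, max_pages=120):
    # Breadth-first crawl, collecting [source, target] edges as we go.
    edges, queue, seen, crawled = [], [start_url], set([start_url]), 0
    while queue and crawled < max_pages:
        page = queue.pop(0)
        crawled += 1
        for link in call('web/GetLinks', page):   # step 1: get the links
            edges.append([page, link])
            if link not in seen:                  # step 2: iterate over site
                seen.add(link)
                queue.append(link)
    return edges

edges = crawl('http://example.com')
ranks = call('graph/PageRank', edges)             # step 3: apply Page Rank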

Once the graph is built, we render it using a D3 force layout. Clicking on any individual node retrieves the content from that page, cleans up the HTML so we are left with just the text, and processes the text through both the summarizer and topic tagger algorithms.

Really, the hardest part about building this was figuring out the quirks of D3, since the Algorithmia API just allowed us to stitch together all the algorithms we wanted for the process and start using them, without worrying about a single download or dependency.

Don’t take our word for it – try it yourself. We have made the AngularJS code available here; feel free to fork it, modify it, and use it in your own applications.

It’s really easy to expand the capabilities of the site mapper; here are some ideas:

  • Sentiment by Term (understand how a term is seen across a site, e.g. Apple Watch on GeekWire)
  • Count Social Shares (understand how popular any link-node is on a number of social media sites)
  • many more that can be found in Algorithmia…

Made it this far?

If you tweet “I am [username] on @algorithmia” we will add another free 5k credits to your account.

– Diego (@doppene)

Accessing Algorithmia through Python

Today’s blog post is brought to you by one of our community members, John Hammink.

Imagine – you are a novice programmer and want to harness the power of algorithms. Or maybe you are an academic, or an expert in another area that has nothing to do with programming, and want to incorporate algorithms into your own work. Now you can!

To leverage the power of Algorithmia’s service, you must first understand a thing or two about REST APIs, as well as how to do something with the output. (Incidentally, what you learn about REST APIs here can be applied across the board to many web services!)

In this article, we’ll use Python to create a small program that exercises a few REST APIs.

Why Python? Python’s easy, like using a notebook, and the code is easy to demo at any skill level. So here’s what we’ll do:

  1. First, we’ll start out using cURL at the command line and walk through various usage scenarios.
  2. Next, we’ll look at a basic Python example for using a REST API.
  3. After that, we’ll get more advanced: there are different ways to use the REST API within Python, and reasons you might prefer one over another depending on the algorithm and the performance you want to achieve.
  4. Next, we’ll look at some different APIs.
  5. Our goal is to build an extremely simple console application to demonstrate the power of a couple of Algorithmia’s algorithms.
  6. Finally, we’ll end with a question: which problems would you, our users, want to solve? Tell us in the comments.

Starting off with cURL

First things first! Head over to Algorithmia.com and sign up if you have not already, so you can get a free authentication token. You can find your API Key under your account summary:


Let’s look at the profanity detection algorithm.  You can read the summary docs of the algorithm here:  https://algorithmia.com/algorithms/nlp/ProfanityDetection

Let’s use cURL at the command line to access the API and pass a request.  

curl -X POST -d '["He is acting like a damn jackass, and as far as I am concerned he can frack off.",["frack","cussed"],false]' -H 'Content-Type: application/json' -H 'Authorization: <insert_auth_token>' https://api.algorithmia.com/api/nlp/ProfanityDetection

Before you move ahead, let’s explain what each parameter does.

curl -X POST -d '["He is acting like a damn jackass, and as far as I am concerned he can frack off.",["frack","cussed"],false]' -H

-X is how you use curl to send a custom request, with the data specified by -d. The data to be passed to the profanity detection algorithm comes in three parts:

  1. The sentence or phrase to be analyzed for profanity. This can also be a variable, or a string scraped from somewhere else on the internet.
  2. Additional words or phrases to be added to the profanity corpus when analyzing the input.
  3. A boolean value – when set to true, known profanities are ignored and only the words you supply are counted.

-H allows us to send an extra header to include in the request when sending HTTP to a server. Here, we’re calling it twice – once to add the content type spec, and a second time to add an authorization header. (Note that in our documentation and blog, this auth key will automatically be populated with your auth key.) And finally, there is the URL for the algorithm in the Algorithmia API that we want to use.

…enter Python!

Python just might be the easiest language with which to make HTTP requests.  

Taking a step back, let’s look at how we might run a cURL-like request using Python.

First, let’s look at how one might use Python with cURL via a subprocess call. Try this in Python interactive mode:

Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> data = """["I would like to eat some flaming cheesecake.",["cheesecake","cussed"],false]"""
>>> import subprocess
>>> subprocess.call(['curl', '-X', 'POST', '-d', data, '-H', 'Content-Type: application/json', '-H', 'Authorization: <insert_auth_token>', 'https://api.algorithmia.com/api/nlp/ProfanityDetection'])
{"duration":0.0009742940000000001,"result":{"cheesecake":1}}0
>>>

This works, and passes the data, but there is always going to be a performance hit when calling cURL – or any subprocess, for that matter – from within Python. Generally, native function calls simply work best! Python handles the whole data-passing thing much more cleanly. Let’s look at what a simple ‘dataless’ implementation (an urllib2 request where we don’t pass any data to the algorithm) might look like in Python using the urllib2 library.

import urllib2, json

request = urllib2.Request('https://api.algorithmia.com/api/nlp/ProfanityDetection')
request

What is going on here? We are simply creating a request and storing it in a variable. Evaluating it returns this:

<urllib2.Request instance at 0x7fcb52d5e3f8>

In this example, the imported json library and input variables are just placeholders for what we’re about to do – pass actual data to an urllib2 request and get data back. Let’s try that now:

import urllib2

url = 'https://api.algorithmia.com/api/nlp/ProfanityDetection'
data = """["I would like to eat some damn cheesecake.",["cheesecake","cussed"],false]"""
req = urllib2.Request(url, data, {'Content-Type': 'application/json'})
req.add_header('Authorization', '<insert_auth_token>')
f = urllib2.urlopen(req)
for x in f:
    print(x)

This actually prints out the contents of the data returned by Algorithmia’s API:

{"duration”:0.089443493,“result”:{“damn”:1,“cheesecake”:1}}

To close the object, just add:

f.close()

So now we have the basics. You can use the above example for calling more or less any REST API in Python.
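To make that concrete, here is one way to wrap the pattern into a reusable helper (a sketch; error handling and timeouts omitted):

import json, urllib2

def call_algorithm(url, data, token='<insert_auth_token>'):
    # POST the raw payload and return the parsed JSON response.
    req = urllib2.Request(url, data, {'Content-Type': 'application/json'})
    req.add_header('Authorization', token)
    f = urllib2.urlopen(req)
    try:
        return json.loads(f.read())
    finally:
        f.close()

result = call_algorithm(
    'https://api.algorithmia.com/api/nlp/ProfanityDetection',
    '["I would like to eat some damn cheesecake.",["cheesecake","cussed"],false]')
print(result['result'])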

In my study of algorithms over REST APIs, I’ve learned that once we’ve mastered the basics of calling the API from whatever language or framework we’re using, it’s time to play around with the parameters we’re passing to the algorithm.

Consider this:

data = """["I would like to eat some damn cheesecake.",["cheesecake","cussed"],false]"""

There are actually several things going on here. There is the main data we’re passing:

"I would like to eat some damn cheesecake.”

Then there are the additional words we add to our cuss-corpus, in this case:

["cheesecake","cussed"]

Last, but not least, there is a boolean value being passed at the end of the string. This value tells us whether or not to ignore known profanities (and only go with the cuss-words you provide!). The input format is documented on the algorithm summary page on Algorithmia. I actually tried passing several parameters to the data string; you can have fun with this too:

data = """["I would totally go bananas with his recommendations, if he were not such a pompous ass", ["cuss", "banana"], false]"""
data = """["Ernst is a conniving little runt.", ["eat", "little"], false]"""
data = """["Ernst is a conniving little runt.", [], false]"""

You can have a load of fun with the data set on this one.  Did you try these?  Did you invent some others? What did you get?

Other Algorithms
Now, let’s try calling some of the other APIs.

Using the template we provided earlier, let’s try calling a few different algorithms.  We’ll do this simply by switching out our url and data parameters.

Note that we can get some clues by studying the sample input above the API console for each.

Let’s look first at Do Words Rhyme? (algorithm summary available at https://algorithmia.com/algorithms/WebPredict/DoWordsRhyme):

import urllib2

url = 'http://api.algorithmia.com/api/WebPredict/DoWordsRhyme'
data = """["fish", "wish"]"""
req = urllib2.Request(url, data, {'Content-Type': 'application/json'})
req.add_header('Authorization', '<insert_auth_token>')
f = urllib2.urlopen(req)
for x in f:
    print(x)

This returns:

{"duration”:0.23639225300000002,“result”:true}

Hint: you can isolate just the result value you need by using Python’s dictionary manipulations. So, for example, if you are getting your response this way:

import json

response = urllib2.urlopen(req, json.dumps(data))

You can load it into a Python dictionary, and then just take the result:

f = response.read()
rhyminDict = json.loads(f)
rhyminResult = rhyminDict['result']

You can mix it up with the input data and try again. You can also benchmark different methods’ durations by calling the API directly with cURL, or with cURL called as a subprocess in Python. See the performance difference?

Now let’s try some other algorithms:

Word Frequency Counter

algorithm summary available at https://algorithmia.com/algorithms/diego/WordFrequencyCounter

url = 'http://api.algorithmia.com/api/diego/WordFrequencyCounter'
data = '"Does this thing need a need like this need needs a thing?  Cause the last thing this thing needs is a need."'

Geographic Distance

algorithm summary available at https://algorithmia.com/algorithms/diego/GeographicDistance

url = 'http://api.algorithmia.com/api/diego/GeographicDistance'
data = """{"lat1": 60.1,"long1":24.5,"lat2":37.2,"long2":122.4}"""

Are you getting the hang of it now?   

Putting it all together

Now let’s put together a simple console application in Python.

NOTE:  This app is not intended to take the place of legitimate psychological or arousal research.  It’s just code to illustrate what you can do with these algorithms.

In this app, we’ll use two algorithms, profanity detection and sentiment analysis, to get a general read on the user’s state of mind from the input they give.

Listing: howsyourday.py. Full source code available at: https://github.com/algorithmiaio/samples/blob/master/HowIsYourDay.py

##############################################################
# Simple python app that uses 2 algorithms in algorithmia API
#   - Profanity Detection
#   - Sentiment Analysis
#
# Author: John Hammink <john@johnhamm.ink>
##############################################################

import urllib2, json

def get_cuss_index(sentiment):
    url = 'https://api.algorithmia.com/api/nlp/ProfanityDetection'
    filterwords = ["cuss", "blow", "spooge", "waste"]
    ignore = True
    # Build the payload via string formatting, as described above
    data = """["%s", ["%s"], %s]""" % (sentiment, filterwords, ignore)
    req = urllib2.Request(url, data, {'Content-Type': 'application/json'})
    req.add_header('Authorization', '<insert_auth_token>')
    response = urllib2.urlopen(req, json.dumps(data))
    f = response.read()
    # Load the JSON response and sum the per-word counts in 'result'
    curseDict = json.loads(f)
    curseResult = curseDict['result']
    cuss_index = sum(curseResult.values())
    return cuss_index

def analyze_sentiment(sentiment):
    url2 = 'http://api.algorithmia.com/api/nlp/SentimentAnalysis'
    data2 = str(sentiment)
    req2 = urllib2.Request(url2, data2, {'Content-Type': 'application/json'})
    req2.add_header('Authorization', '<insert_auth_token>')
    response = urllib2.urlopen(req2, json.dumps(data2))
    g = response.read()
    # The sentiment score lives in the 'result' field
    analysis = json.loads(g)
    analysisResult = analysis['result']
    return analysisResult

def gimme_your_verdict(cuss_index, analysisResult):
    highArousal = '"My, we are feisty today! Take five or go for a skydive!"'
    mediumArousal = '"Seems like you are feeling pretty meh"'
    lowArousal = '"Hey dude, are you awake?"'
    if cuss_index >= 2 or analysisResult >= 3:
        print highArousal
    elif cuss_index >= 1 or analysisResult >= 2:
        print mediumArousal
    else:
        print lowArousal
    print "Come back tomorrow!"

if __name__ == '__main__':
    print "Como estas amigo?"
    sentiment = str(raw_input("How was your day? Swearing is allowed, even encouraged: "))
    cuss_index = get_cuss_index(sentiment)
    analysisResult = analyze_sentiment(sentiment)
    gimme_your_verdict(cuss_index, analysisResult)

We’ve put together three functions. get_cuss_index and analyze_sentiment both query and parse the returns from their respective algorithms; both load their respective JSON responses into a Python dictionary, then parse the dictionary for only the result value. get_cuss_index actually sums the values in the dictionary, since there may be many.

gimme_your_verdict returns a snarky comment based on a conditional derived from the values returned by the first two functions.    

Having a bad day today, oh feisty one?  Well, there’s always tomorrow!

-John

A question for you, the reader….

Which other problems would you like to solve? Do you have a favorite algorithm you’d like to see demonstrated, perhaps in a different programming language or framework? Maybe you’d like to propose something that doesn’t exist yet – a new algorithm, or a solution built on an existing one?

Let us know in the comments below, or follow and tweet us at @algorithmia. We’ll get right on it – and you’ll see more on the topic, either in our blog or in our knowledge base. Thanks for reading!