Algorithmia Blog

Building Intelligent Applications

Algorithmia was delighted to speak at Seattle’s Building Intelligent Applications meetup last month.  We provided attendees with an introductory view of machine learning, walked through a bit of sample code, touched on deep learning, and talked about various tools for training and deploying models.

For those who were able to attend, we wanted to send out a big “thank you!” for being a great audience.  For those who weren’t able to make it, you can find our slides and notes below, and we hope to see you at the next meetup on Wednesday, April 26.  Data Scientists Emre Ozdemir and Stephanie Peña will be presenting two Python-based recommender systems at Galvanize in Pioneer Square.

To come to Wednesday’s talk, RSVP via Eventbrite.  To keep an eye out for future events, join the Building Intelligent Applications Meetup Group.


All the Moving Parts: Bringing Machine Learning to your Application




Before diving into a machine or deep learning project, the first step is to build a basic understanding of statistics and linear algebra.

BUT… more important than the math: understand the big picture of the models and how they apply to different use cases and datasets, rather than memorizing the math behind every algorithm.

The next step is to learn a programming language conducive to using common machine learning libraries. While there are machine learning libraries in Ruby and even Node, they aren’t as fully developed and don’t have the community support of languages like Python, Java, Scala, and R.

While technical skills are important, so is having domain knowledge. Whether your problem lies in finance, education, or real estate, you should understand your business problem very well.

Finally, we should talk about software engineering. No, you don’t have to be a software engineer to tackle machine learning, but you will need to understand how a model will integrate into your current ecosystem and how it will impact performance.


What makes something an “intelligent” application? There are certain tasks which, until recently, we’ve always relied on humans to do. Many of these require some form of pattern recognition: reading a handwritten note, categorizing images, or understanding the meaning of a sentence. Writing explicit code to accurately perform these tasks is very hard (or impossible), but if we can teach a computer to recognize certain patterns, then it can learn how to perform them without a programmer writing (much) discrete code.


Intelligent applications already saturate our day-to-day world: consider video filters which must recognize and track faces, smartphone assistants which understand our meaning (mostly) and perform tasks for us, drug-discovery tools which identify molecules that might bind to certain receptor sites, or image enhancers which can figure out which colors ought to be present in a greyscale image.


When we build a classical software application, we pipe some user input into some explicit logic that a programmer has hand-coded, and this yields some immediately useful output. By contrast, Machine Learning is a two-step process.

First, we take some set of data we have on-hand, clean it up and split it into training and test data, pick a mathematical model to use, and then train the model to recognize patterns in the data. For example, we might have a set of images which we know to be either cats or dogs. We feed these images into our model, training it (via various techniques) to recognize whether each image it sees is either a cat or a dog, and build from this a “trained model”.

Next, we take this trained model and use it to react to new user input. If a user presents us with a new image, our model will give us back a best guess of “that’s a cat!” or “that’s a dog!”. We can do this repeatedly without retraining our model, so long as all we want to do is differentiate cats from dogs.

Of course, if our user gives us a picture of a hedgehog, our model will fail, because it only knows how to recognize cats and dogs. If we want to fix that, we’ll need to go back and retrain the model, including some images of hedgehogs this time around. Then we’ll take our retrained model and drop it into our application where the old model used to be.


Here’s a concrete example of replacing a “classical” software app with a machine-learning solution, using the task of detecting whether an image is likely to contain nudity — a problem which websites and social utilities often face if they need to ensure that their content is “safe for work”.

One classical approach first locates any face(s) in the image, samples the pixels around the nose area (which is usually unclothed) in order to determine the individual’s skin tone, then considers what percentage of the overall image is skin-colored. This works reasonably well for many images, but fails for a significant number of cases (for example, when there are many other objects near, but not covering, the nude individual).

A machine-learning approach which performs much better uses image tagging. First, we feed our model a large set of images, both nude and non-nude. We’ve tagged these images beforehand with the names of the items they contain: “car”, “scarf”, “nipples”, etc. The machine learns how to identify specific items, and when training is complete, it can be fed new images and it will give back a list of items it thinks the image probably contains, along with a confidence level for each item (“I’m 99% sure there’s a face in this image, but only 20% sure there’s a pair of pants”). Since we know that certain items imply nudity, we can then create a composite score for the image (“It contains a face, shirt, and pants, and no nipples… probably not nude”).
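A toy sketch of that final composite-scoring step might look like the following. The tags, confidence values, and weights here are invented for illustration, not output from any real tagging model:

    # Invented tag confidences, as an image-tagging model might return them
    tags = {"face": 0.99, "shirt": 0.95, "pants": 0.20, "nipples": 0.02}

    # Hand-picked weights: some items imply nudity, others argue against it
    nudity_weights = {"nipples": 1.0, "shirt": -0.5, "pants": -0.5}

    # Weight each tag's confidence and sum into a single composite score
    score = sum(conf * nudity_weights.get(tag, 0.0) for tag, conf in tags.items())
    print("probably nude" if score > 0.5 else "probably safe for work")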


ML has a lot of sub-fields and uses, but some common ones that have broad applications are:


Broadly, we can think of ML as having two main categories, Supervised and Unsupervised Learning.

In Supervised Learning, our data has usually been pre-tagged with Features (for example, the list of items in an image) and Labels (the outcome we expect the machine to give us).

Unsupervised learning doesn’t require pre-tagged data; instead, it is used to find patterns in data. For example, we might provide a set of customer demographics (age, weight, gender, etc) and ask the model to group them by similarity. Often, unsupervised learning is a precursor to a supervised step, but sometimes it stands on its own.
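As a minimal clustering sketch using scikit-learn’s KMeans (the customer demographics below are made up for illustration):

    import numpy as np
    from sklearn.cluster import KMeans

    # Made-up customer demographics: [age, weight_kg, gender (0/1)]
    customers = np.array([[25, 70, 0], [31, 82, 1], [45, 60, 0],
                          [52, 95, 1], [23, 68, 1], [48, 66, 0]])

    # No labels are supplied; we simply ask for two groups of similar customers
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
    print(kmeans.labels_)  # which cluster each customer landed in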


Before you begin working with your data, however, you’ll almost always need to clean it up. In both Supervised and Unsupervised learning, you may need to standardize the data, for example by grouping continuous data into buckets (ages 10-20, 21-30, etc.), or get rid of datapoints which are clearly invalid (I’m only training a cat-vs-dog classifier, but somebody threw in a single hedgehog!). A quick sketch of this kind of cleanup follows below.
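Here is a minimal cleanup sketch, assuming pandas (the ages below are made up):

    import pandas as pd

    # Made-up ages, including one clearly invalid datapoint
    ages = pd.Series([12, 19, 24, 27, 33, 41, 150])
    ages = ages[ages < 120]  # drop the invalid row

    # Group the continuous values into the buckets described above
    buckets = pd.cut(ages, bins=[9, 20, 30, 40, 120],
                     labels=["10-20", "21-30", "31-40", "41+"])
    print(buckets.value_counts())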

With Supervised Learning, you’ll also need to label your data (“this picture is a cat, but that one is a dog”) and possibly narrow down the variables/information which the model will actually consider (for example, pre-cropping images or discarding certain irrelevant columns of the dataset) — having too many variables to consider can lead to overfitting, as well as slowing down the training process.


A few examples of Supervised Learning types/models:


Unsupervised Learning is used to find similarities and group things together, or identify the specific characteristics which cause those groupings. We might be looking to find out what types of customers behave most like each other, or figure out which features tend to differ between those groups of individuals.


There are a lot of libraries available to help you in your ML pursuits (see the Resources section at the end for just a few). One common starting place is scikit-learn, a readily available Python package. Simply “pip install scikit-learn” to begin using it.

We’ll walk through a code example using the popular “iris” dataset (available inside scikit-learn, but also downloadable elsewhere if you’re using a different language or library). This dataset pertains to three specific species of irises (the flower, not the eyes): I. setosa, I. versicolor, and I. virginica. The heights and widths of these flowers’ petals and sepals tend to differ between species, so we’ll train a model to identify which species a given flower is, given these dimensions.

Our dataset has a bunch of rows, each of which is an individual flower which somebody measured in the field. iris.data contains the dimensions of the petals and sepals. iris.target tells us what species each one is (encoded as 0, 1, or 2 for setosa, versicolor, or virginica).


We begin by importing sklearn’s datasets and loading the iris dataset into memory.
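In code, that looks roughly like this:

    from sklearn import datasets

    # Load the bundled iris dataset into memory
    iris = datasets.load_iris()
    print(iris.data[:3])    # petal and sepal measurements for the first three flowers
    print(iris.target[:3])  # their species, encoded as 0, 1, or 2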

Before we actually make use of the data, though, we want to break it into a “training” portion (about 80% of the data) which we’ll use for creating our model, and a “testing” portion (the remaining 20%) which we’ll use to verify that the model is working properly. But… before doing even that, we need to shuffle the data to ensure that it is randomly distributed across our training vs testing portions. Now we have two groups of data: data_train (the petal and sepal dimensions) and target_train (the species of flower) will be used for training, while data_test and target_test will be used for testing the model.
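A sketch of that shuffle-and-split step (the slides may have shuffled manually; scikit-learn’s train_test_split shuffles for you):

    from sklearn.model_selection import train_test_split

    # Continuing from the snippet above: 80% for training, 20% held back for testing
    data_train, data_test, target_train, target_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=42)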

Next, we’ll create a Support Vector Machine and train it on our data. This is a deceptively simple one-liner… we bring in svm from sklearn, construct a classifier with a handful of parameters, and call .fit() on our training data. In reality, you may need to do this many times, changing the parameters until you get a model which fits your data well; look up the documentation for scikit-learn’s SVM to see exactly what those parameters are, and learn how to adjust them.
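Roughly:

    from sklearn import svm

    # Construct a support vector classifier (tune kernel, C, gamma, etc. as needed)
    # and fit it to the training portion of the data
    model = svm.SVC()
    model.fit(data_train, target_train)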

Once we’re done training, we have a model which can be used to predict the species for new rows of data. If we intend to use this model elsewhere, we can save it to a file using pickle.dumps(), and reload it using pickle.loads(). This is pretty common practice, since you’ll often train your model on one machine, then use/host the model elsewhere in your user-facing application.
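For example:

    import pickle

    # Serialize the trained model so it can be shipped to another machine...
    saved = pickle.dumps(model)

    # ...and reloaded later inside the user-facing application
    model = pickle.loads(saved)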

We’ll also test our model (in reality, we’d do this before ever bothering to pickle it, so rearrange this code sample to fit your needs). This is done by simply calling model.predict() and handing it one or more rows of petal and sepal dimensions. It gives back row(s) of predicted species. Here, we compare these results against our known-correct target_test to see how well our model behaved. If it was poor, we go back and tweak our parameters.
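Something like:

    # Predict species for the held-out test rows and compare against the known answers
    predictions = model.predict(data_test)
    accuracy = (predictions == target_test).mean()
    print(predictions[:5], accuracy)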


If you’re looking to work with natural (human) languages, the Python Natural Language ToolKit has some great tools for parsing text, generating trees representing each sentence’s structure, identifying parts of speech, and other language-specific activities… as well as a bunch of sample texts to play with.
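A tiny sketch of what that looks like (the tokenizer and tagger resources need a one-time download):

    import nltk
    # nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')  # one-time downloads

    sentence = "Algorithmia spoke at the Building Intelligent Applications meetup."
    tokens = nltk.word_tokenize(sentence)  # split the sentence into words
    print(nltk.pos_tag(tokens))            # tag each word with its part of speech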

If you’re a Java fan, or want to start with a more visual exploration of your data, take a look at Weka. It contains a bunch of ML packages you can run in your Java app, but also has both command-line and visual user-interface modes of operation.


Deep learning is a type of machine learning that relies on an architecture modeled after the human brain. The way information is passed between neurons in your brain, and the pathways between them, inspired an artificial neural network architecture that is currently being used to classify images, perform sentiment analysis, and transform audio to text.

Why model after the human brain?


What makes deep learning special?


There are of course some differences in how the human brain works and how artificial neural networks are designed.

For instance, in certain deep learning architectures like the one in the slide (well, this one is kinda shallow since it has only a couple of layers), the information flows one way and each node can only be targeted by the layer before it. This particular example is called a feedforward neural network.

There are three basic types of layers in a neural network: the input layer, the output layer, and the middle or hidden layers.

Using the example from the beginner Kaggle competition that attempts to predict Titanic survival, we’ll predict whether a passenger survived (1) or died (0) based on three features: age, ticket price, and sex, where sex is encoded as 0 for male and 1 for female.

Our net’s goal is to try to learn what features are important in determining the survival rate of passengers.

Something to remember is that the input nodes receive the numeric values, but no computation is done in the input layer.


Notice the inputs are modified with unique weights, which are initially assigned at random by the algorithm.

When the weighted values of the input nodes get passed along to the hidden layer, each node in the hidden layer applies an activation function which determines whether the next layer of nodes will receive any information.

The next hidden layer of nodes (hidden layer 2) receives the values (the weighted sums) from the previous layer. Note that the learning happens when the error between the model’s output and the known sample output is calculated; the model then adjusts the weights accordingly.

Finally, when the difference between the known output and the model’s output is small enough, you reach the final output layer. A confidence score is the output.

Then you can pickle your model and classify your data!
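To make this concrete, here is a minimal sketch of such a feedforward network using scikit-learn’s MLPClassifier (not what the slides used; they walked through the architecture conceptually). The rows below are made-up stand-ins for the three Titanic features, not the real Kaggle data:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Illustrative rows only: [age, ticket_price, sex (0=male, 1=female)] -> survived (1) or died (0)
    X = np.array([[22, 7.3, 0], [38, 71.3, 1], [26, 7.9, 1],
                  [35, 53.1, 1], [28, 8.0, 0], [54, 51.9, 0]])
    y = np.array([0, 1, 1, 1, 0, 0])

    # One hidden layer of four nodes -- the "hidden layer" described above
    net = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=1)
    net.fit(X, y)

    # Confidence scores for a new passenger: probability of died vs. survived
    print(net.predict_proba([[30, 10.5, 1]]))

Like the iris model earlier, this trained object can be pickled and dropped into your application.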


In the above slide there is a screenshot of playground.tensorflow.org, an interactive site that lets you experiment with different datasets and network settings to solve different problems.

And that’s your high level intro to deep learning and deep learning architecture.

Next is a summary of what we just covered.

Next we’ll briefly look at the available deep learning frameworks.



Keras is a deep learning framework that runs on top of Theano or TensorFlow.

And just released: Keras 2 https://blog.keras.io/introducing-keras-2.html
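As a hedged illustration (not from the talk), the same kind of small feedforward network from the Titanic example might look like this in Keras; the layer sizes and training call here are arbitrary choices:

    from keras.models import Sequential
    from keras.layers import Dense

    # Three inputs (age, ticket price, sex), one hidden layer, one sigmoid output
    model = Sequential()
    model.add(Dense(8, activation='relu', input_dim=3))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    # model.fit(X_train, y_train, epochs=50, batch_size=32)  # X_train/y_train: your features and labels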

Now that you know a little about machine and deep learning, I’ll describe how to select your model, where to train it, and where to deploy it to make it usable on new data.


Size of Data

Type of Data


Local Machine Option:

You’ve chosen your model and now want to train it on your dataset. While you’re comfortable with installing and managing dependencies, you don’t want to pay for a cloud instance, so you figure you can train it on your local machine. When is this a good idea?

Self Managed Option:

All of the self-managed services listed here have great documentation and tutorials; they all provide the ability to manage instances through both a console environment and an API, and you can create custom instances in all of them as well:

They are all reliable up to a point, and they offer discounts when there are outages. Some don’t offer all of the add-ons you want, so when you’re deciding between them, make sure they support the pipeline you want.

Managed/UI:

There is also the option of the more plug-and-play variety. This option is suited for when you have a specific use case (due to the limitations on models available), but is great if you don’t necessarily have the expertise to code your model training from scratch. They all have basic models to train your data with; you can access your trained model via a REST API; they offer basic model evaluation and versioning if you want to update your model; and pricing tends to be broken down into training and prediction pricing.

When you have a ton of data:

Note: for all of the self-managed and managed options, you will also need to store your data, models, and resources on their platform, which sometimes adds to the price and does not include deployment costs.

Whether you have an unsupervised model that tags documents on the fly or a trained model that finds the sentiment of user input to properly route them in a call center, you’ll need a platform to deploy your model to make inferences or predictions. The next slide will review your options.


For deployment we’ll only cover the managed services rather than the self-managed solutions.

What we mean by deployment is that you take your trained model and integrate it as a service into your current stack to make predictions or inferences on new, unseen data.

For example, you might have a recommendation engine that, when a user signs in, uses a trained model to predict which products the user would like based on features such as the user’s past purchase history.

All of these managed services deploy your model via an API endpoint, which is how you would call it to make predictions or inferences (see the sketch after the list below). Most charge pay-as-you-go prediction pricing that is billed separately from training and hosting, even when you use the same platform; the exceptions are Algorithmia and Google Cloud ML, where you can host for free and are then charged for predictions.

Google Cloud ML

Amazon ML

Azure ML & ML Studio

Algorithmia

Heroku
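As an example of what calling a deployed model looks like, here is a minimal sketch using the Algorithmia Python client; the API key and algorithm path are placeholders, not a real endpoint:

    import Algorithmia

    client = Algorithmia.client("YOUR_API_KEY")           # placeholder API key
    algo = client.algo("your_username/your_model/0.1.0")  # placeholder algorithm path

    # Send new, unseen input to the hosted model and read back its prediction
    result = algo.pipe({"user_id": 42, "purchase_history": ["socks", "shoes"]}).result
    print(result)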

Of course, after deployment you’ll want to continually evaluate the model and update it if necessary.


Next Steps:

Resources:

Statistics

Machine Learning

Libs/Toolkits

Model Deployment