Algorithmia Blog - Deploying AI at scale

Chaining machine learning models in production with Algorithmia

Workflow showing the tools needed at each stage

In software development, it makes sense to create reusable, portable, self-contained modules that can seamlessly plug into any application. As the old adages insist: rely on modular design, don’t repeat yourself (DRY), and write once, run anywhere. The rise of API-first design, containerization, and serverless functions has taken these lessons even further—allowing individual modules to be developed in separate languages but executed from anywhere in any context.

To reach its full potential, machine learning must follow the same principles. It’s important to create reusable abstractions around your models, keep them in a well-documented and searchable catalog, and encourage model reuse across your organization.

During model training, techniques such as transfer learning begin to address this need; but how can we benefit from reuse of shared models and utilities once they are already in production?

Architectural principles

Design with abstraction in mind: while you may be building a model for a specific, constrained business purpose, consider how it might be used in other contexts. If it only takes singular inputs, for instance, could you provide a simple wrapper to allow batches of inputs to be passed in as a list? If it expects a filename to be passed in, should you also allow for URLs or base64-encoded input?

Document and centrally catalog your models: once you’ve put in the hours necessary to conceive and train a model, move it off your laptop and into a central repository where others can discover it. Provide simple, clean documentation, which describes the purpose, limitations, inputs, and outputs of the model.

Host your models in a scalable, serverless environment: downloading is tedious, limiting, and resource-wasteful. Instead, allow other developers to access your model directly via an API. This way, they’ll only need to add a few simple lines of code to their application, instead of duplicating your entire model and associated dependencies. Host that API endpoint in a serverless environment so it can scale indefinitely and satisfy calls from any number of applications.

Search for existing endpoints before creating your own: there’s no need to build your own code from scratch, or even add another large dependency to your project. If the functionality is already provided by an API you can call, using existing resources is preferred. By thinking API-first, you’ll decrease your own module’s weight while minimizing technical debt and maintenance effort.

Design with abstraction in mind: consider how [a model] might be used in other contexts.

Model reuse and scaling

Algorithmia’s public model marketplace and Enterprise AI Layer have been designed with these principles in mind. Every model is indexed in a central, searchable catalog (with the option for individual or team-level privacy) with documentation and live sample execution, so developers can understand and even live-test the model before integrating it into their codebase.

Every model is run in Algorithmia’s scalable serverless environment and automatically wrapped by a common API, with cut-and-paste code samples provided in any language. There is no need to dig through sprawling API documentation or change patterns based on which model is called: integrating a Java deep-learning model into a Python server feels and acts as seamless as calling a local method. Running an R package from frontend JavaScript is just a simple function call.

Screenshot from algorithmia.com of an R packageset running as a function call.

Chaining models

The benefits of Algorithmia’s design extend beyond executing models from end-user applications: it is equally simple to call one model from another model, a process known as model chaining or production model pipelining (not to be confused with training pipelines).

The core of this is the .pipe() call. UNIX users will already be familiar with the pipe “|” syntax, which sends input from one application to another; on Algorithmia, .pipe() sends input into an algorithm (a model hosted on Algorithmia), and can be used to send the output of one model directly into another model, or into a hosted utility function. For example, if we have a model called “ObjectDetection” for recognizing objects in a photo, and a utility function called “SearchTweets” for searching Twitter by keyword, and another model called “GetSentiment” which uses NLP to analyze the sentiment of text, we can write a line of code very similar to:

GetSentiment.pipe( SearchTweets.pipe( ObjectDetection.pipe(image).result ).result )

This runs an image through ObjectDetection, then sends the names of detected objects into SearchTweets, then gets the sentiment scores for the matching tweets.

Let’s implement this as an actual model pipeline, using the Algorithmia algorithms ObjectDetectionCOCO, AnalyzeTweets, UploadFileToCloudinary, and GetCloudinaryUrl. We’ll extend it a bit by picking one of the top sentiment-ranked tweets, overlaying the text on top of the image, and sending that image over to Cloudinary’s CDN for image hosting. Our full code looks something like this:

Object detection and tweet analysis chain code snippet

Line-by-line, here are the steps:

  1. Create a client for communicating with the Algorithmia service
  2. Send an image URL into ObjectDetectionCOCO v. 0.2.1, and extract all the labels found
  3. Search Twitter for tweets containing those labels via AnalyzeTweets v. 0.1.3, which also provides sentiment scores
  4. Sort the tweets based on sentiment score
  5. Upload the original image to Cloudinary
  6. Overlay the top-ranked tweet’s text on top of the image in Cloudinary’s CDN

Now, with just 6 lines of code, we’ve chained together two ML models and two external services to create a fun toy app! But let’s go further, making this an API of its own, so other developers can make use of the entire pipeline in a single call. We head to http://algorithmia.com (or to our own privately-hosted Enterprise instance) and click Create New Algorithm. Then place the same code into the algorithm body:

create new algorithm on algorithmia.com with memegenerator.py

After we publish this, any other user will be able to make use of this pipeline by making a single function call, from any language:

making pipelines available for use by others

You can try this out yourself, and even inspect the source code (enhanced with some overlay formatting and random top-N tweet selection) at https://algorithmia.com/algorithms/jpeck/MemeGenerator!

Going further

This toy example was fun to develop, but every industry has its own specific needs and workflows that can be improved with model chaining. For a few more model-chaining examples, read how to:

Or, explore our Model Pipelining whitepaper which addresses the business-level benefits of model pipelining within your enterprise.

Thanks for taking the time to explore with Algorithmia; we look forward to seeing what great model pipelines you dream up!

Developer Advocate for Algorithmia, helping programmers make the most efficient and productive use of their abilities.

More Posts

Follow Me:
TwitterLinkedIn