Algorithmia

How We Turned Our GitHub README Model Into a Microservice

Hosting our GitHub Machine Learning ModelA few posts ago we talked about our data science approach to processing and analyzing a GitHub README.

We published the algorithm on our platform, and released our project as a Jupyter Notebook so you could try it for yourself. If you want to skip ahead, here’s the demo we made that grades your GitHub README.

It was a fun project that definitely motivated us to improve our own README’s, and we hope it inspired you, too (I personally got a C on one of mine, ouch)!

In this post, we’d like to take a step back to talk about how we used Algorithmia to host our scikit-learn model as an algorithm, and discuss the benefits associated with using our platform to deploy your own algorithm.

Why Host Your Model on Algorithmia?

There are many advantages to using Algorithmia to host your models.

The first is that you can transform your machine learning model into an algorithm that becomes a scalable, serverless microservice in just a few minutes, written in the language of your choice. Well, as long as that choice is: Python, Java, Rust, Scala, JavaScript or Ruby. But don’t worry, we are always adding more!

Second, you can monetize your proprietary models by collecting royalties on the usage. This way your agonizingly late nights slamming coffee and turning ideas into code won’t be for nothing. On the other hand, if altruism and transparency are your thing (it’s ours!) then you can open-source your projects and publish your algorithms for free.

Either way, the runtime (server) costs are being billed to the user who calls your algorithm. Learn more about pricing and permissions here.


A Step-By-Step Guide to Hosting Your Model

Now, on to the process of how we hosted the GitHub README analyzer model, and made an algorithm! This walkthrough is designed as an introduction to model and algorithm hosting even if you’ve never used Algorithmia before. For the GitHub ReadMe project, we developed, trained, and pickled our models locally. The following steps assume you’ve done the same, and will highlight the activities of getting your trained model onto Algorithmia.

Step 1: Add Your Data

The Algorithmia Data API is used when you have large data requirements or need to preserve state between calls. It allows algorithms to access data from within the same session, but ensures that your data is safe.

To use the Data API, log into your Algorithmia account and create a data collection via the Data Collections page. To get there, click the “Manage Data” link from the user profile icon dropdown in the upper right-hand corner.

Once there, click on “Add Collection” under the “My Collections” section on your data collections page

. After you create and name your data collection, you’ll want to set the read and write access to make it public, or lock it down as private. For more information about the four different types of data collections and permissions checkout our Data API docs.

Besides setting your algorithms permissions, this is also where you load either your data or your models, or both. In our GitHub README example we needed to load our pickled model rather than raw data. Loading your data or model is as easy as clicking the box ‘Drop files here to upload’ or drag and dropping them from your desktop.

You’ll need the path to your files in the next step. For example:
data://username/collection_name/data_model.pkl.zip

Uploading your trained model

Upload your model

Our recommendation is to preload your model before the apply() function (details below). This ensures the model is downloaded, and loaded into memory without causing a timeout when working with large model files. We support up to 4GB of memory per 5-minute session.

When preloading the model like this, only the initial call will load the model into memory, which can take several seconds with large files.

From there, only the apply() function will be called, which will return data much faster.

Step 2: Create an Algorithm

Now that we have our pickled model in a collection, we’ll create our algorithm and set our dependencies.

To add an algorithm, simply click “Add Algorithm” from the user profile icon. There, you will give it a name, pick the language, select permissions, and make the code either open or closed source.

Create your algorithm and set permissions.

Create your algorithm and set permissions.

Once you finish that step, go to your profile from the user profile icon where your algorithm will be listed by name. Click on the name and you’ll be taken to the algorithm’s page that you’ll eventually publish.

There is a purple tag that says “Edit Source.” Click that and it’ll open the editor where you can add your model and update your dependencies for the language of your choice. If you have questions about adding your dependencies check out our Algorithm Developer Guides.

Your new algorithm!

Step 3: Load Your Model

After you’ve set the dependencies, now you can load the pickled model you uploaded in step 1. You’ll want import the libraries and modules you’re using at the top of the file and then create a function that loads the data.

Here’s an example in Python:

import Algorithmia
import pickle
import zipfile

def loadModel():
    # Get file by name
    client = Algorithmia.client()
    # Open file and load model
    file_path = 'data://.my/colllections_name/model_file.pkl.zip'
    model_path = client.file(file_path).getFile().name
    # unzip compressed model file
    zip_ref = zipfile.Zipfile(model_path, ‘r’)
    zip_ref.extract('model_file.pkl’)
    zip_ref.close()
    # load model into memory
    model_file = open(‘model_file.pkl’, ‘rb’)
    model = pickle.load(model_file)
    model_file.close()
    return model

def apply(input):
    client = Algorithmia.client()
    model = loadModel()
    # Do something with your model and return your output for the user
    return some_data

Step 4: Publish Your Algorithm

Now, all you have to do is save/compile, test and publish your algorithm! When you publish your algorithm, you also set the permissions for public or private use, and whether to make it royalty-free or charge a per-call royalty. Also note that you can set permissions for the algorithm version to say whether internet is required or if your algorithm is not allowed to call other algorithms.

Publish your algorithm!

Publish your algorithm!

If you need more information and detailed steps to creating and publishing your algorithm follow these instructions to getting your algorithm on the Algorithmia platform check out our detailed guide to publishing your first algorithm.

Now that you’ve hosted your model and published your algorithm, tell people about it! It’s exciting to see your hard work utilized by others. If you need some inspiration check out the GitHub ReadMe demo, and our use cases.

If you have more questions about building your algorithm check out our Algorithmia Developer Center or get in touch with us.