All posts by Stephanie Kim

Deploying R Models Into Production

The language you use to create a model can make it easier or harder to deploy into production. The R programming language makes it easy to compare models against each other, and RStudio makes data exploration and visualization a breeze. It’s not always the easiest language to deploy into production, though, because of complex versioning: different packages install and rely on different versions of the same dependency.

Luckily, Algorithmia has your back for working with R in production.

This tutorial shows how to deploy a simple Iris classification model built with a Naive Bayes classifier. Along the way, we’ll discuss the different ways to load dependencies in R models. The repository for this tutorial has all the files you need to follow along. For more information on deploying models in R and on general algorithm creation in R, check out the Algorithm Development Guides. Note that we support the full R language and standard library, version 3.4.

Upload Your Data To Data Collections

In this demo, we are going to host our data on the Algorithmia platform in Data Collections.

You’ll want to create a data collection to host your saved model and your test data:

  1. Log in to your Algorithmia account and click your avatar, which will show a dropdown of choices. Click “Manage Data”.
  2. Then in the left panel on the page of data collection options, go ahead and click “My Hosted Data”
  3. Click on “Add Collection” under the “My Collections” section on your data collections page. Let’s name ours “iris_r_demo”.
  4. After you create your collection you can set the read and write access on your data collection. We are going to select “Private” since only you will be calling your algorithm in this instance. For more information on ACL permission types for Algorithmia hosted data check out the docs.
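  5. Now, let’s put some data into your newly created data collection. From where you stored the repo on your computer, either drag and drop the file naive_bayes_iris.rds onto the page or click “Drop files here to upload” and select it. Upload the test file iris_test_data.csv the same way, since we’ll use it to test the algorithm later.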
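
If you don’t have the repository files at hand, the two files could be produced locally along these lines. This is a minimal sketch, not necessarily how the files in the repository were made: it assumes the e1071 package is installed locally, and the train/test split and the column layout of the CSV are our own assumptions.

```r
library(e1071)

# Assumption: hold out every fifth row of the built-in iris data as test data.
test_idx <- seq(5, nrow(iris), by = 5)
train_data <- iris[-test_idx, ]
# Drop the label column so the CSV contains only the four measurements.
test_data <- iris[test_idx, names(iris) != "Species"]

# Train a Naive Bayes classifier that predicts Species from the measurements.
model <- naiveBayes(Species ~ ., data = train_data)

# Write both files to a temporary directory, ready to upload.
out_dir <- tempdir()
model_path <- file.path(out_dir, "naive_bayes_iris.rds")
test_path <- file.path(out_dir, "iris_test_data.csv")
saveRDS(model, model_path)
write.csv(test_data, test_path, row.names = FALSE)

# Round trip: reload both files with the same calls the algorithm uses.
reloaded <- readRDS(model_path)
reloaded_test <- read.csv(test_path, stringsAsFactors = FALSE,
                          check.names = FALSE, header = TRUE)
preds <- predict(reloaded, reloaded_test)
print(table(preds))
```

The round trip at the end is a cheap way to confirm that the saved model and the CSV agree on column names before you upload them.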

Note that you can also upload your model or data files from Dropbox or Amazon S3 using the Data API.

Create Your Algorithm

Now we are ready to deploy our model.

First, create an algorithm:

  1. Click the “Plus” icon at the top right of the navbar.
  2. Check out Getting Started in algorithm development to learn about the various permissions in the form and to cover any questions you have about creating your algorithm. Do note that if you want to be able to delete your algorithm later, you should set it to “Private”.
  3. Click on the purple “Create Algorithm”.

Now that you have created your algorithm, you’ll get a modal with information about using the CLI and Git. Every algorithm has a Git repo behind it so you can experiment with different I/O in development mode by calling the hash version.

Add Code Sample

  1. Click on the tab “Source” and you’ll notice boilerplate code for Hello World.
  2. Let’s delete that code, then copy and paste in the code from the file demo.R.
  3. Note that you’ll need to change the name of the data collection path to the one we created earlier.

Recall our data collection is called “iris_r_demo” and you’ll need to change “YOUR_USERNAME” to your own username:

file_path = 'data://YOUR_USERNAME/iris_r_demo/naive_bayes_iris.rds'

Here is the complete code for the algorithm:

library(algorithmia)
library(e1071)

client <- getAlgorithmiaClient()

read_data <- function(file_path) {
  # Use data api to process data passed in as user input
  csv_file <- client$file(file_path)$getFile()
  csv_data <- read.csv(csv_file,  stringsAsFactors=FALSE, check.names=FALSE, header=TRUE)
  return(csv_data)
}

load_model <- function() {
    # Load model that was saved as .rds file from data collections
    file_path <- "data://YOUR_USERNAME/iris_r_demo/naive_bayes_iris.rds"
    rds_file <- client$file(file_path)$getFile()
    loaded_model <- readRDS(rds_file, refhook = NULL)
    return(loaded_model)
}

# Load model outside of algorithm function - this way after the model is first
# loaded, subsequent calls will be much faster
model <- load_model()

prediction <- function(data) {
    # Using pre-trained Naive Bayes model make predictions on user data
    iris_pred_naive <- predict(model, data)
    return(iris_pred_naive)
}

# API calls will begin at the algorithm() method, with the request body passed as 'input'
# For more details, see algorithmia.com/developers/algorithm-development/languages
algorithm <- function(input) {
    example_data <- read_data(input)
    predictions <- prediction(example_data)
    return(predictions)
}

The code example above shows how to load the model we hosted in Data Collections via our Data API, using client$file(file_path)$getFile().

Note you always want to initialize the model outside of the algorithm function. This way, after the model is initially loaded, subsequent calls will be much faster within that session.
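
To see why this matters, here is a small local sketch with no Algorithmia calls. The loader, the "model", and the function names are all stand-ins invented for illustration; the point is only that code at the top level runs once, while the function body runs on every call.

```r
# Stand-in for an expensive model load; counts how often it runs.
load_count <- 0
load_model_stub <- function() {
  load_count <<- load_count + 1
  # Hypothetical "model": a fixed threshold rule instead of a real classifier.
  function(x) ifelse(x > 2.5, "versicolor", "setosa")
}

# Loaded once, at the top level, when the script is first sourced.
model_stub <- load_model_stub()

# Each call reuses the already-loaded model.
algorithm_stub <- function(input) {
  model_stub(input)
}

first <- algorithm_stub(1.4)
second <- algorithm_stub(4.7)
print(load_count)
```

Had the load happened inside algorithm_stub(), the counter would increase with every call, which is the cost the note above is warning about.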

Add Dependencies

  1. Click the “Dependencies” button in the grey navbar.

There are four different ways you can load packages in the R dependency file.

If you want the latest version from CRAN, then you simply type in the dependency name:

e1071

If you want an older version of that same package, install the exact version from CRAN. You’ll find it in the package archive, which is linked from the package’s CRAN page under “Old Sources”. Choose your version there and load your dependency with the full URL:

https://cran.r-project.org/src/contrib/Archive/e1071/e1071_1.6-6.tar.gz

Or, if you need to pull a dependency off of GitHub instead, all you need to do is install the package with:

-g /cran/e1071

This is similar to how you would install a package with devtools::install_github() in R.

And finally, if you’re having issues with version conflicts (for instance, package A requires one version of package B while package C requires a different version of package B), you can install the package in your dependency file using install.packages():

-e install.packages("e1071")

Note the last format will take longer to load the dependencies than the other formats.
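
When pinning an older version, the archive URL follows the pattern shown in the example earlier in this section. The helper below is hypothetical (it is not part of any Algorithmia or CRAN tooling) and simply builds that URL from a package name and version. Running packageVersion("e1071") locally is one way to find the version you developed against. Keep in mind that the Archive directory generally holds older releases only, so this is a sketch for pinned versions rather than the current one.

```r
# Hypothetical helper: build the CRAN archive URL for a pinned package version,
# following the URL pattern used in the dependency example.
cran_archive_url <- function(package, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          package, package, version)
}

url <- cran_archive_url("e1071", "1.6-6")
print(url)
```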

Compile Code

  1. Click the “Compile” button in the top right of the grey navbar
  2. Now test your code in the console by passing in the data file we stored in our data collection.
  3. REMEMBER: Change YOUR_USERNAME to your own username in the model path on line 15 of the code example.
  4. Click “Compile”, which will provide you with a hash version of your algorithm that you can use to call and test your algorithm via our CLI or, in our case, the IDE.

In this case, we simply passed in a string, but we recommend creating a more robust data structure such as an R list or Python dictionary. That way you can allow for various input types, output files, and other customizations.
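
As a rough sketch of what a more robust input could look like, the function below accepts either the bare string used in this demo or an R list. The field names data_path and output_path are assumptions made up for this example, not part of the Algorithmia API.

```r
# Hypothetical input schema: either a bare data URI string (as in this demo)
# or a list with a required `data_path` and an optional `output_path`.
parse_input <- function(input) {
  if (is.character(input) && length(input) == 1) {
    input <- list(data_path = input)
  }
  if (!is.list(input) || !is.character(input$data_path)) {
    stop("Input must be a data URI string or a list with a 'data_path' field")
  }
  if (!startsWith(input$data_path, "data://")) {
    stop("'data_path' must be a data:// URI")
  }
  list(
    data_path = input$data_path,
    # NULL means no output file was requested.
    output_path = input$output_path
  )
}

from_string <- parse_input("data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv")
from_list <- parse_input(list(
  data_path = "data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv",
  output_path = "data://YOUR_USERNAME/iris_r_demo/predictions.csv"
))
```

Inside algorithm(), you would then call parse_input(input) first and pass the resulting data_path to read_data(), so existing callers that send a plain string keep working.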

Because every algorithm is backed by a Git repository, you have a hash version that you can use to call your algorithm while developing it, and you’ll get a semantic version once you publish it.

Pass in the test file that we uploaded to our data collection:

"data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv"

Then you’ll see the output, which is an array of the Iris species names predicted for our test data set:

["setosa","setosa","versicolor",...]

Finally, you can publish your algorithm by clicking on the “Publish” button on the top right of your screen.

This will open the modal that takes you through the workflow of publishing your model.

First, you’ll notice your commit history, and you can write any release notes that make sense to include.

Once you’re done with the first tab, you can click “Next” and look at the Sample I/O, which is an important piece of your algorithm. For users to consume your model, you’ll need to supply a sample input, which will become a runnable example on your algorithm’s description page. That way users can test your model with your sample data or their own to see if it will work with their use case and data. Even if you’re the only one consuming your model, it’s important to document it for your future self!

Finally, the last tab, called “Versioning”, lets you set how much you want to charge for your algorithm, whether it’s public or private, and whether the release is a breaking change or a minor revision.

And that’s it! Once you click “Publish” you’ve just deployed your first R model to production on Algorithmia.

If you haven’t installed the Algorithmia library from CRAN on your local machine, do that now. If you need help check out the R Client Guides.

Now, let’s call our model via the API:

library(algorithmia)

input <- "data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv"

# Replace YOUR_API_KEY with your own API key.
client <- getAlgorithmiaClient("YOUR_API_KEY")

# Change the algo path to yours under your name or team.
algo <- client$algo("test_org/naive_bayes_iris/0.1.0")
result <- algo$pipe(input)$result
print(result)

When you run your algorithm, you’ll get the same result as you saw in the IDE while testing!
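
Once the result is back in R, a quick summary can be a useful sanity check. The sketch below assumes the result arrives as a list (or vector) of species names, like the IDE output shown earlier; the sample values are made up for illustration and are not real API output.

```r
# Assumption: the API result arrives as a list (or vector) of species names.
result_example <- list("setosa", "setosa", "versicolor", "virginica", "versicolor")

# Flatten to a character vector and count predictions per species.
species <- unlist(result_example)
counts <- table(species)
print(counts)
```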

That’s it for deploying and calling your R model.


Exploring the Deep Learning Framework PyTorch

Anyone who is interested in deep learning has likely gotten their hands dirty at some point playing around with TensorFlow, Google’s open source deep learning framework. TensorFlow has a lot of benefits like wide-scale adoption, deployment on mobile, and support for distributed computing, but it also has a somewhat challenging learning curve and is difficult to debug. Because of its static graph architecture, it also doesn’t support variable input lengths and shapes unless you use external packages. PyTorch is a new deep learning framework that solves a lot of those problems.

PyTorch is only in beta, but users are rapidly adopting this modular deep learning framework. PyTorch supports tensor computation and dynamic computation graphs that allow you to change how the network behaves on the fly, unlike the static graphs used in frameworks such as TensorFlow. PyTorch also offers modularity, which makes it easier to debug or see within the network. For many, PyTorch is more intuitive to learn than TensorFlow.

This talk will objectively look at PyTorch and why it might be the best fit for your deep learning use case. We’ll also look at use cases that showcase why you might want to consider using TensorFlow instead.


Investigating User Experience with Natural Language Analysis

User experience and customer support are integral to every company’s success. But it’s not easy to understand what users are thinking or how they are feeling, even when you read every single user message that comes in through feedback forms or customer support software. With Natural Language Processing and Machine Learning techniques, it becomes somewhat easier to understand trends in user sentiment, identify the main topics discussed, and detect anomalies in user message data.

A couple of weeks ago, we gave a talk about investigating user experience with natural language analysis at Sentiment Symposium and thought we’d share the talk, along with the speaker notes for anyone who is interested.


Introduction to Time Series

Whether you’re a scientist analyzing earthquake data to predict the next “big one” or you work in healthcare analyzing patient wait times to better staff your ER, understanding time series data is crucial to making better, data-informed decisions.

This gentle introduction to time series will help you understand the components that make up a series, such as trend, noise, and seasonality. It will also cover how to remove some of these components and give you an understanding of why you would want to. Some common statistical and machine learning models for forecasting and anomaly detection will be explained, and we’ll briefly dive into how neural networks can provide better results for some types of analysis.

Racial Bias in Facial Recognition Software

We’ve all heard about racial bias in artificial intelligence via the media, whether it’s found in recidivism software or in object detection that mislabels African American people as gorillas. Due to the increase in media attention, people have grown more aware that the implicit bias that occurs in people can affect the AI systems we build.

Earlier this week, I was honored to give a talk on Racial Bias in Facial Recognition at PyCascades, a new regional Python conference. Last week I wrote a blog post on learning facial recognition through OpenFace, where I went into deeper detail about both facial recognition and the OpenFace architecture, so if you want to give that a read through before checking out this talk, I highly encourage it.