Algorithmia Blog - Deploying AI at scale

Deploying R Models Into Production

The language you use to create a model can make it easier or harder to deploy into production. The R programming language makes it easy to compare different models against each other, and RStudio makes data exploration and visualization a breeze. It’s not always the easiest language to deploy into production, though, because versioning gets complex: different packages install and rely on different versions of the same dependency.

Luckily, Algorithmia has your back for working with R in production.

This tutorial shows how to deploy a simple Iris classification model that uses a Naive Bayes classifier. Along the way, we’ll discuss the different ways to load dependencies in R models. To get all the necessary files so you can follow along, here is the repository for this tutorial. For more information on deploying models in R and general algorithm creation in R, check out the Algorithm Development Guides. Note that we support the full R language and standard library, version 3.4.

Upload Your Data To Data Collections

In this demo, we are going to host our data on the Algorithmia platform in Data Collections.

You’ll want to create a data collection to host your saved model and your test data:

  1. Log in to your Algorithmia account and click your avatar, which will show a dropdown of choices. Click “Manage Data”.
  2. Then, in the left panel on the page of data collection options, click “My Hosted Data”.
  3. Click on “Add Collection” under the “My Collections” section on your data collections page. Let’s name ours “iris_r_demo”.
  4. After you create your collection, you can set the read and write access on it. We are going to select “Private” since only you will be calling your algorithm in this instance. For more information on ACL permission types for Algorithmia hosted data, check out the docs.
  5. Now, let’s put some data into your newly created data collection. Either drag and drop the files naive_bayes_iris.rds and iris_test_data.csv from where you stored the repo on your computer, or click “Drop files here to upload” and select them.

Note that you can also upload your model or data files from Dropbox or Amazon S3 using the Data API.
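If you’re curious how files like naive_bayes_iris.rds and iris_test_data.csv might be produced, here is a minimal sketch. It assumes the model was trained on R’s built-in iris data set with naiveBayes() from e1071. The random split, the seed, and the column layout of the test file are our assumptions for illustration, and the files in the repository may have been created differently.

```r
library(e1071)

# Assumption: hold out a reproducible random sample of 30 rows for testing.
set.seed(42)
test_idx <- sample(nrow(iris), size = 30)
train <- iris[-test_idx, ]
test <- iris[test_idx, ]

# Fit a Naive Bayes classifier that predicts Species from the four measurements.
model <- naiveBayes(Species ~ ., data = train)

# Save the model as .rds and the test features (without labels) as CSV.
out_dir <- tempdir()
model_path <- file.path(out_dir, "naive_bayes_iris.rds")
test_path <- file.path(out_dir, "iris_test_data.csv")
saveRDS(model, model_path)
write.csv(test[, names(test) != "Species"], test_path, row.names = FALSE)

# Round-trip check: the reloaded model should predict like the original one.
reloaded <- readRDS(model_path)
features <- read.csv(test_path, stringsAsFactors = FALSE, check.names = FALSE, header = TRUE)
same_predictions <- identical(predict(reloaded, features), predict(model, features))
print(same_predictions)
```

Saving with saveRDS() keeps the fitted object intact, which is why the algorithm can later restore it with readRDS() and call predict() right away.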

Create Your Algorithm

Now we are ready to deploy our model.

First, create an algorithm:

  1. Click the “Plus” icon at the top right of the navbar.
  2. Check out Getting Started in algorithm development to learn about the various permissions in the form and to cover any other questions you have about creating your algorithm. Do note that if you want to be able to delete your algorithm later, you should set it to “Private”.
  3. Click the purple “Create Algorithm” button.

Now that you have created your algorithm, you’ll get a modal with information about using the CLI and Git. Every algorithm has a Git repo behind it so you can experiment with different I/O in development mode by calling the hash version.

Add Code Sample

  1. Click on the “Source” tab and you’ll notice boilerplate code for Hello World.
  2. Let’s delete that code, then copy and paste in the code from the file demo.R.
  3. Note that you’ll need to change the name of the data collection path to the one we created earlier.

Recall our data collection is called “iris_r_demo” and you’ll need to change “YOUR_USERNAME” to your own username:

file_path = 'data://YOUR_USERNAME/iris_r_demo/naive_bayes_iris.rds'

Here is the code from demo.R:

library(algorithmia)
library(e1071)

client <- getAlgorithmiaClient()

read_data <- function(file_path) {
    # Use data api to process data passed in as user input
    csv_file <- client$file(file_path)$getFile()
    csv_data <- read.csv(csv_file, stringsAsFactors=FALSE, check.names=FALSE, header=TRUE)
    return(csv_data)
}

load_model <- function() {
    # Load model that was saved as .rds file from data collections
    file_path <- "data://YOUR_USERNAME/iris_r_demo/naive_bayes_iris.rds"
    rds_file <- client$file(file_path)$getFile()
    loaded_model <- readRDS(rds_file, refhook = NULL)
    return(loaded_model)
}

# Load model outside of algorithm function - this way after the model is first
# loaded, subsequent calls will be much faster
model <- load_model()

prediction <- function(data) {
    # Using pre-trained Naive Bayes model make predictions on user data
    iris_pred_naive <- predict(model, data)
    return(iris_pred_naive)
}

# API calls will begin at the algorithm() method, with the request body passed as 'input'
# For more details, see algorithmia.com/developers/algorithm-development/languages
algorithm <- function(input) {
    example_data <- read_data(input)
    predictions <- prediction(example_data)
    return(predictions)
}

The code example above shows how to load the model that we hosted in Data Collections: the call client$file(file_path)$getFile() fetches the file via our Data API.

Note that you always want to initialize the model outside of the algorithm function. This way, after the model is initially loaded, subsequent calls will be much faster within that session.
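To see why this matters, here is a small local sketch of the load-once pattern. It doesn’t touch the Algorithmia client at all: load_model() is a stand-in that only counts how often it runs, and the returned list is a hypothetical placeholder for a real model object.

```r
# Stand-in for an expensive model download; counts how often it runs.
load_count <- 0
load_model <- function() {
    load_count <<- load_count + 1
    list(label = "setosa")  # hypothetical placeholder for a real model
}

# Runs once, when the script is first loaded.
model <- load_model()

# Runs on every call and only reads the already loaded model.
algorithm <- function(input) {
    paste(input, "->", model$label)
}

first <- algorithm("call 1")
second <- algorithm("call 2")
print(load_count)
```

However many times algorithm() is called, load_count stays at 1, because the loading code sits at the top level rather than inside the function.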

Add Dependencies

  1. Click the “Dependencies” button in the grey navbar.

In the dependency file, there are four different ways you can load packages in R.

If you want the latest version from CRAN, then you simply type in the dependency name:

e1071

If you want an older version of that same package, install the exact version from CRAN. Find it in the package archive, which is linked from the package’s CRAN page under “Old Sources”. Choose your version there and add the URL of the archived package as your dependency:

https://cran.r-project.org/src/contrib/Archive/e1071/e1071_1.6-6.tar.gz

Or, if you need to pull a dependency off of GitHub instead, all you need to do is install the package with:

-g /cran/e1071

This is similar to how you would install a package with devtools::install_github() in R.

And finally, if you’re having issues with version conflicts, for instance, package A requires one version of package B while package C requires a different version of package B, then you can install the package in your dependency file using install.packages():

-e install.packages("e1071")

Note that the last format takes longer to load the dependencies than the other formats.
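When you pin an older version, it helps to know which version you developed against and what its archive URL looks like. The sketch below is only a local convenience: the helper names cran_archive_url() and installed_version() are ours, and the URL pattern is taken from the e1071 archive link shown earlier.

```r
# Build the CRAN archive URL for a specific package version.
cran_archive_url <- function(pkg, version) {
    sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
            pkg, pkg, version)
}

# Version of a package in the local library, or NA if it is not installed.
installed_version <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        return(NA_character_)
    }
    as.character(utils::packageVersion(pkg))
}

print(cran_archive_url("e1071", "1.6-6"))
print(installed_version("stats"))
```

Keep in mind that CRAN serves the current release of a package from a different location, so the archive URL only applies to versions that have already been superseded.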

Compile Code

  1. REMEMBER: Change YOUR_USERNAME to your own username in the model path on line 15 of the code example.
  2. Click the “Compile” button in the top right of the grey navbar. This provides you with a hash version of your algorithm that you can use to call and test your algorithm via our CLI or, in our case, the IDE.
  3. Now test your code in the console by passing in the data file we stored in our data collection.

In this case, we simply pass in a string, but we recommend creating a more robust data structure, such as an R list or Python dictionary. That way you can allow for various input types, output files, and other customizations.
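As a rough idea of what a more flexible input could look like, here is a sketch of a helper that accepts either a plain path string or a list. The helper name parse_input() and the field names data_path and output_path are hypothetical; they are not part of demo.R or the Algorithmia API.

```r
# Normalize either input style into one list of options.
# Assumption: the field names data_path and output_path are made up for this sketch.
parse_input <- function(input) {
    if (is.character(input) && length(input) == 1) {
        # Plain string: treat it as the path to the data file.
        return(list(data_path = input, output_path = NULL))
    }
    if (is.list(input) && !is.null(input$data_path)) {
        # List input: pick out the known fields, leaving missing ones as NULL.
        return(list(data_path = input$data_path, output_path = input$output_path))
    }
    stop("Input must be a data path string or a list with a 'data_path' field.")
}

from_string <- parse_input("data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv")
from_list <- parse_input(list(data_path = "data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv",
                              output_path = "data://YOUR_USERNAME/iris_r_demo/predictions.csv"))
print(from_list$output_path)
```

Inside algorithm(), you could then call read_data() on the data_path field, so existing callers that send a bare string keep working.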

Because every algorithm is backed by a Git repository, you have a hash version that you can use to call your algorithm while developing it, but you’ll get a semantic version once you publish it.

Pass in the test file that we uploaded to our data collection:

"data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv"

Then you’ll see the output, which is an array of the Iris species names predicted for our test data set.

["setosa","setosa","versicolor",...]

Finally, you can publish your algorithm by clicking on the “Publish” button on the top right of your screen.

This will open the modal that takes you through the workflow of publishing your model.

First, you’ll notice your commit history, and you can write any release notes that make sense to include.

Once you’re done with the first tab, you can click “Next” and look at the Sample I/O, which is an important piece of your algorithm. For users to consume your model, you’ll need to supply a sample input, which will become a runnable example on your algorithm’s description page. That way, users can test your model with your sample data or their own to see if it works for their use case and data. Even if you’re the only one consuming your model, it’s important to document it for your future self!

Finally, the last tab, called “Versioning”, lets you set how much you want to charge for your algorithm, whether it’s public or private, and whether the release is a breaking change or a minor revision.
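If semantic versions are new to you, the sketch below shows how a three-part version number typically changes with each kind of release. It follows the general semantic versioning convention; the helper bump_version() is hypothetical, and the exact rules the publish dialog applies may differ.

```r
# Bump a "major.minor.revision" version string.
# Assumption: this is the generic convention, not Algorithmia's own code.
bump_version <- function(version, change = c("revision", "minor", "major")) {
    change <- match.arg(change)
    parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
    stopifnot(length(parts) == 3, !anyNA(parts))
    if (change == "major") {
        # Breaking change: increase the first number and reset the rest.
        parts <- c(parts[1] + 1L, 0L, 0L)
    } else if (change == "minor") {
        # Compatible new behavior: increase the middle number.
        parts <- c(parts[1], parts[2] + 1L, 0L)
    } else {
        # Small fix: increase only the last number.
        parts[3] <- parts[3] + 1L
    }
    paste(parts, collapse = ".")
}

print(bump_version("0.1.0", "minor"))
print(bump_version("0.1.0", "major"))
```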

And that’s it! Once you click “Publish” you’ve just deployed your first R model to production on Algorithmia.

If you haven’t installed the Algorithmia library from CRAN on your local machine, do that now. If you need help, check out the R Client Guides.

Now, let’s call our model via the API:

library(algorithmia)

input <- "data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv"

# Replace YOUR_API_KEY with your own API key.
client <- getAlgorithmiaClient("YOUR_API_KEY")

# Change the algo path to yours under your name or team.
algo <- client$algo("test_org/naive_bayes_iris/0.1.0")
result <- algo$pipe(input)$result
print(result)

When you run your algorithm, you’ll get the same result as you saw in the IDE while testing!
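One small suggestion for your own scripts: rather than pasting the API key into the source file, you could read it from an environment variable. Here is a minimal sketch; the helper get_api_key() and the variable name ALGORITHMIA_API_KEY are our own choices for illustration, so use whatever name fits your setup.

```r
# Read the API key from an environment variable instead of the source file.
# Assumption: the variable name ALGORITHMIA_API_KEY is a hypothetical choice.
get_api_key <- function(var = "ALGORITHMIA_API_KEY") {
    key <- Sys.getenv(var, unset = "")
    if (!nzchar(key)) {
        stop(sprintf("Environment variable %s is not set.", var))
    }
    key
}

# Demonstration with a dummy value; in practice, set the variable outside of R.
Sys.setenv(ALGORITHMIA_API_KEY = "dummy-key-for-demo")
api_key <- get_api_key()
print(nchar(api_key))
```

You would then create the client with getAlgorithmiaClient(get_api_key()), and the key never ends up in version control.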

That’s it for deploying and calling your R model.

Resources: