Algorithmia Blog - Deploying AI at scale

Machine learning engineers' and data scientists' biggest challenge: deploying models at scale

Key Findings in a Survey of 500+ Machine Learning Professionals

Earlier this year we set out to understand how organizations are reacting and adapting to machine learning, its rate of adoption in the marketplace, and how the industry is evolving. We wanted to understand what our customers’ challenges are as Algorithmia plans to develop products, services, and content to help move the industry forward. We heard from over 500 decision makers at companies representing various sizes and industries. We want to share their knowledge with the industry at large.

Get the Full Report “The State of Enterprise Machine Learning”

What we found was a mix of the expected and the astonishing. First, as we expected, organizations that make a concerted effort to focus on machine learning and artificial intelligence across their customer lifecycle are more successful. These organizations see higher rates of brand loyalty, lower costs of operations, and many other benefits, which we will discuss below.

Second, data scientists and machine learning engineers at companies of all sizes find that their number one challenge is deploying models across their infrastructure. This seems at odds with the first finding, considering companies must be able to deploy models in order to reap the rewards. The main problem is that not all enterprises are experts at deploying models, nor do these organizations make a concerted effort to focus on machine learning.

We found that in machine learning and artificial intelligence organizations:

  • Data scientists are facing many roadblocks such as deployment, model control, etc.
  • Companies are increasing their investment in machine learning on average by 25 percent
  • Large enterprises are taking the lead in this initiative
  • Machine learning leadership has no central location within organizations to date; it tends to be spread across the organization
  • There are a broad number of use cases and applications for machine learning to date

Data scientists are facing many roadblocks
Most data science and machine learning teams are not able to focus on adding value. Rather, they spend the majority of their time on infrastructure, deployment, and data engineering. This leaves less than 25 percent of their time for training and iterating models, which is their primary job function. Across all organizations we surveyed, only 8 percent of respondents consider their organization “sophisticated” in their machine learning programs. The remainder considered themselves early adopters.

If data scientists cannot focus their time on advancing these systems to become sophisticated, organizations risk being stuck in mediocrity. Budgets are also growing faster for organizations that consider themselves “sophisticated.” 51 percent of these companies have increased their machine learning budgets by at least 25 percent this year.

Companies are quickly increasing their investment in machine learning
Overall, 80 percent of respondents say their organization’s investment in machine learning has grown by at least 25 percent in the past 12 months. What is most interesting is that this number climbs to 92 percent in organizations with greater than 10,000 employees. It is safe to say that organizations of all sizes are accelerating their investment. However, large enterprises seem to be willing to invest more.

Big companies are taking the lead
Employees within larger organizations feel significantly more satisfied with their progress than employees at smaller organizations. Employees in this larger segment are roughly 300 percent more likely to consider their model deployment “sophisticated.” The market is moving quickly to develop tools that will help smaller organizations catch up, but this gap will remain for the foreseeable future.

Companies have not decided where machine learning leadership should come from
Overall, 37 percent of respondents say their machine learning efforts are being directed primarily by management, while 55 percent say their efforts are emerging from engineers or other technical teams.

Qualitatively, many data scientists report fighting existing systems and processes without clear understanding from management. Without guidance and goals, this leads to confusion and leaves companies without the organizational support to move beyond these challenges.

One hypothesis for this is that as companies get larger, management begins to set more of the priorities. We noted that 33 percent of companies with more than 10,000 employees say management sets priorities, removes roadblocks, and ensures data scientists are free to do their jobs.

Another interesting note is that business roles (management, product management) set priorities more often than technical roles (DevOps, ML engineer, R&D). Data scientists are in the middle.

Companies are trying a wide variety of use cases
Among enterprises of 10,000 employees or more, the most significant use case is increasing customer loyalty (59 percent), followed by increasing customer satisfaction (51 percent), and interacting with customers (48 percent).

In general, larger and more sophisticated companies noted more use cases overall than smaller and less mature companies. Our finding is that as companies get better at machine learning, they get smarter about where to focus their efforts and gain clarity around the results.

For larger organizations, cost savings are increasingly significant: 43 percent of companies between 1,001 and 2,500 employees, 41 percent of companies between 2,501 and 10,000 employees, and 48 percent of companies with more than 10,000 employees cite cost savings as a use case.

The goal of this research and blog post is to give people in the industry a baseline understanding of the current maturity of the competitive landscape. The data from our survey shows that companies are rapidly maturing and running into common challenges. We hope this helps you navigate our quickly evolving field.

Would you like to read the report and draw your own conclusions? Get the Full Report “The State of Enterprise Machine Learning”

Deploying R Models Into Production

The language you use to create a model can make it easier or harder to deploy into production. The R programming language makes it easy to compare different models against each other, and it makes data exploration and visualization a breeze in R Studio. However, it’s not always the easiest language to deploy into production due to complex versioning: different packages can install and rely on different versions of the same dependency.

Luckily, Algorithmia has your back for working with R in production.

Here is a tutorial showing how to deploy a simple Iris classification model using a Naive Bayes classifier; along the way, we’ll discuss the different ways to load dependencies in R models. To follow along, get all the necessary files from the repository for this tutorial. For more information on deploying models in R, and on algorithm creation in R generally, check out the Algorithm Development Guides. Note that we support the full R language and standard library, version 3.4.
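
Before uploading anything, it helps to see how a model like this gets trained and serialized in the first place. The sketch below is an illustrative guess at how a file like naive_bayes_iris.rds could be produced with the e1071 package (the same package the dependency examples later in this post install); the actual model in the tutorial repo may have been trained differently:

```r
# Train a Naive Bayes classifier on R's built-in iris data set and
# serialize it to an .rds file suitable for upload to Data Collections.
library(e1071)

set.seed(42)
train_idx <- sample(nrow(iris), 0.8 * nrow(iris))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# naiveBayes() fits class-conditional distributions to each feature
model <- naiveBayes(Species ~ ., data = train)

# Sanity-check held-out accuracy before saving
accuracy <- mean(predict(model, test) == test$Species)

# Serialize the fitted model; this file is what gets uploaded
saveRDS(model, "naive_bayes_iris.rds")
```

Once the .rds file exists locally, it can be dragged into the data collection created in the next section.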

Upload Your Data To Data Collections

In this demo, we are going to host our data on the Algorithmia platform in Data Collections.

You’ll want to create a data collection to host your saved model and your test data:

  1. Log in to your Algorithmia account and click your avatar, which will show a dropdown of choices. Click “Manage Data”
  2. Then in the left panel on the page of data collection options, go ahead and click “My Hosted Data”
  3. Click on “Add Collection” under the “My Collections” section on your data collections page. Let’s name ours “iris_r_demo”.
  4. After you create your collection you can set the read and write access on your data collection. We are going to select “Private” since only you will be calling your algorithm in this instance. For more information on ACL permission types for Algorithmia hosted data check out the docs.
  5. Now, let’s put some data into your newly created data collection. Either drag and drop the file naive_bayes_iris.rds from where you stored the repo on your computer, or click “Drop files here to upload” and browse to it.

Note that you can also upload your model or data files from Dropbox or Amazon S3 using the Data API.

Create Your Algorithm

Now we are ready to deploy our model.

First, create an algorithm

  1. Click the “Plus” icon at the top right of the navbar.
  2. Check out Getting Started in algorithm development to learn about the various permissions in the form; it should cover any questions you have about creating your algorithm. Do note that if you want to delete your algorithm later, you should set it to “Private”.
  3. Click the purple “Create Algorithm” button.

Now that you have created your algorithm, you’ll get a modal with information about using the CLI and Git. Every algorithm is backed by a Git repo, so you can experiment with different I/O in development mode by calling the hash version.

Add Code Sample

  1. Click on the tab “Source” and you’ll notice boilerplate code for Hello World.
  2. Delete that code, then copy and paste in the code from the file demo.R.
  3. Note that you’ll need to change the name of the data collection path to the one we created earlier.

Recall our data collection is called “iris_r_demo” and you’ll need to change “YOUR_USERNAME” to your own username:

library(algorithmia)

client <- getAlgorithmiaClient()

read_data <- function(file_path) {
    # Use the Data API to fetch the CSV file passed in as user input
    csv_file <- client$file(file_path)$getFile()
    csv_data <- read.csv(csv_file, stringsAsFactors=FALSE, check.names=FALSE, header=TRUE)
    csv_data
}

load_model <- function() {
    # Load the model that was saved as an .rds file in Data Collections
    file_path <- "data://YOUR_USERNAME/iris_r_demo/naive_bayes_iris.rds"
    rds_file <- client$file(file_path)$getFile()
    loaded_model <- readRDS(rds_file, refhook = NULL)
    loaded_model
}

# Load the model outside of the algorithm function - this way, after the model
# is first loaded, subsequent calls will be much faster
model <- load_model()

prediction <- function(data) {
    # Use the pre-trained Naive Bayes model to make predictions on user data
    iris_pred_naive <- predict(model, data)
    iris_pred_naive
}

# API calls will begin at the algorithm() method, with the request body passed
# as 'input'. For more details, see the Algorithm Development Guides.
algorithm <- function(input) {
    example_data <- read_data(input)
    predictions <- prediction(example_data)
    predictions
}

The code example above shows how to load our model with client$file(file_path)$getFile(), which fetches the file we hosted in Data Collections via the Data API.

Note that you always want to initialize the model outside of the algorithm function. This way, after the model is initially loaded, subsequent calls will be much faster within that session.

Add Dependencies

  1. Click the “Dependencies” button in the grey navbar.

Once the dependency file is open, you’ll see there are four different ways you can load packages in R.

If you want the latest version from CRAN, then you simply type in the dependency name on a line by itself, for example:

e1071
If you want an older version of that same package, you can install the exact version from CRAN by finding it in the package archive, which is linked in the package’s docs under “Old Sources.” There you’ll find the archived releases; choose your version and point your dependency at it, for example (the version number here is illustrative):

https://cran.r-project.org/src/contrib/Archive/e1071/e1071_1.6-8.tar.gz
Or, if you need to pull a dependency off of GitHub instead, all you need to do is install the package with:

-g /cran/e1071

This is similar to how you would install through devtools::install_github() in R.

And finally, if you’re having issues with version conflicts (for instance, package A requires one version of package B while package C requires a different version of package B), you can install in your dependency file using install.packages():

-e install.packages("e1071")

Note that the last format will take longer to load dependencies than the other formats.
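
To summarize, a dependency file for this tutorial could use any one of the four formats; they are shown together below purely for illustration (in practice you would list each package only once, and the archive version number is just an example):

```
e1071
https://cran.r-project.org/src/contrib/Archive/e1071/e1071_1.6-8.tar.gz
-g /cran/e1071
-e install.packages("e1071")
```

For this Iris demo, the single plain e1071 line is all you need.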

Compile Code

  1. Click the “Compile” button in the top right of the grey navbar.
  2. Now test your code in the console by passing in the data file we stored in our data collection.
  3. REMEMBER: Change YOUR_USERNAME to your own name in the model path on line 15 of the code example.
  4. Click “Compile” which will provide you with a hash version of your algorithm that you can use to call and test your algorithm via our CLI or in our case, the IDE.

In this case, we simply passed in a string, but we recommend creating a more robust data structure, such as an R list (the equivalent of a Python dictionary). That way you can allow for various input types, output files, and other customizations.
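
As a sketch of that recommendation, the entry point could accept either a bare string (for backwards compatibility) or an R list with named fields. The field names data_path and output_format below are hypothetical, not part of the tutorial repo; in the real algorithm, the parsed values would then be handed to read_data() and prediction():

```r
# Sketch: normalize algorithm input into a named list of options.
# Algorithmia deserializes a JSON request body into an R list, so
# {"data_path": "...", "output_format": "csv"} arrives as a list.
parse_input <- function(input) {
    if (is.character(input)) {
        # A bare string is treated as the data file path
        input <- list(data_path = input)
    }
    if (is.null(input$data_path)) {
        stop("missing required field: data_path")
    }
    # Default the output format when the caller omits it
    if (is.null(input$output_format)) {
        input$output_format <- "labels"
    }
    input
}

# A bare string still works:
opts <- parse_input("data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv")

# And so does a richer structure:
opts2 <- parse_input(list(data_path = "data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv",
                          output_format = "csv"))
```

Validating and defaulting input up front like this also gives callers a clear error message instead of a cryptic failure deeper in the pipeline.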

Because every algorithm is backed by a Git repository, you have a hash version you can use to call and test your algorithm while developing, but you’ll get a semantic version once you publish it.

Pass in the test file that we uploaded to our data collection:

"data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv"

Then you’ll see the output: an array of Iris species names classifying our test data set.


Finally, you can publish your algorithm by clicking on the “Publish” button on the top right of your screen.

This will open the modal that takes you through the workflow of publishing your model:

[Image: publish algorithm modal]

First, you’ll notice your commit history, and you can write any release notes that make sense to include.

Once you’re done with the first tab, click “Next” and look at the Sample I/O, which is an important piece of your algorithm. For users to consume your model, you’ll need to supply a sample input, which becomes a runnable example on your algorithm’s description page. That way, users can test your model with your sample data, or their own, to see if it will work with their use case and data. Even if you’re the only one consuming your model, it’s important to document it for your future self!

Finally, clicking the last tab, “Versioning,” lets you set how much you want to charge for your algorithm, whether it’s public or private, and whether the release is a breaking change or a minor revision.

And that’s it! Once you click “Publish” you’ve just deployed your first R model to production on Algorithmia.

If you haven’t installed the Algorithmia library from CRAN on your local machine, do that now. If you need help check out the R Client Guides.

Now, let’s call our model via the API:


library(algorithmia)

input <- "data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv"

client <- getAlgorithmiaClient("YOUR_API_KEY")

# Change the algo path to yours, under your name or team.
algo <- client$algo("test_org/naive_bayes_iris/0.1.0")

result <- algo$pipe(input)$result


When you run your algorithm, you’ll get the same result as you saw in the IDE while testing!

That’s it for deploying and calling your R model.



How to add AI to your WordPress site

You’ve heard about Artificial Intelligence in everything from airplanes to toasters, and you’ve wondered how to get those benefits into your WordPress website. Well, wonder no more! I’ve created a WordPress plugin that integrates with Algorithmia and is easy to extend to any AI algorithm they provide. FYI: Algorithmia has many platform integrations; this post discusses just the WordPress one.

Algorithmia is an algorithm platform with an API you can use to run intelligent algorithms, some of astonishing machine learning complexity, from your own code or website. They provide client libraries for all of the major programming languages (the PHP client was written by yours truly!), making it easy to add AI algorithms directly to your own applications. And after I finished the PHP client, the next logical step was to build a WordPress plugin to bring the power of AI to the more than 25% of the internet powered by WordPress!

Read More…

Vertical Spotlight: Machine Learning For Customer Service


Source: PCMag

Customer Service is likely one of the most complex and frustrating parts of your business, but it doesn’t have to be. Machine Learning is making strides in automating and improving parts of the Customer Service (CS) stack quickly, like auto-routing tickets to the right agent or improving your knowledge base. Our Vertical Spotlight on Customer Service will give you all the information you need to get started.

All of our vertical spotlights use our Machine Learning Vertical Framework: we analyze unique use cases, leadership, domain-specific problems, and model tradeoffs.

Read More…