Algorithmia Blog - Deploying AI at scale

Going to Print—the Cimpress Machine Learning Story

Read the Case Study

Machine learning can automate business processes, but maybe more importantly,
it can improve customer experience—just look at Cimpress.

Cimpress, the parent company of VistaPrint, is one of the world’s foremost aggregators of customized merchandise, with more than 10,000 employees spanning multiple continents. It is committed to ethically and environmentally sustainable production and has grown rapidly since its founding in 1994, while maintaining its ethos of staying small even as it gets big.

Cimpress integrates ML into its online experience

By 2016, Cimpress was running up against the challenge of deploying its models at
scale—a huge undertaking for any company to integrate into its existing tech infrastructure. The Cimpress team realized the effort required to manually deploy
ML models was slowing them down and started looking for solutions.

Cimpress tested many potential solutions but found Algorithmia’s Serverless AI Layer to be the perfect fit for deploying and managing its models at scale. The AI Layer reduced the number of full-time developers it required to maintain and optimize its systems.

Algorithmia is able to ensure seamless future deployments of machine learning projects for Cimpress without costly or time-intensive rollouts.

The Algorithmia collaboration is accelerating Cimpress’ ability to broaden its customer focus without reducing its commitment to quality and efficiency.

Cimpress was ahead of the curve in understanding core principles of machine learning

Of course, companies should spend time distilling and identifying their core business needs and gaps, as Cimpress did, before looking to incorporate machine learning, says Cassie Kozyrkov, Chief Decision Intelligence Engineer at Google and widely published writer on all things AI and machine learning (Towards Data Science, 2018). An outside firm with expertise in building customized ML infrastructure is often better suited than internal developers to meet those automation needs.

Entrepreneur and former principal data scientist at LinkedIn Peter Skomoroch also calls for using outside experts to build machine learning into business models.

Learn more about Cimpress’ journey into employing Algorithmia’s AI Layer: Read the Case Study

Model Evaluations Beyond Compare

At Algorithmia, we strive first and foremost to meet customer needs, and we’re
releasing a new feature within the AI Layer to help you conduct model comparison.
Model Evaluations is a machine learning tool that lets you create a process for running models concurrently to gauge performance. You can test similar models against one another or compare different versions of one model using criteria you define. Model Evaluations makes comparing machine learning models in a production environment
easy and repeatable.  

If you have ever wanted to know which risk score algorithm is the best for your dataset, Model Evaluations can help. It can test models for accuracy, quality, error rates, drift, or any other performance indicator you specify. Evaluations can be created for an individual user or be organization-owned to enable collaboration across teams. Simply load a new model into the platform and run tests against your own models or those in the marketplace. We plan on making this tool part of the standard UI experience in a future release of Algorithmia Enterprise, but it is available for early access right now.
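To make the idea concrete, here is a rough sketch in plain R of the kind of head-to-head comparison Model Evaluations automates: two candidate classifiers trained on the same data and scored with the same metric. This uses the e1071 package and the built-in iris data purely for illustration; it is not the Model Evaluations API itself.

```r
library(e1071)  # provides naiveBayes() and svm()

set.seed(42)
idx   <- sample(nrow(iris), 100)  # 100 rows for training, 50 held out
train <- iris[idx, ]
test  <- iris[-idx, ]

# Train two candidate models on the same training data
nb      <- naiveBayes(Species ~ ., data = train)
svm_fit <- svm(Species ~ ., data = train)

# Score each model with the same metric: classification accuracy
accuracy <- function(model, data) {
  mean(predict(model, data) == data$Species)
}

results <- c(naive_bayes = accuracy(nb, test),
             svm         = accuracy(svm_fit, test))
print(results)
```

Model Evaluations wraps this pattern, running the candidate models concurrently in production and recording the results against criteria you define.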

Model Evaluations logo

Testing and comparing models is an integral part of any development and deployment cycle. It lets you achieve a competitive advantage over other models, build your brand’s credibility, and be certain that new versions outperform previous ones. Other benefits of Model Evaluations:

  • Improve model accuracy and performance
  • Test models before deploying
  • Conduct faster comparisons
  • Get results quicker

Sign up to get early access to our model comparison tool.

To learn more about Model Evaluations, you can find additional documentation, examples, and a step-by-step walkthrough in the Developer Center. But start here with this video we’ve put together demoing the Model Evaluations tool: 

Algorithmia is a leader in the machine learning space, and we care about building
smarter models, so please tell us
 about your experience. We’re eager to hear
your suggestions or ideas!

Model Evaluations will help data scientists compare the quality of different models or even measure the effectiveness of new versions of the same model.

Most Common Use Cases for Enterprise Machine Learning

In part two of our blog series about machine learning in the enterprise, we talk briefly about some of the most common use cases for machine learning. Larger companies reported the widest variety of use cases; however, there was no single area of focus. Despite such varied answers on where companies were centralizing their attention, we noticed some common trends that we’ll discuss below.

Get the Full Report “The State of Enterprise Machine Learning” here.

Big emphasis on the customer
Among all our respondents, there was clear attention to how machine learning capabilities would help them interact with and retain their customers. The most frequently selected use cases were: generating customer insights and intelligence (#1), improving the customer experience (#2), interacting with customers (#5), increasing customer satisfaction (#6), and retaining customers (#7).

Among the largest companies, the most common use case reported was increasing customer loyalty (59%), followed by increasing customer satisfaction (51%), and interacting with customers (48%). Similarly, among the smallest of responding companies, increasing customer satisfaction (36%) was the second most identified use case behind reducing costs (43%).

Larger organizations are putting significant efforts into using data science to identify areas of cost savings
For larger organizations, cost savings seems to be an increasingly important area of focus because it is easy to tie ROI to cost-savings programs and showcase success. 43% of companies with 1,001 to 2,500 employees listed it as a use case, as did 41% of companies between 2,501 and 10,000 employees and 48% of companies with more than 10,000 employees.

The focus on reducing costs is higher among sophisticated adopters
Sophisticated adopters have put the time and effort into developing their machine learning capabilities, with larger companies more likely to do so with greater resources. These larger and more sophisticated companies are investing more across a broader range of use cases. They are also the most focused on how they can use machine learning to reduce costs; 44% mentioned it as one of their use cases.

Early stage adopters are mainly focused on improving their customer retention through the application of machine learning (60%), with the middle stage adopters split between increasing customer loyalty (38%) and a growing interest in reducing costs (39%).

In general, larger and more sophisticated companies filled in more use cases overall than smaller and less mature companies: as you put resources toward and get better at ML, you get smarter about where to apply it and gain clarity on how it can help your business.

With these in mind, how are you utilizing your company’s machine learning capabilities, and how can Algorithmia help?

Get the Full Report “The State of Enterprise Machine Learning” here.

Machine learning engineers’ and data scientists’ biggest challenge: deploying models at scale

Key Findings in a Survey of 500+ Machine Learning Professionals

Earlier this year we set out to understand how organizations are reacting and adapting to machine learning, its rate of adoption in the marketplace, and how the industry is evolving. We wanted to understand what our customers’ challenges are as Algorithmia plans to develop products, services, and content to help move the industry forward. We heard from over 500 decision makers at companies representing various sizes and industries. We want to share their knowledge with the industry at large.

Get the Full Report “The State of Enterprise Machine Learning”

What we found was a mix of the expected and the astonishing. First, as we expected, organizations that make a concerted effort to focus on machine learning and artificial intelligence across their customer lifecycle are more successful. These organizations have higher rates of brand loyalty, lower costs of operations, and many other benefits, which we will discuss later.

Second, data scientists and machine learning engineers at companies of all sizes find that their number one challenge is deploying models across their infrastructure. This seems at odds with the first finding, considering companies must be able to deploy models in order to reap the rewards. The main problem is that not all enterprises are experts at deploying models, nor do these organizations make a concerted effort to focus on machine learning.

We found that in machine learning and artificial intelligence organizations:

  • Data scientists are facing many roadblocks such as deployment, model control, etc.
  • Companies are increasing their investment in machine learning on average by 25 percent
  • Large enterprises are taking the lead in this initiative
  • Machine learning leadership has no central location within organizations to date; it tends to be spread across the organization
  • There are a broad number of use cases and applications for machine learning to date

Data scientists are facing many roadblocks
Most data science and machine learning teams are not able to focus on adding value. Rather, they spend the majority of their time on infrastructure, deployment, and data engineering. This leaves less than 25 percent of their time for training and iterating models, which is their primary job function. Across all organizations we surveyed, only 8 percent of respondents consider their organization “sophisticated” in their machine learning programs. The remainder considered themselves early adopters.

If data scientists cannot focus their time on advancing these systems to become sophisticated, organizations risk being stuck in mediocrity. Budgets are also growing faster for organizations that consider themselves “sophisticated.” 51 percent of these companies have increased their machine learning budgets by at least 25 percent this year.

Companies are quickly increasing their investment in machine learning
Overall, 80 percent of respondents say their organization’s investment in machine learning has grown by at least 25 percent in the past 12 months. What is most interesting is that this number climbs to 92 percent in organizations with greater than 10,000 employees. It is safe to say that organizations of all sizes are accelerating their investment. However, large enterprises seem to be willing to invest more.

Big companies are taking the lead
Employees within larger organizations feel significantly more satisfied with their progress than those at smaller organizations. Employees in this larger segment are roughly 300 percent more likely to consider their model deployment “sophisticated.” The market is moving quickly to develop tools that will help smaller organizations catch up, but the gap will remain for the foreseeable future.

Companies have not decided where machine learning leadership should come from
Overall, 37 percent of respondents say their machine learning efforts are being directed primarily by management, while 55 percent say their efforts are emerging from engineers or other technical teams.

Qualitatively, many data scientists are fighting existing systems and processes without clear understanding from management. Without guidance and clear goals, this leads to confusion and leaves companies without the organizational support to move beyond these challenges.

One hypothesis for this is that as companies get larger, management begins to set the priorities more. We noted that 33 percent of companies with more than 10,000 employees say management sets priorities, removes roadblocks, and ensures data scientists are free to do their jobs.

Another interesting note is that business roles (management, product management) set priorities more often than technical roles (DevOps, ML engineer, R&D). Data scientists are in the middle.

Companies are trying a wide variety of use cases
Among enterprises of 10,000 employees or more, the most significant use case is increasing customer loyalty (59 percent), followed by increasing customer satisfaction (51 percent), and interacting with customers (48 percent).

In general, larger and more sophisticated companies noted more use cases overall than smaller and less mature companies. Our finding is as companies get better at machine learning, they get smarter about where to focus their efforts, and gain clarity around the results.

For larger organizations, cost savings are increasingly significant: 43 percent of companies between 1,001 and 2,500 employees, 41 percent of companies between 2,501 and 10,000 employees, and 48 percent of companies with more than 10,000 employees put cost savings as a use case.

Conclusion
The goal of this research and blog post is to give people in the industry a baseline understanding of the current maturity of the competitive landscape. The data from our survey shows that companies are rapidly maturing and running into common challenges. We hope this helps you navigate our quickly evolving field.

Would you like to read the report and draw your own conclusions? Get the Full Report “The State of Enterprise Machine Learning” [Download]

Deploying R Models Into Production

The language you use to create a model can make it easier or harder to deploy into production. The R programming language makes it easy to compare different models against each other, and it makes data exploration and visualization a breeze using RStudio. However, it’s not always the easiest language to deploy into production, due to complex versioning: different packages install and rely on different versions of the same dependency.

Luckily, Algorithmia has your back for working with R in production.

Here is a tutorial showing how to deploy a simple Iris classification model using a Naive Bayes classifier; along the way, we’ll discuss the different ways to load dependencies in R models. To get all the necessary files so you can follow along, here is the repository for this tutorial. For more information on deploying models in R and on general algorithm creation in R, check out the Algorithm Development Guides. Note that we support the full R language and standard library, version 3.4.
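The tutorial assumes the saved model file naive_bayes_iris.rds already exists in the repository. If you would rather train and save an equivalent model yourself, a minimal sketch using e1071’s naiveBayes() on the built-in iris data looks like this:

```r
library(e1071)

# Train a Naive Bayes classifier on the classic iris data set
model <- naiveBayes(Species ~ ., data = iris)

# Serialize the fitted model to the .rds file the tutorial uploads
saveRDS(model, file = "naive_bayes_iris.rds")

# Sanity check: reload the file and predict on the feature columns
reloaded <- readRDS("naive_bayes_iris.rds")
head(predict(reloaded, iris[, 1:4]))
```

The resulting .rds file is what you will upload to Data Collections in the next step.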

Upload Your Data To Data Collections

In this demo, we are going to host our data on the Algorithmia platform in Data Collections.

You’ll want to create a data collection to host your saved model and your test data:

  1. Log in to your Algorithmia account and click your avatar, which will show a dropdown of choices. Click “Manage Data.”
  2. Then, in the left panel of the data collections page, click “My Hosted Data.”
  3. Click on “Add Collection” under the “My Collections” section on your data collections page. Let’s name ours “iris_r_demo”.
  4. After you create your collection you can set the read and write access on your data collection. We are going to select “Private” since only you will be calling your algorithm in this instance. For more information on ACL permission types for Algorithmia hosted data check out the docs.
  5. Now, let’s put some data into your newly created data collection. You can either drag and drop the file naive_bayes_iris.rds or you can click “Drop files here to upload” from where you stored the repo on your computer.

Note that you can also upload your model or data files from Dropbox or Amazon S3 using the Data API.

Create Your Algorithm

Now we are ready to deploy our model.

First, create an algorithm

  1. Click the “Plus” icon at the top right of the navbar.
  2. Check out Getting Started in algorithm development to learn about the various permissions in the form. Do note that if you want to delete your algorithm later, you should set it to “Private”. Make sure you’ve gone through the Getting Started section to cover any questions you have about creating your algorithm.
  3. Click on the purple “Create Algorithm”.

Now that you have created your algorithm, you’ll get a modal with information about using the CLI and Git. Every algorithm has a Git repo behind it so you can experiment with different I/O in development mode by calling the hash version.

Add Code Sample

  1. Click on the tab “Source” and you’ll notice boilerplate code for Hello World.
  2. Let’s delete that code, then copy and paste in the code from the file demo.R.
  3. Note that you’ll need to change the name of the data collection path to the one we created earlier.

Recall our data collection is called “iris_r_demo” and you’ll need to change “YOUR_USERNAME” to your own username:

library(algorithmia)
library(e1071)

client <- getAlgorithmiaClient()

read_data <- function(file_path) {
  # Use data api to process data passed in as user input
  csv_file <- client$file(file_path)$getFile()
  csv_data <- read.csv(csv_file,  stringsAsFactors=FALSE, check.names=FALSE, header=TRUE)
  return(csv_data)
}

load_model <- function() {
    # Load model that was saved as .rds file from data collections
    file_path <- "data://YOUR_USERNAME/iris_r_demo/naive_bayes_iris.rds"
    rds_file <- client$file(file_path)$getFile()
    loaded_model <- readRDS(rds_file, refhook = NULL)
    return(loaded_model)
}

# Load model outside of algorithm function - this way after the model is first
# loaded, subsequent calls will be much faster
model <- load_model()

prediction <- function(data) {
    # Using pre-trained Naive Bayes model make predictions on user data
    iris_pred_naive <- predict(model, data)
    return(iris_pred_naive)
}

# API calls will begin at the algorithm() method, with the request body passed as 'input'
# For more details, see algorithmia.com/developers/algorithm-development/languages
algorithm <- function(input) {
    example_data <- read_data(input)
    predictions <- prediction(example_data)
    return(predictions)
}

The code example above shows how to load the model we hosted in Data Collections via our Data API, using client$file(file_path)$getFile().

Note you always want to initialize the model outside of the algorithm function. This way, after the model is initially loaded, subsequent calls will be much faster within that session.

Add Dependencies

  1. Click the “Dependencies” button in the grey navbar.

As you can see in the dependency file above, there are four different ways you can load packages in R.

If you want the latest version from CRAN, then you simply type in the dependency name:

e1071

If you want that same package in an older version, install the exact version from the CRAN package archive, which is linked from the package’s documentation page under “Old Sources.” There you’ll find the archived packages, so simply choose your version and load your dependency:

https://cran.r-project.org/src/contrib/Archive/e1071/e1071_1.6-6.tar.gz

Or, if you need to pull a dependency off of GitHub instead, all you need to do is install the package with:

-g /cran/e1071

This is similar to how you would install through devtools in R.

And finally, if you’re having issues with version conflicts (for instance, package A requires one version of package B while package C requires a different version), you can install in your dependency file using install.packages():

-e install.packages("e1071")

Note the last format will take longer to load the dependencies than the other formats.

Compile Code

  1. Click the “Compile” button in the top right of the grey navbar
  2. Now test your code in the console by passing in the data file we stored in our data collection.
  3. REMEMBER: Change YOUR_USERNAME to your own name in the model path on line 15 of the code example.
  4. Click “Compile” which will provide you with a hash version of your algorithm that you can use to call and test your algorithm via our CLI or in our case, the IDE.

In this case, we simply passed in a string, but we recommend creating a more robust data structure, such as an R list (analogous to a Python dictionary). That way you can allow for various input types, output files, and other customizations.
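For instance, a slightly more robust algorithm might normalize its request body into a consistent structure, accepting either a bare path string or a named list of options. A minimal sketch (the field names "data_path" and "threshold" are illustrative, not part of this tutorial’s repository):

```r
# Normalize the request body: accept a bare string or a named list.
# Field names here are hypothetical examples, not a fixed schema.
parse_input <- function(input) {
    if (is.list(input)) {
        list(data_path = input$data_path,
             threshold = if (is.null(input$threshold)) 0.5 else input$threshold)
    } else {
        # Backward compatible: a bare string is treated as the data path
        list(data_path = input, threshold = 0.5)
    }
}

parse_input("data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv")
parse_input(list(data_path = "data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv",
                 threshold = 0.9))
```

Your algorithm() function can then call parse_input() first and work with one predictable shape, no matter how callers invoke it.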

Because every algorithm is backed by a Git repository, you have a hash version that you can use to call your algorithm while developing it, and you’ll get a semantic version once you publish it.

Pass in the test file that we uploaded to our data collection:

"data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv"

Then you’ll see the output: an array of Iris species names predicted for our test data set.

["setosa","setosa","versicolor",...]

Finally, you can publish your algorithm by clicking on the “Publish” button on the top right of your screen.

This will open the modal that takes you through the workflow of publishing your model:

publish algorithm modal

First, you’ll notice your commit history, and you can write any release notes that make sense to include.

Once you’re done with the first tab, you can click “Next” and look at the Sample I/O, which is an important piece of your algorithm. For users to consume your model, you’ll need to supply a sample input, which will become a runnable example on your algorithm’s description page. That way users can test your model with your sample data or their own to see if it fits their use case. Even if you’re the only one consuming your model, it’s important to document it for your future self!

Finally, the last tab, “Versioning,” lets you set the cost to charge for your algorithm, whether it’s public or private, and whether the release is a breaking change or a minor revision.

And that’s it! Once you click “Publish” you’ve just deployed your first R model to production on Algorithmia.

If you haven’t installed the Algorithmia library from CRAN on your local machine, do that now. If you need help check out the R Client Guides.

Now, let’s call our model via the API:

library(algorithmia)

input <- "data://YOUR_USERNAME/iris_r_demo/iris_test_data.csv"

client <- getAlgorithmiaClient("YOUR_API_KEY")

# Change the algo path to yours under your name or team.

algo <- client$algo("test_org/naive_bayes_iris/0.1.0")

result <- algo$pipe(input)$result

print(result)

When you run your algorithm, you’ll get the same result as you saw in the IDE while testing!

That’s it for deploying and calling your R model.

Resources: