Deploying machine learning models at scale is one of the most pressing challenges data scientists face today, and as ML models get more complex, it’s only getting harder. The most common way machine learning gets deployed today is on PowerPoint slides.
We estimate that fewer than 5 percent of commercial data science projects make it to production. If you want to be part of that share, you need to understand how deployment works, why machine learning is a unique deployment problem, and how to navigate this messy ecosystem.
What is Machine Learning Deployment? The Meaning of Scale
To understand model deployment, you need to understand the difference between writing software and writing software for scale. If you want to write a program that just works for you, it’s pretty easy; you can write code on your computer, and then run it whenever you want. But if you want that software to be able to work for other people across the globe? Well, that’s a bit harder.
Software done at scale means that your program or application works for many people, in many locations, and at a reasonable speed. This is very different from writing software locally: in fact, it’s just like the difference between having a garage sale and running a multinational e-commerce store online. The logistics of running the latter are a whole other game.
As technology has become more global and the world more connected, developers are required to create these “scalable” applications more and more often. This has led to the dawn of an entirely new field called DevOps, short for development and operations, focused on actually scaling these kinds of applications. DevOps is still loosely defined, but broadly it is the practice of development and operations teams collaborating to create scalable, resilient, and distributed systems. In plain terms, it’s deploying applications that just work for everyone who wants to use them.
Scaling applications and creating these distributed systems is really, really tough. There are entire books, courses, and even graduate degrees on the topic, and it’s only growing in complexity with the rise of new paradigms like microservices and containerization. Some frameworks like the Scale Cube (below) try to make this field digestible, but it’s still pretty challenging.
The Unique Challenges of Machine Learning Deployment
Deploying regular software applications is hard, but when that software is a machine learning pipeline, it’s worse! Machine learning has a few unique features that make deploying it at scale harder:
Multiple Data Science Languages
Normal software applications are usually written in one main programming language that’s purpose-built for production systems, like Java or C++. That’s not the case for machine learning. Models are often built in multiple languages that don’t always interact well with each other. For example, it’s not uncommon to have a machine learning pipeline (a combination of data ingestion, cleaning, and modeling) that starts in R, continues in Python, and ends in Scala. Making those all work together is not trivial.
Data Science Languages Can Be Slow
Python and R are the most popular languages for data science and machine learning applications, but complete production models are rarely deployed in those languages for speed reasons. Porting a Python or R model into a production language like C++ or Java is challenging, and often results in reduced performance (speed, prediction accuracy, etc.) compared with the original, trained model.
Machine Learning Can Be Extremely Compute Heavy, and Relies on GPUs
Neural nets can often be very deep (the popular VGGNet model is 16 to 19 layers deep!), which means that training them and using them for inference takes up a lot of compute power. When you want your algorithms to run fast, for millions of people, that can be quite the hindrance.
Additionally, most production machine learning today relies on GPUs. They’ve been shown to be far faster, more practical, and more efficient for both training and inference (if you have the budget). But GPUs are scarce and expensive, which adds another layer of complexity to the task of scaling a machine learning project.
Machine Learning Compute Works In Spikes
The last (but not least!) quirk of predictive machine learning deployment is that usage is erratic. Once your algorithms are trained, they’re not used consistently; your customers will only call them when they need them. That can mean that you’re only supporting 100 API calls at 8:00 AM, but 1 million at 8:30 AM. Scaling up and down like that while making sure not to pay for servers you don’t need is a nightmare.
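To get a feel for the size of that swing, here is a rough back-of-the-envelope sketch in Python. It assumes the call counts above are per minute, that each prediction takes 0.2 seconds, and that each server handles one request at a time. None of those numbers come from a real system, and the function name is made up for illustration.

```python
import math


def replicas_needed(requests_per_minute: float,
                    seconds_per_request: float,
                    concurrency_per_replica: int = 1,
                    min_replicas: int = 0) -> int:
    """Estimate how many model servers are needed to keep up with a request rate.

    Uses Little's law: average requests in flight = arrival rate * service time.
    """
    requests_per_second = requests_per_minute / 60.0
    in_flight = requests_per_second * seconds_per_request
    needed = math.ceil(in_flight / concurrency_per_replica)
    return max(min_replicas, needed)


# Assumed figures: 0.2 s per prediction, one request at a time per server.
quiet = replicas_needed(100, 0.2)          # the 8:00 AM trickle
spike = replicas_needed(1_000_000, 0.2)    # the 8:30 AM rush
print(quiet, spike)
```

Under those assumptions the quiet period needs a single server while the spike needs 3,334. Provisioning for the peak all day means paying for thousands of mostly idle machines, which is exactly the trade-off that makes spiky inference traffic painful.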
All in all, application deployment at scale is already extremely difficult, and the nuances that machine learning adds make it harder still. It’s no wonder that so few data science projects end up actually making it into production systems.
If you’re on the data science team at a large enterprise, this can be even more frustrating. After taking months to write out your (awesome) models, you’re going to need to hand them over to engineering to deploy at scale. That process can take months, and the models you end up with may not at all resemble what you originally handed over. And if you want to make small changes afterward, or continually improve your models with new data? Forget about it.
What’s An Engineer To Do? Deployment Approaches
The unfortunate answer to all of these looming questions is that there’s no silver bullet. Deploying machine learning models is and will continue to be difficult, and that’s just a reality that organizations are going to need to deal with. Thankfully though, a few new architectures and products are giving developers more hoses to tame this fire.
If You’re A Tech Giant, Don’t Worry About It!
Perhaps the best indicator that machine learning deployment is indeed a pretty hard thing is how bigger tech companies have responded; more than a few have developed their own internal systems for accomplishing this task. Uber’s platform is called Michelangelo (link below), and it allows employees to develop machine learning products easily. Facebook and Google have also developed these kinds of tools, and it’s not hard to imagine that other tech giants have done and will do the same.
Serverless, Containerization, and Microservices
The serverless movement is about worrying less about provisioning and managing servers, and it’s a good fit for machine learning deployment. Because of the spiky nature of machine learning inference, you want to be able to scale your servers up instantly, then shut them down when they’re not being used. Serverless platforms like AWS Lambda make that a possibility.
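As a rough illustration of what serverless inference code can look like, here is a minimal Python sketch in the shape of an AWS Lambda handler. The `lambda_handler(event, context)` signature is Lambda’s usual Python entry point. Everything else is an assumption for the example: the toy linear “model” (`WEIGHTS`, `BIAS`) stands in for a real trained model, and the JSON `body` and `statusCode` fields assume the function sits behind an HTTP-style trigger.

```python
import json

# Hypothetical stand-in for a trained model. Defined at module level so that
# it is set up once per container and reused across warm invocations.
WEIGHTS = [0.5, -0.25, 2.0]
BIAS = 0.1


def predict(features):
    """Toy linear model: weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def lambda_handler(event, context):
    """Parse a JSON request body, run the model, and return a JSON response."""
    try:
        payload = json.loads(event.get("body") or "{}")
        features = [float(x) for x in payload["features"]]
        if len(features) != len(WEIGHTS):
            raise ValueError("wrong number of features")
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed input: report a client error instead of crashing.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": predict(features)}),
    }
```

The function holds no state between calls, so the platform is free to run one copy or thousands depending on traffic and to shut them all down when nobody is calling.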
Potentially more popular than serverless, the trend of containerization is also making it slightly easier to deploy machine learning models at scale. Instead of using one virtual machine per deployment, containers let you deploy multiple individually packaged instances of your software on the same machine. Containers are also typically easier to share across development teams, and they lend themselves well to reproducible results on both the training and inference side.
The microservices architecture is also helping make ML deployment more accessible. Remember how different parts of a machine learning pipeline can be in different languages that don’t work well together? Packaging each part of your pipeline as an individual microservice (accessed through a RESTful API) helps break down that barrier. Each piece can stay in its own language, and the pieces interact only through API calls, which is far more manageable.
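Here is a small sketch of that idea using only the Python standard library. It wraps a single hypothetical pipeline stage (a text-cleaning step) in an HTTP endpoint, then calls it the way another stage would. The `/clean` path, the function names, and the JSON shape are all invented for illustration, and a real service would use a production-grade web server rather than `http.server`.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def clean(text: str) -> str:
    """Hypothetical cleaning stage: collapse whitespace and lowercase."""
    return " ".join(text.split()).lower()


class CleanHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/clean":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body = json.dumps({"text": clean(payload["text"])}).encode("utf-8")
        except (KeyError, TypeError, ValueError, AttributeError):
            self.send_error(400)  # malformed JSON or missing/non-string "text"
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def start_service(port: int = 0) -> HTTPServer:
    """Start the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CleanHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_clean(base_url: str, text: str) -> str:
    """Client side: what a downstream stage, in any language, would do."""
    request = urllib.request.Request(
        base_url + "/clean",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Skip any system-wide proxy settings, since this call stays on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["text"]


if __name__ == "__main__":
    server = start_service()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(call_clean(url, "  Hello   WORLD "))
    server.shutdown()
    server.server_close()
```

The caller only needs to know the URL and the JSON format. The stage behind the endpoint could be rewritten in R or Scala without the rest of the pipeline noticing, as long as it accepts and returns the same JSON.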
The AI Layer: Platform as a Service
This machine learning deployment problem is one of the major reasons that Algorithmia was founded. Understanding machine learning techniques and implementing them is difficult and time-consuming. Our goal is to make it as easy and as simple as possible for anyone to create and deploy machine learning at scale, and our platform does just that. You just upload your algorithm code (through Git or our IDE) and we take care of the rest. The Algorithmia platform works with all major programming languages, and supports indie developers all the way up to Fortune 50 enterprises.
For more information about deploying machine learning models at scale, download our free whitepaper: Deploying Machine Learning Models at Scale with Serverless Microservices.
For more information about microservices, start with our Introduction to Microservices blog post.
How to Train and Deploy Deep Learning at Scale (O’Reilly Podcast) – “In this episode of the Data Show, I spoke with Ameet Talwalkar, assistant professor of machine learning at CMU and co-founder of Determined AI. Talwalkar has spent the last few years grappling with this problem as an academic researcher and as an entrepreneur. In this episode, he describes some of his related work on hyperparameter tuning, systems, and more.”
Lessons Learned From Deploying Deep Learning at Scale (Algorithmia) – “These slides cover the pros and cons of popular frameworks like TensorFlow, Caffe, Torch, and Theano. Why you should use one framework over another. And, more importantly, once you’ve picked a framework and trained a machine-learning model to solve your problem, how to reliably deploy deep learning frameworks at scale.”
A Guide to Scaling Machine Learning Models in Production (Hackernoon) – “The workflow for building machine learning models often ends at the evaluation stage: you have achieved an acceptable accuracy, and “ta-da! Mission Accomplished.” Beyond that, it might just be sufficient to get those nice-looking graphs for your paper or for your internal documentation. In fact, going the extra mile to put your model into production is not always needed. And even when it is, this task is delegated to a system administrator.”
Meet Michelangelo: Uber’s Machine Learning Platform (Uber) – “Uber Engineering is committed to developing technologies that create seamless, impactful experiences for our customers. We are increasingly investing in artificial intelligence (AI) and machine learning (ML) to fulfill this vision. At Uber, our contribution to this space is Michelangelo, an internal ML-as-a-service platform that democratizes machine learning and makes scaling AI to meet the needs of business as easy as requesting a ride.”
Deploying AI to Production: 12 Tips From The Trenches (sc5) – “Machine learning is, in many ways, a completely different beast than “traditional” software engineering. But in some aspects, it isn’t. Machine learning solutions also need to be deployed to production to be of any use, and with that comes a special set of considerations. Many of these aren’t properly taught in ML/AI courses/programmes, which is both a shame and an oversight. In this post, I’ll try to give some tips on how to avoid common mistakes, based on my own experiences.”