Machine Learning with Humans in the Loop

TL;DR The most accurate machine learning systems to date are those that use a “human-in-the-loop” computing paradigm.

Though we have seen huge advances in the quality and accuracy of purely machine-driven systems, they still tend to fall short of acceptable accuracy rates. Machine-driven classification enhanced by human correction, on the other hand, provides a clear path to acceptable accuracy. Below we describe a real-world use case of building and scaling this type of system.

You can access the live demo here.

What does “human-in-the-loop” mean?

Evidence suggests that a variant of Pareto’s famous 80:20 rule leads to some of the most accurate machine learning systems to date, with 80% AI-driven, 19% human input, and 1% unknown randomness to balance things out.

Human input can come in two different forms: (A) helping label the original dataset that will be fed into a machine learning model, and (B) helping correct inaccurate predictions that arise as the system goes live.

The team at Algorithmia recently worked with two companies to build a human-in-the-loop deep learning workflow to better train a fashion classifier and deploy it at scale, so that a mobile application with hundreds of thousands of users could use it live. The intention was to use a trained crowd or relevant human population to correct inaccuracies in pure machine predictions, increasing the accuracy of results, which directly drives higher engagement for a fashion-based mobile application.

The problem

Tizkka, a fashion and lifestyle platform, was looking to improve their recommendations for “looks” – clothes that can be paired together to create an outfit. They approached Algorithmia for help using image classification techniques to improve their “looks you might like” recommendations engine. “Looks you might like” is their #1 driver of engagement and extremely important to their rapidly growing community.

How it works 

Tizkka provided Algorithmia with 60k user-submitted images to train a classifier on 47 different labels.

Algorithmia partnered with Mighty AI, a premier Training Data as a Service company, to do an initial labeling of the data set.

We decided on three levels of labels: Clothing > Bottoms > Jeans, as well as the bounding boxes for the labeled object.
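As an illustration only, a three-level label plus a bounding box could be represented along the following lines. The class names, field names, pixel-coordinate convention, and the sample image ID are all assumptions made for this sketch; the original labeling format is not described here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBox:
    # Assumed convention: pixel coordinates, top-left and bottom-right corners.
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        # Clamp at zero so a degenerate box never yields a negative area.
        width = max(0, self.x_max - self.x_min)
        height = max(0, self.y_max - self.y_min)
        return width * height


@dataclass(frozen=True)
class Annotation:
    image_id: str                     # hypothetical identifier
    label_path: Tuple[str, str, str]  # three levels, most general first
    box: BoundingBox

    @property
    def leaf_label(self) -> str:
        # The most specific label, e.g. "Jeans".
        return self.label_path[-1]

    def path_string(self) -> str:
        # Render the hierarchy in the "A > B > C" form used in the text.
        return " > ".join(self.label_path)


example = Annotation(
    image_id="img_0001.jpg",
    label_path=("Clothing", "Bottoms", "Jeans"),
    box=BoundingBox(40, 120, 260, 480),
)
```

Keeping the full path rather than only the leaf label makes it possible to evaluate the classifier at any of the three levels.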

We had a labeled dataset a couple of days after sending the data to Mighty AI’s army of taggers.

Our research team chose to use Faster R-CNN (based on Ren, He, Girshick, Sun, arXiv:1506.01497v3, Jan 2016) because of its high accuracy rates and how recent the research was. Faster R-CNN is the third generation of the R-CNN architecture.

Given the size of the data set and our desire to run many experiments in parallel, we chose to use our local deep learning training box. With four Titan X GPUs we could run four experiments in parallel. Each iteration of the model took about 12 hours, and after four days we had our first promising results.

The architecture of Faster R-CNN does not allow for “online” training. This means the entire model needs to be retrained when new data points are added. This is simple enough to do, and Algorithmia easily allows for versioning the model files and API endpoints that access the Caffe-based model (for more on how to host your Caffe model on Algorithmia go here).

A nightly cron job triggers “offline” training on the original data set plus the newly tagged samples, after which a new version of the algorithm and model can be published. Once the algorithm is republished, it is automatically available to be put into production and to scale to the very large community Tizkka engages on a daily basis.
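The nightly job described above might be sketched roughly as follows. This is not Algorithmia's actual pipeline or API: the function names, the `image_id -> label` dictionary format, the `major.minor.patch` version string, and the choice to skip retraining when nothing new was tagged are all assumptions for illustration. The `train` callable stands in for the full offline Faster R-CNN training run.

```python
from typing import Callable, Dict, Optional, Tuple

Dataset = Dict[str, str]  # assumed format: image_id -> label


def build_training_set(original: Dataset, newly_tagged: Dataset) -> Dataset:
    # Start from the original data and let newly tagged samples override
    # any earlier label for the same image.
    merged = dict(original)
    merged.update(newly_tagged)
    return merged


def next_version(version: str) -> str:
    # Hypothetical versioning scheme: bump the minor number on each retrain.
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def nightly_retrain(
    original: Dataset,
    newly_tagged: Dataset,
    current_version: str,
    train: Callable[[Dataset], object],
) -> Tuple[str, Optional[object]]:
    # Assumption: with no new samples there is nothing to learn, so keep
    # the currently published version.
    if not newly_tagged:
        return current_version, None
    # No online training: the whole model is retrained on the merged set.
    model = train(build_training_set(original, newly_tagged))
    return next_version(current_version), model
```

A scheduler such as cron would call `nightly_retrain` once per night and publish the returned model under the returned version.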

By adding the right humans in the loop, we allow the system to actively learn from, and correct, the classifications it got wrong. With every iteration, and as more data is classified, the classification model gets more accurate. This loop could even be built directly into Tizkka’s app, with their users helping to correct its classifications.
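One way such a correction loop could be wired up is sketched below. Routing predictions to reviewers by a confidence threshold is an assumption of this sketch, not something the project is stated to have done, and the names `Prediction`, `select_for_review`, `collect_corrections`, and the 0.8 default are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Prediction(NamedTuple):
    image_id: str
    label: str
    confidence: float  # assumed to be a score in [0, 1]


def select_for_review(
    predictions: List[Prediction], threshold: float = 0.8
) -> List[Prediction]:
    # Assumption: only predictions the model is unsure about go to humans.
    return [p for p in predictions if p.confidence < threshold]


def collect_corrections(
    reviewed: List[Prediction], human_labels: Dict[str, str]
) -> Dict[str, str]:
    # Keep only the samples where the human label disagrees with the model;
    # these become newly tagged samples for the next retraining run.
    corrections: Dict[str, str] = {}
    for p in reviewed:
        human = human_labels.get(p.image_id)
        if human is not None and human != p.label:
            corrections[p.image_id] = human
    return corrections


predictions = [
    Prediction("a.jpg", "Jeans", 0.95),
    Prediction("b.jpg", "Skirt", 0.55),
    Prediction("c.jpg", "Dress", 0.60),
]
to_review = select_for_review(predictions)
corrections = collect_corrections(to_review, {"b.jpg": "Shorts", "c.jpg": "Dress"})
```

Here only `b.jpg` ends up in `corrections`, because the reviewer confirmed the model's label for `c.jpg` and `a.jpg` was never sent for review.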


At Algorithmia, we’ve had the good fortune of designing and building many production-ready machine learning systems. That experience allows us to say confidently that the best results are achieved with well-thought-out “human-in-the-loop” systems.

There are three core components to a successful production-ready ML system: good data, human annotation and correction, and core infrastructure to scale and redeploy regularly. With these three components in place, your organization has the best chance of producing results that will delight your customers and provide the accuracy needed to move ML from the lab to daily operations.

About Algorithmia
Algorithmia helps developers build intelligent applications by providing a common API for algorithms, functions and models that are accessible and discoverable to anyone.

About Mighty AI
Mighty AI enables machine learning teams to generate accurate and diverse annotations on their datasets to train, validate, and test their algorithms.

About Tizkka
TiZKKA is a social fashion app that helps the user dress better and grow their fashion know-how, on their path to becoming a fashionista.

Diego Oppenheimer, founder and CEO of Algorithmia.
