
Hey Zuck, We Built Your Office A.I. Solution

Office Facial Recognition A.I.

Like many, we were pretty inspired by Mark Zuckerberg’s 2016 personal challenge to build some artificial intelligence tools to help him at home and work. We spend a lot of time at Algorithmia helping developers add algorithmic intelligence to their apps. So, with a hat tip to Zuck, we challenged ourselves to see what kind of A.I. solution we could come up with during a recent internal hackathon.

UPDATE (12.29.16): Zuckerberg made good on his 2016 personal challenge in a post where he details how he built an AI assistant to control aspects of his home, like the lights, temperature, appliances, music and security, what he learned over the 100-150 hours spent working on it, and what’s next for his AI assistant.


What We Made


In less than 24 hours, we created an automated front desk A.I. that uses facial recognition to identify and greet our coworkers as they arrive at the office.

We taped an Amazon Fire tablet to the wall to act as our front desk kiosk, using its front-facing camera to shoot video. As a user walks up to the tablet, we start sampling frames from the video and sending them to the CMU OpenFace library to check for a match.
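Conceptually, that check loop looks something like the sketch below (in Python for illustration; the real tablet client is JavaScript, and the FaceClassify path and request/response shapes here are assumptions):

import base64

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

def check_for_match(frame_bytes):
    # Send one sampled frame to the FaceClassify algorithm, which runs it
    # through OpenFace and returns a user ID when it finds a match.
    payload = {"image": base64.b64encode(frame_bytes).decode("ascii")}
    result = client.algo("algo://jambox/FaceClassify/0.1.0").pipe(payload).result
    return result.get("uid")  # assumed response shape; None if no match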

If we’ve seen you before, we welcome you to the office by doing three things: 1) our front desk A.I. announces in our Slack channel that you’ve arrived; 2) Slackbot sends you a summary of Git commits since the last time you checked in; 3) the office Spotify switches to your favorite song.

If you’re new here, we have you run through a training exercise where you mimic some emojis, and pick your song. The next time you arrive at the office, you’ll be in the system, and ready to go.

The icing on the cake: we built this entirely on Algorithmia, which means we didn’t have to set up or configure servers.


Building The Facial Recognition Service


From the start, our biggest concern was that we needed a facial recognition algorithm that could build an accurate model with as few images as possible, since we didn’t want our users having to train for more than a few seconds. Using the CMU OpenFace library, we were able to accomplish this with as few as 10 images, which made it a perfect fit for our training and facial recognition tasks.

We had just heard about this library and were eager to test the claim that the latest update improved recognition accuracy from 76.1% to 92.9% in half the execution time. Although we haven’t done any formal benchmarking, we were impressed by the anecdotal results, and we look forward to making the CMU OpenFace library publicly available in the Algorithmia Marketplace as soon as possible. The speed and accuracy could be a game-changer for anybody interested in deep neural network training, cutting the turnaround for facial recognition models from weeks to days.

We created a simple training routine where the user looks at the camera and makes a series of faces. This ensures we capture enough variety of facial expressions for the model. While the user is training, we’re sampling images from the video, labeling them, and getting them ready to process.

Once you’re done making faces, we send the images as a batch to the library, which detects faces using OpenCV and then calculates the position of each face in real time using dlib. The whole training step takes about a minute and runs in the background, so the model is ready the next time you walk up.

Real-Time Face Pose Estimation

Example of face pose estimation using dlib
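To make that hand-off concrete, here is a minimal sketch of sending the labeled batch off for training; the FaceTrain name and batch format are hypothetical stand-ins for our actual training algorithm:

import base64

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

def train_new_user(name, frames):
    # Label the sampled frames with the user's name and send them as one
    # batch; server-side, OpenCV detects the faces and dlib aligns them
    # before the OpenFace model is updated with the new user.
    batch = {
        "label": name,
        "images": [base64.b64encode(f).decode("ascii") for f in frames],
    }
    # "FaceTrain" is a hypothetical endpoint name used for illustration.
    return client.algo("algo://jambox/FaceTrain/0.1.0").pipe(batch).result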

We wrapped this entire process in a couple of algorithms running on Algorithmia, which operate like microservices. When the user walks up, the tablet takes photos and sends them to our FaceClassify algorithm, which continually checks each image with OpenFace to see if we recognize the user. If we recognize you, we send back your UID and kick off the GetUserData algorithm to retrieve data about you.

The GetUserData algorithm grabs the user’s name and Spotify song URI they selected when they first trained. We then pass the name to the GreeterActions algorithm, which handles both our GitHub and Slack integrations.
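Chained together, the flow looks roughly like this sketch (the algorithm names come from our setup, but the paths and the input/output shapes are assumptions):

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

def greet(uid):
    # FaceClassify matched a face to this UID; look up the profile the
    # user stored at training time (assumed response shape).
    user = client.algo("algo://jambox/GetUserData/0.1.0").pipe({"uid": uid}).result

    # GreeterActions fans out to our Slack and GitHub integrations.
    client.algo("algo://jambox/GreeterActions/0.1.0").pipe({"name": user["name"]})

    # CallTrack queues the walk-up song (see the Spotify section below).
    client.algo("algo://jambox/CallTrack/0.1.0").pipe(user["spotify_uri"])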

Our Greeter Bot for Slack uses an incoming webhook from our app to send a message to the team that somebody has arrived and is checked in.

Welcome messages for Slack

Greeter Bot welcomes user to the office via Slack

We then grab all the commits from GitHub since you were last in the office, format them, and pass them to our Slack webhook. The webhook handles both sending you a direct message with the commit summary and announcing in our team channel that you’ve arrived.

Sending GitHub commits into Slack

User receiving all the GitHub commits they’ve missed
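A minimal sketch of those webhook calls is below; the webhook URL is a placeholder, and the commit format and channel-override usage are assumptions:

import requests

# Placeholder: every Slack incoming webhook has its own unique URL.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def post_checkin(username, commits):
    # Announce the arrival in the team channel.
    requests.post(SLACK_WEBHOOK_URL, json={
        "text": "%s just arrived at the office and is checked in." % username,
    })

    # Format the missed commits, one per line, and send them as a direct
    # message using the webhook's channel override.
    digest = "\n".join("%s: %s" % (c["author"], c["message"]) for c in commits)
    requests.post(SLACK_WEBHOOK_URL, json={
        "channel": "@" + username,
        "text": "Commits since your last check-in:\n" + digest,
    })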


Integrating Spotify


We wanted the process of choosing your office walk-up music to be simple, intuitive, and fun, so we created a choose-your-own-adventure flow:

Music Selection

The user first selects a genre of music, then chooses among “happy,” “dancing,” or “celebration” moods. We then present three songs that match the chosen genre and mood.
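Under the hood this can be as simple as a lookup table keyed on the genre-mood pair; here is a sketch with placeholder track URIs:

import random

# Illustrative table only; the real app covered more genres and songs.
SONGS = {
    ("pop", "happy"): [
        "spotify:track:PLACEHOLDER_A",
        "spotify:track:PLACEHOLDER_B",
        "spotify:track:PLACEHOLDER_C",
        "spotify:track:PLACEHOLDER_D",
    ],
    # ... ("rock", "dancing"), ("hip-hop", "celebration"), and so on
}

def candidate_songs(genre, mood, k=3):
    # Pick up to k songs that match the chosen genre-mood pair.
    options = SONGS.get((genre, mood), [])
    return random.sample(options, min(k, len(options)))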

To get the music playing, the first thing we needed to do was create a service that could take requests for tracks and play them. We landed on Pi MusicBox, a free, headless audio server for the Raspberry Pi based on Mopidy.

Spotify on PiMusicBox

PiMusicBox running Spotify on the Raspberry Pi

Getting Pi MusicBox up and running was straightforward, but we realized that it didn’t have an officially documented API endpoint we could hack on – it’s intended to act more like a replacement for Sonos that lets you stream music from Spotify, Google Music, SoundCloud, Webradio, Podcasts, and more. So, we had to reverse engineer it.

The first thing we noticed was that all communication was handled with websockets. This controlled the various functions like play, pause, change song, etc. Once we figured out the pattern, it was as easy as setting up another microservice on Algorithmia to pass this information through:

import Algorithmia  # standard on the Algorithmia platform (unused here)
import websocket

def apply(input):
    # Connect to the Mopidy websocket API exposed by Pi MusicBox
    ws = websocket.WebSocket()
    ws.connect("ws://[AlgoJamzBoxIP]:6680/mopidy/ws")

    # Clear the current tracklist
    ws.send('{"method":"core.tracklist.clear","jsonrpc":"2.0","id":600}')
    print("Got back: '%s'" % ws.recv())

    # Queue the requested Spotify track URI
    ws.send('{"method":"core.tracklist.add","params":[null,null,"%s"],"jsonrpc":"2.0","id":601}' % str(input))
    print("Got back: '%s'" % ws.recv())

    # Start playback
    ws.send('{"method":"core.playback.play","jsonrpc":"2.0","id":602}')
    print("Got back: '%s'" % ws.recv())

    return 'Done'

It’s kind of ugly, but with that figured out, we now have a way of calling the endpoint directly from the tablet using:

var input = <INPUT>; // the Spotify track URI for the chosen song
Algorithmia.client("YOUR_API_KEY")
  .algo("algo://jambox/CallTrack/0.1.0")
  .pipe(input)
  .then(function(response) {
    console.log(response.get());
  });

This passes in the Spotify URI, connects to the Raspberry Pi, and plays the song on the office stereo.


Conclusion


We’re pleased with how quickly we could create and stack together serverless microservices to power our automated front desk A.I. We’ll be adding the CMU OpenFace library to the platform soon, which will enable all kinds of interesting use cases for app developers – including Zuckerberg’s.

Interested in building your own? Sign up here to get started with Algorithmia.

We have some cleanup to do on the code before we’re ready to share the sample app with everybody, but in the meantime we built this hack using the following technology:

  • Raspberry Pi
  • Pi MusicBox
  • Algorithmia
  • CMU OpenFace
  • Slack
  • Spotify
  • GitHub Pages
  • Node | Angular
  • Ratchet (mobile-first UI based on Bootstrap)
  • Fire Tablet

Thanks for reading!

Product manager at Algorithmia helping to give developers super powers.
