Algorithmia

Train a Face Recognition Model to Recognize Celebrities


Sam Trammell and Rutina Wesley from True Blood

Earlier this week we introduced Face Recognition, a trainable model hosted on Algorithmia. You train the model on images of the people you want it to recognize, then pass in unseen images to get a prediction score.

The great thing about this algorithm is that you don’t need a huge dataset to get high accuracy on unseen images. The Face Recognition algorithm trains quickly and needs at least ten images of each person you want it to recognize.

For this recipe, we thought it would be fun to train a model on cast members from the TV show True Blood, which was cancelled around four years ago. We chose this show because of its fairly diverse cast and the ease of extracting images of those celebrities from IMDb.*

One of the major complaints about most face recognition models is that they are trained on homogeneous datasets that aren’t representative of the diversity that exists in real life. With this model you can train your own datasets to create facial recognition applications using the images of your choice.

The focus of this recipe is to show you how to train the Face Recognition algorithm and then pass in your own images to see which True Blood celebrity you most resemble.

While this recipe shows a fun way to use the Face Recognition algorithm, algorithms similar to it are being used to help doctors detect genetic conditions such as Down Syndrome in children. Face Recognition algorithms can also be found in image apps, such as security and dating applications. These tools automatically tag photos of you once the model learns what you look like.

Step 1: Install the Algorithmia Client

This tutorial is in Python, but it could be built with any of the supported clients, such as Scala, Ruby, Java, Node and others. Here’s the Python client guide for more information on using the Algorithmia API.

Install the Algorithmia client from PyPI:

pip install algorithmia

You’ll also need a free Algorithmia account, which includes 5,000 free credits a month – more than enough to get started.

Sign up here, and then grab your API key.
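If you would rather not paste your API key directly into a script, one option is to read it from an environment variable. The sketch below is only an illustration: the helper name load_api_key is hypothetical, and the variable name ALGORITHMIA_API_KEY is an assumption you can change to whatever you export in your shell.

```python
import os


def load_api_key(env_var="ALGORITHMIA_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption for this sketch; use any name you like.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            "Set the {0} environment variable to your API key.".format(env_var))
    return api_key


# Usage (requires the Algorithmia client installed above):
# client = Algorithmia.client(load_api_key())
```

This keeps the key out of source control while leaving the rest of the recipe unchanged.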

Step 2: Retrieve and Label Images for Training Set

Like most algorithms on the Algorithmia platform, Face Recognition takes input in the form of a JSON object. There are a few more parameters for Face Recognition than shown here, so be sure to check out the algorithm description page for more information.

Before we get started: note that for the first function to work, you’ll need to name each image file with the actor’s or actress’s first name and last name separated by an underscore, followed by another underscore and the image number. For instance, an image of the actress Anna Paquin would be “Anna_Paquin_1.jpg”. If your images don’t have numbers or are named differently, you can change the script to match, but the file names must include the corresponding celebrity’s name because we use it to label the images for training.
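To make the naming convention concrete, here is a minimal sketch of how a label could be derived from a file path. The helper name label_from_path is hypothetical and not part of the Algorithmia client; it assumes every file follows the First_Last_N pattern described above.

```python
import posixpath


def label_from_path(path):
    """Return a 'First Last' label from a path such as
    'data://your_username/your_data_collection/Anna_Paquin_1.jpg'."""
    # Keep only the file name so underscores in the directory are ignored.
    file_name = posixpath.basename(path)
    stem, _extension = posixpath.splitext(file_name)
    parts = stem.split("_")
    if len(parts) < 3:
        raise ValueError(
            "expected First_Last_N naming, got {0!r}".format(file_name))
    first_name, last_name = parts[0], parts[1]
    return first_name + " " + last_name


print(label_from_path(
    "data://your_username/your_data_collection/Anna_Paquin_1.jpg"))
```

Working from the file name alone means the label does not change if your username or collection name happens to contain underscores.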

To get started we’ll get images that we stored in a data collection on Algorithmia via the Data API. You can also store your image files in Dropbox or Amazon S3. To find out more about working with data check out our Developer Docs.

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

def get_images():
    """Create labeled dataset from data collection."""
    image_dir = client.dir("data://your_username/your_data_collection/")
    images = []
    # Create the collection if it is missing; there is nothing to label yet.
    if not image_dir.exists():
        image_dir.create()
        return images
    # Retrieve images from data collection.
    for file in image_dir.list():
        # Parse only the file name (e.g. "Anna_Paquin_1.jpg") so that
        # underscores in the username or collection name don't matter.
        file_name = file.path.split('/')[-1]
        first_name, last_name = file_name.split('_')[:2]
        # Create label from image name.
        label = first_name + " " + last_name
        # Label image based on celebrity.
        images.append(
            {"url": "data://{0}".format(file.path), "person": label})
    return images

Notice in the function above that we are simply extracting the first and last name of the celebrity and then creating a dictionary for each image that contains the file path for that image and the full name of the person in the image.

Here is a sample of the output:

{'url': 'data://quality/train_face_detection/Alexander_Skarsgard_5.jpg', 'person': 'Alexander Skarsgard'},
{'url': 'data://quality/train_face_detection/Nelsan_Ellis_2.jpg', 'person': 'Nelsan Ellis'},
{'url': 'data://quality/train_face_detection/Ryan_Kwanten_8.jpg', 'person': 'Ryan Kwanten'},
{'url': 'data://quality/train_face_detection/Joe_Manganiello_3.jpg', 'person': 'Joe Manganiello'},
{'url': 'data://quality/train_face_detection/Tamlyn_Tomita_1.jpg', 'person': 'Tamlyn Tomita'}
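Since the algorithm wants at least ten images of each person, it can be worth checking the labeled list before training. The following is a rough sketch; the function name people_needing_more_images and the constant MIN_IMAGES_PER_PERSON are hypothetical, and the sample list simply reuses two of the entries shown above.

```python
from collections import Counter

# Taken from the "at least ten images of each person" guidance in this post.
MIN_IMAGES_PER_PERSON = 10


def people_needing_more_images(images, minimum=MIN_IMAGES_PER_PERSON):
    """Return {person: count} for everyone with fewer than `minimum` images."""
    counts = Counter(image["person"] for image in images)
    return {person: n for person, n in counts.items() if n < minimum}


sample = [
    {"url": "data://quality/train_face_detection/Alexander_Skarsgard_5.jpg",
     "person": "Alexander Skarsgard"},
    {"url": "data://quality/train_face_detection/Nelsan_Ellis_2.jpg",
     "person": "Nelsan Ellis"},
]
print(people_needing_more_images(sample))
```

An empty dictionary means every person in the list has enough images to train on.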

Step 3: Train the Facial Recognition Model

Now we’ll train our model on the images you uploaded to your data collection, Dropbox, or Amazon S3 bucket. If you are new to machine learning in general, check out our post Introduction to Machine Learning for Developers.

def train_images():
    """Train images from celebrity pictures."""
    images = get_images()
    input = {"action": "add_images",
             "data_collection": "CelebClassifiers",
             "name_space": "WWCelebrities",
             "images": images
             }
    algo = client.algo('cv/FaceRecognition/0.2.0')
    print(algo.pipe(input))

The above code is straightforward. First we get our list of dictionaries from our previous function and then pass that data into the images field of our input.

Note that the action key contains the value add_images since we are adding images to our model. You can also remove images, list the images used for training and list the people labeled in the training.

The data_collection field should contain the name of your data collection that you want the model to be saved in and the name_space value should be whatever name you choose to differentiate your trained models from one another.
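If you find yourself building this input by hand more than once, a small helper can keep the fields consistent. This is only a sketch: build_request and REQUIRED_IMAGE_KEYS are hypothetical names, and the checks cover just the two actions and the image fields used in this recipe (add_images here and predict in Step 4). The algorithm description page remains the authority on the full set of parameters.

```python
# Fields each image entry carries in this recipe (an assumption drawn from
# the examples in this post, not from the algorithm's full specification).
REQUIRED_IMAGE_KEYS = {
    "add_images": ("url", "person"),
    "predict": ("url",),
}


def build_request(action, images,
                  data_collection="CelebClassifiers",
                  name_space="WWCelebrities"):
    """Build the input dictionary for the given action."""
    if action not in REQUIRED_IMAGE_KEYS:
        raise ValueError(
            "unsupported action in this sketch: {0!r}".format(action))
    for image in images:
        missing = [key for key in REQUIRED_IMAGE_KEYS[action]
                   if key not in image]
        if missing:
            raise ValueError(
                "image entry {0!r} is missing {1}".format(image, missing))
    return {
        "action": action,
        "data_collection": data_collection,
        "name_space": name_space,
        "images": list(images),
    }


request = build_request(
    "add_images",
    [{"url": "data://your_username/your_data_collection/Anna_Paquin_1.jpg",
      "person": "Anna Paquin"}],
)
print(request["action"], len(request["images"]))
```

The returned dictionary has the same shape as the input in train_images(), so it could be passed to algo.pipe() in the same way.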

Finally we’ll call the algorithm and pipe our input into it. Notice the version number appended to the end of the algorithm name. It’s always a good idea to add the version number to ensure that your script works the way you expect it to versus pulling down the newest version of the algorithm.

This step either returns success or throws an error if something is wrong with your input.

Here is what you should see:

AlgoResponse(result={'result': 'success'},metadata=Metadata(content_type='json',duration=261.710619925,stdout=None))

Step 4: Get your Predictions for New Photos

Here is where you pass in new photos to test the model’s accuracy, using images of the same celebrities that weren’t part of the training set. Although we trained our model on 16 cast members, the example below runs a prediction on just one unseen image.

def get_predicted():
    """Predict unseen images as a celebrity the model was trained on."""
    input = {
        "name_space": "WWCelebrities",
        "data_collection": "CelebClassifiers",
        "action": "predict",
        "images": [
            {
                "url": "data://your_username/your_data_collection/Anna_Paquin_test.jpg",
                "output": "data://your_username/your_data_collection/Anna_Paquin_test_output.jpg"
            }
        ]
    }
    algo = client.algo('cv/FaceRecognition/0.2.0')
    print(algo.pipe(input))

Here we are using our trained model to make predictions on new images using the same name_space and data_collection as we used earlier. Note this time we are passing in the value predict for the action key.

Each dictionary in the images list has two keys: url, the path of the image we want a prediction for, and output, the path where the processed copy of that image, with bounding boxes drawn on it, will be stored.

Here is some sample output after training the Face Recognition model on images of 8 female and 8 male cast members from True Blood.

{'predictions': [{'confidence': 0.5970923336136092,
                  'bb': {'left': 633, 'right': 954, 'bottom': 527, 'top': 206},
                  'output': 'data://quality/test_face_detection/Anna_Paquin_test_output.jpg',
                  'person': 'Anna Paquin'}],
 'output': 'data://quality/test_face_detection/Anna_Paquin_test_output.jpg',
 'url': 'data://quality/test_face_detection/Anna_Paquin_test.jpg'}

Here is the image produced:

Image with bounding box
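To pull the best guess out of a result like the sample output above, something along these lines should work. It is a sketch that assumes each per-image result is a dictionary with the shape shown in that sample; best_match is a hypothetical helper, not part of the algorithm or the client.

```python
def best_match(result):
    """Return (person, confidence) for the highest-confidence prediction,
    or None if no faces were matched."""
    predictions = result.get("predictions", [])
    if not predictions:
        return None
    top = max(predictions, key=lambda prediction: prediction["confidence"])
    return top["person"], top["confidence"]


# The sample output from this post, copied as a plain dictionary.
sample_result = {
    "predictions": [{
        "confidence": 0.5970923336136092,
        "bb": {"left": 633, "right": 954, "bottom": 527, "top": 206},
        "output": "data://quality/test_face_detection/Anna_Paquin_test_output.jpg",
        "person": "Anna Paquin",
    }],
    "output": "data://quality/test_face_detection/Anna_Paquin_test_output.jpg",
    "url": "data://quality/test_face_detection/Anna_Paquin_test.jpg",
}

match = best_match(sample_result)
if match is not None:
    person, confidence = match
    print("{0} ({1:.1%})".format(person, confidence))
```

The bb entry holds the bounding box coordinates, which you could use to crop the face from the original image.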

Now we thought it would be fun to see which of the 16 actors and actresses the classifier thinks we look most like!

Instead of testing the model on unseen images of the celebrities we trained on, we simply passed in images of some of our team members.
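If you have several photos to try, you could generate the url and output pairs instead of typing them out. In this sketch the helper prediction_entry and the "_output" naming convention are our own assumptions; any output path in a collection you can write to should do.

```python
import posixpath


def prediction_entry(url, suffix="_output"):
    """Pair an image URL with an output path next to it.

    Example: '.../Jon_Peck_1.jpg' -> '.../Jon_Peck_1_output.jpg'.
    """
    stem, extension = posixpath.splitext(url)
    return {"url": url, "output": stem + suffix + extension}


team_photos = [
    "data://your_username/your_data_collection/Diego_Oppenheimer_1.jpg",
    "data://your_username/your_data_collection/Kenny_Daniel_1.jpg",
    "data://your_username/your_data_collection/Jon_Peck_1.jpg",
]
images = [prediction_entry(url) for url in team_photos]
for entry in images:
    print(entry["url"], "->", entry["output"])
```

The resulting list can be used as the images value of the predict input shown in Step 4.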

Here are some results:

Face Recognition Results (True Blood Characters)

Now it’s your turn to try out the Face Recognition algorithm and let us know what you think @Algorithmia.

References:

For ease of use check out the full code here or on GitHub:

import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

def get_images():
    """Create labeled dataset from data collection."""
    image_dir = client.dir("data://your_username/your_data_collection/")
    images = []
    # Create the collection if it is missing; there is nothing to label yet.
    if not image_dir.exists():
        image_dir.create()
        return images
    # Retrieve images from data collection.
    for file in image_dir.list():
        # Parse only the file name (e.g. "Anna_Paquin_1.jpg") so that
        # underscores in the username or collection name don't matter.
        file_name = file.path.split('/')[-1]
        first_name, last_name = file_name.split('_')[:2]
        # Create label from image name.
        label = first_name + " " + last_name
        # Label image based on celebrity.
        images.append(
            {"url": "data://{0}".format(file.path), "person": label})
    return images

def facial_recognition_algorithm(input):
    """Call Face Recognition algorithm and pipe input in."""
    algo = client.algo('cv/FaceRecognition/0.2.0')
    return algo.pipe(input)

def train_images():
    """Train images from celebrity pictures."""
    images = get_images()
    input = {"action": "add_images",
             "data_collection": "CelebClassifiers",
             "name_space": "WWCelebrities",
             "images": images
             }

    return facial_recognition_algorithm(input)

def model_predictions():
    """Predict unseen images as a celebrity the model was trained on."""
    input = {
        "name_space": "WWCelebrities",
        "data_collection": "CelebClassifiers",
        "action": "predict",
        "images": [
            {
                "url": "data://your_username/your_data_collection/Anna_Paquin_test.jpg",
                "output": "data://your_username/your_data_collection/Anna_Paquin_test_output.jpg"
            },

            {
                "url": "data://your_username/your_data_collection/Diego_Oppenheimer_1.jpg",
                "output": "data://your_username/your_data_collection/Diego_Oppenheimer_output.jpg"
            },
            {
                "url": "data://your_username/your_data_collection/Kenny_Daniel_1.jpg",
                "output": "data://your_username/your_data_collection/Kenny_Daniel_output.jpg"
            },
            {
                "url": "data://your_username/your_data_collection/Jon_Peck_1.jpg",
                "output": "data://your_username/your_data_collection/Jon_Peck_output.jpg"
            }
        ]
    }
    return facial_recognition_algorithm(input)


if __name__ == "__main__":
    # Train first, then predict; print each response so the results are visible.
    print(train_images())
    print(model_predictions())

 


* In the middle of writing this post, the sad news was announced of the death of actor Nelsan Ellis, who played everyone’s favorite character, Lafayette, on True Blood. We were all big fans of Nelsan Ellis and his character on True Blood and join others in mourning his passing.