If there’s one thing the internet is good for, it’s racy material. This is a headache for a number of reasons, including a) it tends to show up where you’d rather it wouldn’t, like forums, social media, etc. and b) while humans generally know it when we see it, computers don’t so much. We here at Algorithmia decided to give this one a shot. Try out our demo at isitnude.com.
We based our algorithm on nude.py by Hideo Hattori, and on An Algorithm For Nudity Detection (R. Ap-apid, De La Salle University). The original solution relies on finding which parts of the image are skin patches, and then doing a number of checks on the relative geometry and size of these patches to figure out if the image is racy or not. This is a good place to start but isn’t always that accurate, since there are plenty of images with lots of skin that are perfectly innocent – think a closeup of a hand or face or someone wearing a beige dress. You might say that leaning too much on just color leaves the method, well, tone-deaf. To do better, you need to combine skin detection with other tricks.
Facing the problem
Our contribution to this problem is to detect other features in the image and using these to make the previous method more fine-grained. First, we get a better estimate of skin tone using a combination of what we learned from Human Computer Interaction Using Hand Gestures and what we can get from the image using a combination of OpenCV’s nose detection algorithm and face detection algorithm, both of which are available on Algorithmia. Specifically, we find bounding polygons for the face and for the nose, if we can get them, then get our skin tone ranges from a patch inside the former and outside the latter. This helps limit false positives. Face detection also allows us to detect the presence of multiple people in an image, thus customizing skin tones for each individual and re-adjusting thresholds so that the presence of multiple people in the image does not cause false positives. If you want to integrate this into your own application, you can find the magic right here.
This is an improvement, but it is hardly a final answer to the problem. There are countless techniques that can be used in place of or combined with the ones we’ve used to make an even better solution. For instance, you could train a convolutional neural network on problematic images, much like we did for Digit Recognition. Algorithmia was designed to enable this kind of high-level tinkering – we hope you’ll visit us soon and give it a try.