While computer vision (CV) algorithms have been around in various forms since the 1960s, it wasn’t until recently that the field progressed to far more sophisticated levels. In particular, combining computer vision with machine learning has yielded some amazing results.
For example, Facebook has combined computer vision, machine learning, and its massive data set of photos to obtain highly accurate facial recognition results. That’s how Facebook can suggest who to tag in your photo.
Here’s how this works:
Facebook has a large set of photos from users. Many of them are already tagged, identifying who is in the photo.
Since the photos are labeled, Facebook can then run its computer vision algorithms on these photos. At a very high level, and given enough data, the algorithm can learn to identify a person’s face from the associated tag on the photo. Not only that, but Facebook can also identify objects in the images using the same process.
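To make the idea of learning from tagged photos concrete, here is a minimal sketch. It is not Facebook’s actual method. It assumes each face has already been reduced to a short list of numbers (a hypothetical "feature vector"), averages the vectors for each tag, and then suggests the tag whose average is closest to a new face. The names `train_centroids` and `suggest_tag` and the sample data are made up for illustration.

```python
from collections import defaultdict
from math import dist


def train_centroids(labeled_features):
    """Average the feature vectors seen for each tag."""
    sums = {}
    counts = defaultdict(int)
    for tag, vec in labeled_features:
        if tag not in sums:
            sums[tag] = [0.0] * len(vec)
        sums[tag] = [s + v for s, v in zip(sums[tag], vec)]
        counts[tag] += 1
    return {tag: [s / counts[tag] for s in total] for tag, total in sums.items()}


def suggest_tag(centroids, vec):
    """Return the tag whose average feature vector is closest to vec."""
    return min(centroids, key=lambda tag: dist(centroids[tag], vec))


# Hypothetical 2-D "face features"; real systems learn much larger vectors.
tagged_photos = [
    ("John", [0.9, 0.1]),
    ("John", [0.8, 0.2]),
    ("Sarah", [0.1, 0.9]),
    ("Sarah", [0.2, 0.8]),
]

centroids = train_centroids(tagged_photos)
print(suggest_tag(centroids, [0.85, 0.2]))  # John
```

The more tagged examples there are per person, the more reliable each average becomes, which is one reason a large labeled data set matters.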
Try it yourself with this Chrome extension that displays the automated image tags that Facebook generated for your images.
What Is Computer Vision?
Computer vision can be defined as “the theory and technology for building artificial systems that obtain information from images or multi-dimensional data.”
A simpler explanation is that computer vision strives to solve the same problems you can solve with your very own eyes.
For example, if you’re driving and you see a child run into the road, your brain will quickly recognize that there is a child in the road ahead of you, that the situation is dangerous, and that you should brake immediately to avoid hitting the child.
That’s one of the problems self-driving car engineers are currently attempting to solve using computer vision. The approach requires being able to perform object recognition, which can be subdivided into three varieties: object classification, identification, and detection.
Object classification is where you have several previously learned objects that you want to be able to recognize in an image. Classifying a portrait photo as having a person’s face in it is an example of object classification: you’ve determined that the photo contains a face.
Object identification is the recognition of a specific instance of an object. For example, being able to identify that there are two faces in an image and that one is John and the other is Sarah is an example of object identification.
And, lastly, object detection is the ability to identify that there’s an object in an image. This is typically used for things like automatic toll roads, where you want to know when a new object has entered the frame so you can scan the license plate.
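The toll-road case, noticing that something new has entered the frame, can be sketched with simple frame differencing. This is only an illustration under stated assumptions: images are tiny grayscale grids stored as lists of lists, the camera is fixed, and the function names and the `threshold` value are hypothetical.

```python
def changed_pixels(background, frame, threshold=30):
    """Return (row, col) positions whose brightness differs from the
    background by more than threshold."""
    return [
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if abs(value - background[r][c]) > threshold
    ]


def bounding_box(pixels):
    """Smallest (top, left, bottom, right) box covering the pixels, or None."""
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols), max(rows), max(cols))


# An empty road (uniform brightness), then a frame with a bright object in it.
background = [[10] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][2] = frame[1][3] = frame[2][2] = frame[2][3] = 200

box = bounding_box(changed_pixels(background, frame))
print(box)  # (1, 2, 2, 3)
```

Note that this only says where something appeared. Deciding what it is (classification) or which specific car it is (identification) would be separate steps.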
Another example: if you wanted to detect nudity in images or video, you would want to first classify the photo as having a person in it. Then, you’d want to identify their face and sample the skin tones. And, finally, you’d want to detect all the areas of the image with that color to determine if, probabilistically, the person found in the image is naked or not. Learn more about how to detect nudity and NSFW images using computer vision and deep learning.
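As a rough sketch of the "sample the skin tone, then look for that color elsewhere" step, the snippet below assumes the face location is already known and that the image is a small grid of (R, G, B) tuples. The function names, the `tolerance` value, and the sample image are hypothetical, and a production system would be far more involved than this.

```python
from math import dist


def mean_color(pixels):
    """Average (R, G, B) of a list of pixels."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def skin_fraction(image, face_box, tolerance=40.0):
    """Fraction of the image whose color is close to the face's average color.

    face_box is (top, left, bottom, right), inclusive, and is assumed to
    come from an earlier face-finding step.
    """
    top, left, bottom, right = face_box
    face_pixels = [
        image[r][c]
        for r in range(top, bottom + 1)
        for c in range(left, right + 1)
    ]
    reference = mean_color(face_pixels)
    all_pixels = [p for row in image for p in row]
    matches = sum(1 for p in all_pixels if dist(p, reference) <= tolerance)
    return matches / len(all_pixels)


SKIN = (220, 180, 150)   # made-up sample values
CLOTH = (30, 60, 200)

# Top half matches the sampled skin tone, bottom half is clothing.
image = [[SKIN] * 4, [SKIN] * 4, [CLOTH] * 4, [CLOTH] * 4]
print(skin_fraction(image, face_box=(0, 0, 1, 1)))  # 0.5
```

A caller would then compare the returned fraction against some cutoff to make the probabilistic call described above.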
Connecting this back to the self-driving car problem, think about how the human brain would handle it; it has to answer the same questions. For the situation to be dangerous, we would have to detect that there is an object in or approaching the road and identify that the object is a child, something we must avoid. You would also want to identify other objects, such as trash, a soccer ball, or a bike, where evasive action isn’t necessarily needed.
Why Is Computer Vision Important?
As mentioned earlier, computer vision is being used in the real world for things like self-driving cars and pedestrian detection, but in plenty of other situations as well: face recognition, gesture recognition, optical character recognition, augmented reality, digital video fingerprinting, iris recognition, people counting, reverse image search, and more.
Computer vision is also useful in an industrial context, where CV can be used to detect product defects. Toll bridges are another area that has been disrupted by CV. Instead of drivers stopping to pay a toll, a camera reads the license plate as they drive by and charges their account accordingly.
Computer Vision Algorithms
Algorithms are what make computer vision possible, and the best-performing approach for many tasks is currently the convolutional neural network (CNN). This is a form of deep learning that attempts to mimic how the brain understands objects in images.
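The "convolutional" part refers to sliding a small grid of weights (a kernel) over the image and summing the products at each position. The sketch below shows only that one operation in plain Python, with a hand-picked kernel that responds to vertical edges. In a real network the kernel values are learned from data and many such layers are stacked; the function name and sample values here are illustrative.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding) and sum the element-wise
    products at each position, as convolutional layers do."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Dark on the left, bright on the right: there is a vertical edge in the middle.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 18, 0], [0, 18, 0]]
```

The output is largest exactly where the brightness changes from left to right, which is how a kernel ends up acting as a detector for one kind of visual pattern.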
OpenCV is the most popular free and open source solution for computer vision. OpenCV algorithms range from pixelating faces in images, to smartly cropping images automatically, to finding objects in images.
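To give a feel for what pixelation involves, here is a small plain-Python sketch. It does not use OpenCV and is not how OpenCV implements anything. It simply replaces each tile of a grayscale image with the tile’s average brightness. The function name and `block` parameter are hypothetical.

```python
def pixelate(image, block=2):
    """Replace each block x block tile of a grayscale image with its
    average brightness (integer division)."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [
                (r, c)
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
            ]
            average = sum(image[r][c] for r, c in tile) // len(tile)
            for r, c in tile:
                out[r][c] = average
    return out


image = [
    [0, 100, 50, 50],
    [100, 200, 50, 50],
]
print(pixelate(image))  # [[100, 100, 50, 50], [100, 100, 50, 50]]
```

To pixelate only a face, the same averaging would be applied to the region where a face was found rather than to the whole image.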
If you want to learn more about computer vision, there are more resources available than ever before.
There is a MOOC on Udacity, which is a great place to start.
If you are interested in a more traditional university course, Brown has posted their computer vision online curriculum for free.
A great book option is Learning OpenCV by Bradski and Kaehler. It provides a lot of theoretical information, as well as projects with OpenCV. If you’re interested in finding out more about the algorithms involved, Computer Vision: Algorithms and Applications by Richard Szeliski is another option.