Asking a data scientist to work with only one framework is like asking a carpenter to work with only a hammer. It’s essential that professionals have access to all the right tools for the job.
It’s time to rethink best practices for leveraging and building ML infrastructure and set the precedent that data scientists should be able to use whichever tools they need at any time.
Now, certainly some ML frameworks are better suited to solve specific problems or perform specific tasks, and as projects become more complex, being able to work across multiple frameworks and with various tools will be paramount.
For now, machine learning is still in its pioneering days, and though tech behemoths have created novel approaches to ML (Google’s TensorFlow, Amazon’s SageMaker, Uber’s Michelangelo), most ML infrastructure is still immature or inflexible at best, which severely limits data scientists and DevOps. This should change.
Flexible frameworks and ML investment
Most companies don’t have dozens of systems engineers who can devote several years to building and maintaining their own custom ML infrastructure or learning to work within new frameworks, and sometimes, open-source models are only available in a specific framework. This could restrict some ML teams from using them if the models don’t work with their pre-existing infrastructure. Companies can, and should, have the freedom to work concurrently across all frameworks. The benefits of doing so are multifold:
Increase interoperability of ML teams
Often machine learning is conducted in different parts of an organization by data scientists seeking to automate their processes. These silos are not collaborating with other teams doing similar work. Being able to blend teams together while still retaining the merits of their individual work is key. It will de-duplicate efforts as ML work becomes more transparent within an organization.
Allow for vendor flexibility and pipelining
You don’t want to end up locked in to only one framework or only one cloud provider. The best framework for a specific task today may be overtaken next month or next year by a better product, and businesses should be able to scale and adapt as they grow. Pipelining different frameworks together creates the environment for using the best tools.
Reduce the time from model training to deployment
Data scientists write models in the framework they know best and hand them over to DevOps, who rewrite them to work within their infrastructure. Not only does this usually decrease the quality of a model, it creates a huge iteration delay.
Enable collaboration and prevent wasted resources
If a data scientist is accustomed to PyTorch but her colleague has only used TensorFlow, a platform that supports both people’s work saves time and money. Forcing work to be done with tools that aren’t optimal for a given project is like showing up with knives to a gunfight.
Leverage off-the-shelf products
There’s no need to constantly reinvent the wheel; if an existing open-source service or dataset can do the job, then that becomes the right tool.
Position your team for future innovations
Because the ML story is far from complete, being flexible now will enable a company to pivot more easily come whatever tech developments arise.
How to deploy models in any framework
Attaining framework flexibility, however, is no small feat. The main steps to enable deploying ML models from multiple frameworks are as follows:
It’s fairly simple to run a variety of frameworks on a laptop, but trying to productionize them requires a way to manage all dependencies for running each model, in addition to interfacing with other tools to manage compute.
Containerization and orchestration
Putting a model in a container is straightforward. When companies have only a handful of models, they often task junior engineers with manually containerizing, putting the models into production, and managing scale-up. This process unravels as usage increases, model numbers grow, and as multiple versions of models run in parallel to serve various applications.
Many companies are using Kubernetes to orchestrate containers—there are a variety of open-source projects on components that will do some of the drudge work of machine learning for container orchestration. Teams who have attempted to do this in-house have found that it requires constant maintenance and becomes a Frankenstein of modular components and spaghetti code that falls over when trying to scale. Worse still, after models are in production, you discover that Kubernetes doesn’t deal well with many machine learning use cases.
API creation and management
Handling many frameworks requires a disciplined API design and seamless management practice. When data scientists begin to work faster, a growing portfolio of models with an ever-increasing number of versions needs to be continuously managed, and that can be difficult.
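One way to picture the API discipline described above is a small registry that puts every model, whatever framework produced it, behind the same predict call and keeps track of versions. The sketch below is only an illustration under stated assumptions: `ModelRegistry`, `register`, `latest_version`, and `predict` are hypothetical names, and the two "models" are plain Python functions standing in for framework-specific models.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# A "model" here is any callable that maps an input to a prediction.
# In practice it would wrap a framework-specific object; plain
# functions stand in for that in this sketch.
Predictor = Callable[[Any], Any]


class ModelRegistry:
    """Hypothetical registry exposing many models behind one predict call."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[str, int], Predictor] = {}

    def register(self, name: str, version: int, predictor: Predictor) -> None:
        key = (name, version)
        if key in self._models:
            raise ValueError(f"{name} v{version} is already registered")
        self._models[key] = predictor

    def latest_version(self, name: str) -> int:
        versions = [v for (n, v) in self._models if n == name]
        if not versions:
            raise KeyError(f"no model named {name!r}")
        return max(versions)

    def predict(self, name: str, payload: Any, version: Optional[int] = None) -> Any:
        # Default to the newest version when the caller does not pin one.
        if version is None:
            version = self.latest_version(name)
        return self._models[(name, version)](payload)


registry = ModelRegistry()
registry.register("sentiment", 1, lambda text: "positive" if "good" in text else "negative")
registry.register("sentiment", 2, lambda text: "positive" if "great" in text or "good" in text else "negative")

print(registry.predict("sentiment", "a great day"))             # served by v2
print(registry.predict("sentiment", "a great day", version=1))  # pinned to v1
```

Callers only ever see a model name and an optional version, so a model can be retrained or moved to a different framework without changing the code that calls it.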
Languages and DevOps
Machine learning has vastly different requirements than traditional compute, including the freedom to support models written in many languages. A problem arises, however, when data scientists working in R or Python sync with DevOps teams who then need to rewrite or wrap the models to work in the language of their current infrastructure.
Down the road
Eventually, every company that wants to extend its capabilities with ML is going to have to choose between enabling a multi-framework solution like the AI Layer or undertaking a massive, ongoing investment in building and maintaining an in-house DevOps platform for machine learning.
Machine Learning is about making predictions. This post will give an introduction to Machine Learning through a problem that most businesses face: predicting customer churn.
ML can help predict which of your customers are at risk for leaving in advance, and give you an edge by pre-empting with action.
Serverless computing is a type of cloud computing in which you only have to pay for the execution time you use, rather than paying for a set amount of usage even if it is underutilized.
Serverless frameworks are making cloud deployment even easier by removing the need to design your own server-side systems. Integrated properly, this paradigm can get your applications out the door faster and free up company resources to build more.
In a nutshell, serverless computing, also called Functions as a Service (FaaS), is a further abstraction on what cloud computing platforms like AWS already do—making it easier than ever to get your applications up and running at scale. Serverless computing takes the power of a hosted cloud to a software level—it abstracts away the entire concept of the server. Instead, you just write functions. The provider takes care of how and where to run those functions, ensuring that you focus on code and not the hardware and systems that operationalize that code.
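To make "you just write functions" concrete, here is a minimal sketch of what such a function might look like. The `handler(event, context)` shape follows a convention several providers use for Python functions, but the event fields and the response layout below are assumptions made for illustration, not any provider's required format.

```python
import json
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Hypothetical FaaS entry point: stateless, one request in, one response out."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Locally the function can be called like any other; a provider would
# invoke it once per incoming request.
response = handler({"name": "Ada"})
print(response["body"])
```

The function holds no state between calls, which is what lets a provider run as many or as few copies as the traffic requires.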
If you’re trying to create value in your company through machine learning, you need to be using the best hardware for the task. With CPUs, GPUs, ASICs, and TPUs, things can get kind of confusing.
For most of computing history there was only one type of processor. But the growth of deep learning has led to two new entrants into the field: GPUs and ASICs. This post will walk through the different types of compute chips, where they’re available, and which ones are the best to boost your performance.
Source: Frontiers in Psychology
You expect employees to have high levels of emotional intelligence when interacting with customers. Now, thanks to advances in Deep Learning, you’ll soon expect your software to do the same.
Research has shown that over 90 percent of our communication can be non-verbal, but technology has struggled to keep up, and traditional code is generally bad at understanding our intonations and intentions. But emotion recognition, also called Affective Computing, is becoming accessible to more types of developers. This post will walk through the ins and outs of determining emotion from data, and a few ways you can get some emotion recognition up and running yourself.
What is facial emotion recognition?
Facial emotion recognition is the process of detecting human emotions from facial expressions. The human brain recognizes emotions automatically, and software has now been developed that can recognize emotions as well. This technology is becoming more accurate all the time, and will eventually be able to read emotions as well as our brains do.
AI can detect emotions by learning what each facial expression means and applying that knowledge to the new information presented to it. Emotional artificial intelligence, or emotion AI, is a technology that is capable of reading, imitating, interpreting, and responding to human facial expressions and emotions.
Emotion Detection Use Cases: TSA Screening, Audience Engagement, And More
Source: 21st Century Wire
Understanding contextual emotion has widespread consequences for society and business. In the public sphere, governmental organizations could make good use of the ability to detect emotions like guilt, fear, and uncertainty. It’s not hard to imagine the TSA auto-scanning airline passengers for signs of terrorism, and in the process making the world a safer place.
Companies have also been taking advantage of emotion recognition to drive business outcomes. For the upcoming release of Toy Story 5, Disney plans to use facial recognition to judge the emotional responses of the audience. Apple even released a new feature on the iPhone X called Animoji, where you can get a computer simulated emoji to mimic your facial expressions. It’s not so far off to assume they’ll use those capabilities in other applications soon.
This is all actionable information that organizations and businesses can use to understand their customers and create products that people like. But it’s not exactly a piece of cake to get a product like this working in practice. There are two major issues that have held back meaningful progress in Affective Computing: the training / labeling problem, and the feature engineering problem.
The Training and Labeling Problem
As with any Machine Learning problem, your results are only as good as your data—garbage in means garbage out. Affective computing has a data problem, but it runs deeper than just lacking labeled training data—it’s that we’re not quite sure how to label it in the first place.
Creating an algorithm means we need to understand our inputs and outputs—so what exactly are the human emotions? There are two core approaches that inform how solutions can be designed.
- Categorical – argues that emotions fall into set classes. The pioneer of this approach was a Swedish anatomist named Carl-Herman Hjortsjö, and the idea is simple: there is a finite set of human emotions. A group of scientists led by Paul Ekman later developed the system, called FACS (Facial Action Coding System), and have been updating it ever since. The emotions are happiness, sadness, surprise, fear, anger, disgust, and contempt.
- Dimensional – assumes that emotions exist on a spectrum, and can’t be defined concretely. The Circumplex model of affect defines two dimensions, pleasure and arousal, while the PAD emotional state model uses three.
Which model of human emotions we accept and work with has important consequences for modeling them with Machine Learning. A categorical model of human emotion would likely lead to creating a classifier, where text or an image would be labeled as happy, sad, angry, or something else. But a dimensional model of emotions is slightly more complex, and our output would need to be on a sliding scale (perhaps a regression problem).
Source: Maria K. Almoite
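The difference between the two kinds of model output described above can be sketched in a few lines. This is an illustration only: the (pleasure, arousal) coordinates in `ANCHORS` are made-up placements rather than values from any published study, and `nearest_category` is a hypothetical helper showing one way a dimensional prediction could be collapsed back into a category.

```python
import math
from typing import Dict, Tuple

# Categorical view: the label is one of a fixed set of classes
# (the seven emotions listed above).
EMOTIONS = ["happiness", "sadness", "surprise", "fear", "anger", "disgust", "contempt"]

# Dimensional view: the label is a point on continuous axes.
# These (pleasure, arousal) coordinates are illustrative placements,
# not values from any published study.
ANCHORS: Dict[str, Tuple[float, float]] = {
    "happiness": (0.8, 0.5),
    "sadness": (-0.7, -0.4),
    "anger": (-0.6, 0.7),
    "fear": (-0.7, 0.6),
}


def nearest_category(pleasure: float, arousal: float) -> str:
    """Collapse a dimensional prediction into the closest categorical label."""
    return min(
        ANCHORS,
        key=lambda name: math.dist((pleasure, arousal), ANCHORS[name]),
    )


classifier_output = "happiness"   # a classifier picks one class
regressor_output = (0.6, 0.4)     # a regressor predicts continuous scores
print(nearest_category(*regressor_output))
```

Note how close the anger and fear anchors sit to each other here: on two axes alone they are hard to separate, which is one reason the choice of emotion model matters.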
But even once we pick a model to base emotions on, it’s pretty difficult to get our hands on a useful training set. There are only two large-scale sets that are usable for modeling:
- The Cohn-Kanade AU-Coded Expression Database – a lab-prepared study of facial expressions and emotions
- The Affectiva-MIT Facial Expression Dataset (AM-FED) – “naturalistic and spontaneous facial responses to Super Bowl ads.” Taken in a natural, non-lab setting.
The labeling on both of these datasets follows the categorical emotion philosophy and uses the FACS coding system.
In general, unlike many other disciplines with research being done on applying Machine Learning, a lot of the work in Affective Computing is being done on understanding the field first. For example, the research project EmotiNet is a “knowledge base” for emotion recognition in text. Much of the fundamental groundwork in understanding human emotions and codifying them is still yet to be done.
The Feature Engineering Problem
Even once we get over the hurdle of choosing a framework for understanding emotion and acquiring well-labeled training data, there’s still another issue before diving into algorithms: nobody is quite sure what the features should be.
In Machine Learning, we use a dataset as an input to predict and create some sort of output. The dataset has features: think of these as the columns in a spreadsheet. For a normal and simple dataset, features might be “inches of rain today” or “number of engagements for a customer.” But when we’re dealing with Affective Computing, there are only 3 possible inputs—text, speech, and image/video—and none follow the traditional data format.
Feature engineering, or deciding what the best possible inputs for our model are, is also a complex issue in Sentiment Analysis, which is the broad parent topic of emotion recognition. It might help, for example, to include whatever the previous sentence was along with the current sentence as an input. Adding that type of context to each data point is what feature engineering or feature extraction is all about. For more detail on feature engineering around sentiment analysis, check out our post about the topic here.
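As a small illustration of the previous-sentence idea, the sketch below pairs each sentence with the one that came before it. The function name `with_previous_sentence` and the sample review are hypothetical; a real pipeline would feed these pairs into whatever model is being trained.

```python
from typing import List, Tuple


def with_previous_sentence(sentences: List[str]) -> List[Tuple[str, str]]:
    """Pair every sentence with the one before it ("" for the first)."""
    previous = [""] + sentences[:-1]
    return list(zip(previous, sentences))


review = ["The food arrived late.", "Great."]
for context, current in with_previous_sentence(review):
    print(repr(context), "->", repr(current))
```

On its own, "Great." looks positive; paired with the sentence before it, a model at least has a chance to read it as sarcasm.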
For text, the typical data structure used is a Document-Term Matrix. The DTM is basically a matrix that records how many times each word appears in a “document,” which can be defined as anything we want. If we’re analyzing the emotional content of a sentence, the DTM might be some function of the occurrences of each word in the sentence.
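A Document-Term Matrix can be built with nothing more than the standard library. The following is a minimal sketch, assuming each sentence counts as one "document" and using a deliberately naive tokenizer; `tokenize` and `document_term_matrix` are names chosen for this example.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters/apostrophes; punctuation is dropped.
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in doc i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


docs = ["I love this phone", "I do not love this phone at all"]
vocab, dtm = document_term_matrix(docs)
print(vocab)
for row in dtm:
    print(row)
```

Both rows record one occurrence of "love"; the second simply has a few extra words counted alongside it, with nothing to say that "not" applies to "love".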
The problem with this more traditional data structure is that it doesn’t sync well with our goals: emotion isn’t garnered from individual words. Context, tone, previous words and sentences, and punctuation all dictate how a comment is meant to be perceived. That’s why researchers have been working on new types of data structures to take these factors into account. You can find some interesting text datasets to work with here.
Speech is often just translated into text and then analyzed, but that wouldn’t be a good fit for emotion recognition. Non-verbal cues dominate how we desire our speech and communication to be perceived, and we want those to be inputted into our model as features. Researchers have been exploring using acoustic features instead of transcription for emotion recognition applications.
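To give a sense of what acoustic features can look like, the sketch below computes two simple ones from a frame of audio samples: RMS energy (roughly, loudness) and zero-crossing rate (a crude indicator of how much high-frequency content there is). These two are picked here as common, easy-to-compute examples, not as the features any particular research group uses, and the "speech" is a synthetic tone rather than a real recording.

```python
import math
from typing import List


def rms_energy(frame: List[float]) -> float:
    """Root-mean-square amplitude, a rough proxy for loudness."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Synthetic stand-in for one frame of recorded speech:
# a 200 Hz tone sampled at 8 kHz for 50 ms (400 samples).
SAMPLE_RATE = 8000
frame = [
    0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    for n in range(400)
]

features = [rms_energy(frame), zero_crossing_rate(frame)]
print(features)
```

Computed frame by frame over a recording, values like these form a numeric feature vector that keeps information a transcript throws away.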
Given that both of the major available datasets in Affective Computing are sequences of images and videos, a lot of research on the cutting edge is being done here. Some of the coolest real-time applications of this software will certainly involve the camera on your smartphone. Researchers have been working on understanding how to featurize images and videos, and even getting creative with using data from sites like Flickr and Twitter.
There are certainly interesting challenges to be solved in understanding how to properly engineer features from text, speech, and image/video—but the resurgence of Neural Networks over the past few years has relegated a lot of this conversation to the backlog.
Neural Nets for Emotion Recognition
Source: Semantic Scholar
Neural Nets, the models at the heart of Deep Learning, are a type of algorithm that has become wildly popular over the past couple of years. In addition to their uncanny ability to beat the formerly state-of-the-art accuracy on many classification tasks, Neural Nets have a critical benefit that’s immensely helpful in emotion recognition: they do feature engineering automatically.
In a Neural Net, we can input the data we want to use (text, speech, etc.) and the data gets passed through different “layers” of the net. Each layer modifies the input values to try and morph it into something useful and predictive in the model. For our purposes, that means that we can input our data as is and tweak the model to output what we need.
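The layer-by-layer idea can be sketched as a tiny forward pass. This is an illustration under stated assumptions: the weights are hand-picked and untrained, the network has just two layers, and the helper names (`dense`, `relu`, `softmax`) are chosen for this example. A real model would learn its weights from labeled data.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(inputs: Vector, weights: Matrix, biases: Vector) -> Vector:
    """One fully connected layer: a weighted sum per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values: Vector) -> Vector:
    return [max(0.0, v) for v in values]


def softmax(values: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hand-picked, untrained weights purely for illustration.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]   # 3 inputs -> 2 hidden units
B1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [-1.0, 1.0]]             # 2 hidden units -> 2 classes
B2 = [0.0, 0.0]

raw_input = [1.0, 2.0, 3.0]
hidden = relu(dense(raw_input, W1, B1))         # layer 1 transforms the raw input
probabilities = softmax(dense(hidden, W2, B2))  # layer 2 scores each class
print(hidden, probabilities)
```

The `hidden` values are the network's own intermediate features; nobody specified what they should represent, which is the sense in which the feature engineering happens automatically.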
Getting even more specific, there are special types of Neural Nets, called Convolutional Neural Networks (CNNs), that are very effective when images are the inputs. These networks further feature engineer the input images and can help achieve greater accuracy in emotion recognition. One of the cutting-edge algorithms in Affective Computing was developed by two professors from The Open University of Israel and uses CNNs. For an implementation using the Algorithmia platform, check out this tutorial.
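The core operation inside a CNN is sliding a small filter across an image to produce a feature map. The sketch below shows that single operation in plain Python, assuming no padding and a stride of 1; the vertical-edge filter is written by hand for illustration, whereas a trained CNN learns its filters from data. It is not the method from the research mentioned above.

```python
from typing import List

Image = List[List[float]]


def convolve2d(image: Image, kernel: Image) -> Image:
    """Slide the kernel over the image with no padding and stride 1.

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4x4 "image" that is dark on the left and bright on the right.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Hand-written vertical-edge filter; a CNN would learn its filters.
kernel = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map responds only in the middle column, where dark meets bright. Stacking many learned filters like this is how a CNN picks up on the edges and shapes of a face, such as the corner of a mouth or a raised eyebrow.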
Unsupervised Emotion Recognition
While most of the work in Affective Computing has been done using labeled datasets and supervised learning, a few research efforts have centered around a less top-down approach—segmenting the data we have automatically and seeing what kinds of emotions result. These methods often also take context and sentence structure into account to reach tighter classifications.
Some also explicitly try to expand beyond the often confining limits of FACS, like this paper released at a conference in 2012. According to the abstract, “The proposed methodology does not depend on any existing manually crafted affect lexicons such as WordNet-Affect, thereby rendering our model flexible enough to classify sentences beyond Ekman’s model of six basic emotions.” Another approach using the dimensional model is proposed here.
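The general idea of segmenting data automatically, without labels, can be illustrated with k-means clustering. K-means is used here only as a familiar stand-in; the research efforts mentioned above use their own methods. The points are made-up two-dimensional feature vectors, and the starting centroids are fixed so that the sketch is deterministic.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], centroids: List[Point], iterations: int = 10) -> List[int]:
    """Plain k-means: returns the cluster index assigned to each point."""
    assignments: List[int] = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        centroids = new_centroids
    return assignments


# Made-up 2-D feature vectors (for example, two scores per sentence).
points = [(0.9, 0.8), (0.8, 0.9), (1.0, 0.7), (-0.8, -0.9), (-0.9, -0.7), (-0.7, -0.8)]
# Fixed starting centroids keep the sketch deterministic.
labels = kmeans(points, centroids=[(1.0, 1.0), (-1.0, -1.0)])
print(labels)
```

The algorithm returns group numbers, not emotion names. Deciding what each group means is left to a person inspecting the clusters afterwards.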
Further Reading on Emotion Recognition
Microsoft’s developer team on emotion detection and recognition using text – “Emotion Detection and Recognition from text is a recent field of research that is closely related to Sentiment Analysis. Sentiment Analysis aims to detect positive, neutral, or negative feelings from text, whereas Emotion Analysis aims to detect and recognize types of feelings through the expression of texts, such as anger, disgust, fear, happiness, sadness, and surprise.”
Sylvester Kaczmarek’s survey with a focus on Machine Learning – “In our daily life, we go through different situations and develop feeling about it. Emotion is a strong feeling about human’s situation or relation with others. These feelings and express Emotion is expressed as facial expression. The primary emotion levels are of six types namely; Love, Joy, Anger, Sadness, Fear, and Surprise. Human expresses emotion in different ways including facial expression, speech, gestures/actions and written text. This article mainly focuses on two expressions namely; written text and speech.”
The FACS and Paul Ekman – “The Facial Action Coding System (FACS) is a tool for measuring facial expressions. It is an anatomical system for describing all observable facial movement. It breaks down facial expressions into individual components of muscle movement. It was first published in 1978 by Ekman and Friesen, and has since undergone revision.”
Categorical vs. Dimensional approaches to understanding emotion – “Emotion researchers can be divided into two camps based on their answers to the following question: What is the best way to think about emotions? Some suggest emotions are best thought of as a small number of primary and distinct emotions (anger, joy, anxiety, sadness). Others suggest that emotions are best thought of as broad dimensions of experience (e.g., a dimension ranging from pleasant to unpleasant).”
Whether Categorical and Dimensional approaches can work together – “The results show that the happiness–fear continuum was divided into two clusters based on valence, even when using the dimensional strategy. Moreover, the faces were arrayed in order of the physical changes within each cluster.”
Emotion Detection Papers
Feature Extraction and Selection for Emotion Recognition from EEG – “Advanced feature extraction techniques are found to have advantages over commonly used spectral power bands. Results also suggest preference to locations over parietal and centro-parietal lobes.”
Emotion Recognition from Text Using Semantic Labels and Separable Mixture Models – “This study presents a novel approach to automatic emotion recognition from text. According to the results of the experiments, given the domain corpus, the proposed approach is promising, and easily ported into other domains.”
Emotion Detection and Sentiment Analysis of Images – “If we search for a tag “love” on Flickr, we get a wide variety of images: roses, a mother holding her baby, images with hearts, etc. These images are very different from one another and yet depict the same emotion of “love” in them. In this project, we explore the possibility of using deep learning to predict the emotion depicted by an image. Our results look promising and indicate that neural nets are indeed capable of learning the emotion essayed by an image.”
Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns – “We present a novel method for classifying emotions from static facial images. Our approach leverages on the recent success of Convolutional Neural Networks (CNN) on face recognition problems. Our method was tested on the Emotion Recognition in the Wild Challenge (EmotiW 2015), Static Facial Expression Recognition sub-challenge (SFEW) and shown to provide a substantial, 15.36% improvement over baseline results (40% gain in performance).”
Emotion Recognition Tutorials
- An Emotion Recognition API for Analyzing Facial Expressions
- 20+ Emotion Recognition APIs That Will Leave You Impressed, and Concerned
- Emotion Recognition using Facial Landmarks, Python, DLib and OpenCV
- Introduction to Emotion Recognition for Digital Images
- Emotion Recognition With Python, OpenCV and a Face Dataset