Algorithmia Blog - Deploying AI at scale

Vertical Spotlight: Machine Learning for Healthcare Diagnostics

Source: Case Engineering

Diagnostics is part of the core of healthcare — research suggests a third of all Healthcare AI SaaS companies are tackling just this sector.

Machine Learning can automate parts of the diagnostic stack, aid doctors in deciding how to interpret tests, and greatly reduce errors in communication. This post will walk through popular use cases, the challenges inherent in applying ML models in diagnostics, and some of the tradeoffs to be made in model selection.

Unique Use Cases

Diagnostic errors account for about 10% of yearly patient deaths, mostly due to issues like poor tracking, misinformation, and miscommunication. Machine Learning can both save practitioners valuable time through automation and improve accuracy and outcomes.

Chatbots and Conversation

Much of the value in diagnosis comes from the natural conversation between doctor and patient, and a doctor’s ability to suss out the relevant symptoms from the patient is key. Machine Learning powered chatbots can do a good chunk of that work without the need for an actual doctor: asking targeted questions and recommending the appropriate courses of action (visit a pharmacy or a doctor).


More and more studies are showing the benefits of early detection in cancer. Stanford recently developed an algorithm that does as well as doctors in identifying skin cancer, and a similar system for lung cancer, built by 12 Sigma, is deployed in 35 hospitals in China.


Analyzing bodily fluids and tissue is part of the core of diagnostics, and algorithms can help beyond just oncology. Advances in digital pathology are providing a growing supply of high-quality microscope images to use as training data.

Scanning and MRI

Researchers who applied Deep Learning to MRI images at early ages (6 to 12 months) were able to reliably predict and diagnose Autism. Neural nets have also been used for MRI image segmentation to delineate the boundaries of certain tissue.


There’s still a bunch of red tape to get past in medical diagnostics, but Oscar Health was able to innovate and successfully launch Machine Learning powered tooling in mid-2017. One of the problems that doctors face in analyzing patient data is the sheer volume of it: it’s difficult to understand what’s relevant to look at, especially in the context of specific symptoms. Oscar’s product is called the Clinical Dashboard, and it’s targeted at helping doctors solve this problem.

According to the company, “a team of Oscar technologists created algorithms that parse through claims data, lab panels, and other relevant data feeds to generate alerts around likely health conditions, abnormal test results, and red flags.” That kind of tech is possible when you own your entire stack (as Oscar does), but the challenges of this domain make it much more difficult for other players.

Domain Challenges

The healthcare system is unique in a number of ways, and one of them is its technology infrastructure: it’s generally old, inaccurate, and difficult to work with. That fact, along with a few other considerations, explains why Machine Learning has struggled to make it into the mainstream in this arena.

The highest of stakes

Applying Machine Learning to fraud or content recommendations has a pretty low downside: worst-case scenario, you lose money or hurt your public image. Not so with our use cases: medicine is often life or death, and that’s a heavy responsibility for models and their creators to bear. Data Scientists tend to be more careful than usual here, and set high model thresholds for classification.
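To make the threshold idea concrete, here is a minimal sketch of how moving a classification threshold trades sensitivity (the share of true cases caught) against specificity (the share of healthy cases correctly cleared). The scores, labels, and the function name rates_at_threshold are all made up for illustration; nothing here comes from a real diagnostic model.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are flagged positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    # Guard against division by zero when a class is absent from the data.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical model scores and ground-truth labels (1 = condition present).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.25, 0.5, 0.9):
    sens, spec = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy data, raising the threshold cuts false alarms but misses more real cases, so which direction counts as "careful" depends on whether a missed diagnosis or a false alarm is the costlier mistake for the use case at hand.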

Aging, unsophisticated infrastructure

Most Machine Learning practitioners will tell you that the majority of their work is in preparation: data quality, rigidity, cleanliness, and flexibility. Much of the medical industry uses outdated, poorly designed software that makes it difficult to properly capture and deliver the needed data for ML applications.

Poor data quality and quantity

Since much of diagnosis happens in real time, it’s not realistic for doctors to enter all the information they get from patients into a digital format (at least not in the moment). Much of the medical data that Data Scientists have to work with is handwritten in notes, and far from comprehensive. Some doctors write down more than others, and some use terminology and language that’s foreign to those doing the modeling.

Natural subjectivity

Ironically enough, diagnosis can be more of an art than a science in some ways. Given the same symptoms or the same MRI results, two doctors might arrive at different diagnoses. Additionally, more and more research is showing the effect that cognitive biases can have on medical decision making. Dealing with this subjectivity is a difficult task for Data Scientists.

Model Tradeoffs

Most of the tradeoffs that Data Scientists approaching healthcare need to look at are informed by the domain-specific challenges: namely data quality, delivery, and the high stakes of predictions. Medical image data will likely need heavy preprocessing (especially MRI data) to get it ready for ML models. Other types of data can be sparse and inaccurate, and models need to take that into account.
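One common way to let a model "take sparsity into account" is to fill missing values while also recording that they were missing, so the gap itself becomes a signal. The sketch below is only an illustration of that idea: the field name glucose, the records, and the helper impute_with_indicator are hypothetical, and median imputation is just one of several reasonable choices.

```python
from statistics import median


def impute_with_indicator(records, field):
    """Fill missing values of `field` with the observed median and add a 0/1 flag."""
    observed = [r[field] for r in records if r.get(field) is not None]
    fill = median(observed) if observed else None
    result = []
    for record in records:
        missing = record.get(field) is None
        updated = dict(record)  # copy so the caller's records are left untouched
        updated[field] = fill if missing else record[field]
        updated[f"{field}_missing"] = int(missing)
        result.append(updated)
    return result


# Hypothetical lab records; some visits never captured a glucose reading.
records = [
    {"patient": "a", "glucose": 90},
    {"patient": "b", "glucose": None},
    {"patient": "c", "glucose": 110},
    {"patient": "d"},
]

for row in impute_with_indicator(records, "glucose"):
    print(row)
```

Keeping the flag alongside the filled value lets a downstream model distinguish a measured reading from a guessed one, which matters when missingness is itself related to how a patient was treated.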

Another key area that Data Scientists in healthcare need to think about is explainability. As in other domains where the decisions of models impact end users directly, it’s difficult to say “we think you have cancer because the computer told us so.” More complex models like neural nets will usually be more accurate, but their decisions are effectively unexplainable for now. Data Scientists need to focus on how their models can create coherent narratives that doctors can relay to patients.
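As a rough illustration of why simpler models are easier to explain, the sketch below scores a patient with a small logistic model and lists how much each input pushed the prediction. Everything in it is an assumption made for the example: the feature names, the weights, and the bias are invented and have no clinical meaning.

```python
import math

# Hypothetical, hand-picked weights for illustration only (not clinically derived).
WEIGHTS = {"age_over_60": 0.8, "smoker": 1.2, "nodule_size_mm": 0.15}
BIAS = -3.0


def explain(features):
    """Return (probability, contributions ranked by absolute size) for one patient."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    log_odds = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked


# Hypothetical patient record.
patient = {"age_over_60": 1, "smoker": 1, "nodule_size_mm": 10}

probability, ranked = explain(patient)
print(f"predicted probability: {probability:.2f}")
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f} to the log-odds")
```

Because each term adds directly to the log-odds, a doctor can see which inputs drove the score. A deep neural net offers no such direct readout, which is the accuracy versus explainability tradeoff described above.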


7 applications of Machine Learning in pharma and medicine (TechEmergence) – “When it comes to effectiveness of machine learning, more data almost always yields better results—and the healthcare sector is sitting on a data goldmine. McKinsey estimates that big data and machine learning in pharma and medicine could generate a value of up to $100B annually, based on better decision-making, optimized innovation, improved efficiency of research/clinical trials, and new tool creation for physicians, consumers, insurers, and regulators.”

Deep Learning and medical diagnosis (Towards Data Science) – “Over the last few months, there have been a number of announcements of research findings that claim that deep learning has been applied to, and often times immediately outperforms doctors in, a particular area of diagnosis. I originally started this blog post to keep track of them — I’m going to publish it as a draft that I expect to update on a regular basis.”

Machine learning for medical diagnosis: history, state of the art and perspective (Science Direct) – “The paper provides an overview of the development of intelligent data analysis in medicine from a machine learning perspective: a historical view, a state-of-the-art view, and a view on some future trends in this subfield of applied artificial intelligence.”

Don’t just scan this: Deep Learning techniques for MRI (Nicholas Bien) – “Magnetic resonance imaging (MRI) is an advanced imaging technique that is used to observe a variety of diseases and parts of the body. MRI’s unrivaled soft-tissue contrast makes it useful for detecting abnormal tissue known as “tumors” or “lesions”. As we will see later, neural networks can analyze these images individually (as a radiologist would), or combine them into a single 3D volume to make predictions.”