Artificial Intelligence Outperforms Pathologists in Diagnosing Metastatic Breast Cancer

An artificial intelligence algorithm has outperformed expert pathologists in diagnosing metastatic breast cancer in a study that may completely disrupt medical imaging.

Increasing levels of automation in every industry threaten jobs, but so far physicians have felt relatively comfortable. We’ll never be replaced by algorithms, right? But this study, appearing in the Journal of the American Medical Association, should get pathologists, at least, a bit worried.

Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer

It is the best demonstration to date of how machine learning is going to transform medical imaging.

The images we're talking about here are sentinel lymph node slides.

Take a look – in that little green box is a tiny area of metastatic breast cancer. Pathologists miss these from time to time. After all, they are only human.

Whole-slide image of sentinel lymph node biopsy with area of metastatic breast cancer highlighted in green square.

For the first time, computers have done better. 

Researchers sponsored a worldwide competition to develop an algorithm that would identify breast cancer cells on scanned lymph node slides.

Hand-labeled focus of metastatic breast cancer in lymph node slide.

Teams that signed up were sent 270 slides (110 with nodal metastases and 160 without), painstakingly hand-labeled to show the computers where the diseased cells were.

After learning from that data, the algorithms were unleashed on 129 brand-new, unlabeled slides. The winner was the algorithm that got the most slides right.
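To make the setup concrete, here is a minimal sketch of the patch-based approach most competitors took: carve the enormous whole-slide images into small tiles and train a convolutional network to call each tile tumor or normal. The architecture, tile size, and training details below are my assumptions for illustration, not the winning team's actual pipeline (real entries used far deeper networks).

```python
import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    """Tiny CNN that labels one tile of a whole-slide image as
    normal (0) or tumor (1). Purely illustrative architecture."""
    def __init__(self, tile=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * (tile // 4) ** 2, 2)  # 2 classes

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = PatchClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Stand-ins for real data: tiles cut from the 270 hand-labeled training
# slides, labeled 1 if the tile overlaps an annotated metastasis.
tiles = torch.randn(8, 3, 256, 256)
labels = torch.randint(0, 2, (8,))

loss = loss_fn(model(tiles), labels)  # one training step
loss.backward()
optimizer.step()
```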

But let's start with the humans. Eleven trained pathologists were given 2 hours to look at the 129 test slides, a workflow I'm told is pretty standard. Of the 49 test slides with metastatic disease, the pathologists found 31 on average, a substantial false-negative rate. One pathologist was allowed to work without time constraints, unrealistic as that is; he or she correctly identified 46 of the 49 slides with cancer and 79 of the 80 without.
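Spelling out the arithmetic behind those numbers (nothing here beyond the figures reported above):

```python
# Time-constrained pathologists, on average:
avg_sensitivity = 31 / 49
print(f"{avg_sensitivity:.0%}")  # ~63%: more than a third of mets missed

# The one pathologist who worked without time limits:
sensitivity = 46 / 49  # slides with cancer correctly flagged
specificity = 79 / 80  # slides without cancer correctly cleared
print(f"{sensitivity:.1%} / {specificity:.1%}")  # 93.9% / 98.8%
```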

AI-driven highlighting of areas highly suspicious for metastatic cancer.

Thirty-two machine-learning algorithms competed; the best came from a Harvard-MIT collaboration. Its performance on the test images was nearly perfect: it distinguished cancerous from non-cancerous slides with almost 100% accuracy and highlighted the areas of concern, as shown above.
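Those highlighted regions come from sliding the trained classifier across the slide, tile by tile, and mapping each tile's predicted tumor probability. A sketch of that inference pass, assuming a patch classifier like the one above (tile size and stride are illustrative):

```python
import torch

def tumor_heatmap(model, slide, tile=256, stride=256):
    """slide: a (3, H, W) tensor of an H&E whole-slide image region.
    Returns a grid of tumor probabilities; high values flag
    suspicious areas that can be overlaid on the slide."""
    model.eval()
    _, H, W = slide.shape
    heatmap = []
    with torch.no_grad():
        for y in range(0, H - tile + 1, stride):
            row = []
            for x in range(0, W - tile + 1, stride):
                patch = slide[:, y:y + tile, x:x + tile].unsqueeze(0)
                row.append(torch.softmax(model(patch), dim=1)[0, 1].item())
            heatmap.append(row)
    return torch.tensor(heatmap)
```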

This is pretty impressive, but there's something really special about this study that has me excited. In most of these image-classification tasks, the gold standard is human perception: some human expert, or a group of them, looks at a slide or an x-ray or a retinal image and says "yes, this is pulmonary edema." I'm always left wondering – well, ok, but how can we ever beat humans if humans are the gold standard?

In this study, the gold standard was immunohistochemical staining – staining neither the human pathologists NOR the machine algorithms had access to.

Left: Hematoxylin and eosin stain shows area of potential metastatic disease. Right: Immunohistochemical stain gives "gold standard" evidence of metastatic disease. Pathologists and AI agents only had access to H&E images.

In other words, these algorithms were better than humans when held to a completely objective gold standard.  That's pretty amazing.

Now, pathologists shouldn't be hunting for new jobs quite yet. This was a small study, using slides from only two centers. I wish the researchers had thrown a third center into the test set – would different staining practices have thrown off the computer algorithms, perhaps? Also, the pathologists mostly missed micrometastases, foci less than 2 mm across. With modern breast cancer therapy, it's not clear that missing such small areas would actually have a significant clinical impact.

With all the hype surrounding machine learning, it's easy to think it's just a fad. It's not. Mark my words, studies like this will redefine medical imaging in the near future.  And if you don't believe me, ask your local area network.