Misinterpreting this Depression Study May Lead Doctors to Treat the Wrong People

January 8, 2020 F. Perry Wilson

The study claims to predict response to escitalopram treatment - but that is a potentially dangerous interpretation.

It’s a new year, and after a little holiday break I’m back and, frankly, a bit cranky as I peruse the recently published medical literature. So today I’m focusing on a rather small study, but one that hits a pet peeve of mine. I’m going to channel my inner Andy Rooney here and gripe for a bit.

Appearing in JAMA Network Open, we have this article with the compelling title “Use of Machine Learning for Predicting Escitalopram Treatment Outcome From EEG Recordings in Adults with Depression”.

I like to know what I’m getting into when I read a title. And this title promises quite a bit. To me, it reads like researchers used an EEG and some fancy machine-learning stuff to predict which patients with depression would benefit from escitalopram treatment.

That idea, using a machine learning model to choose the best psychiatric treatment, is holy grail-level personalized medicine stuff. See, when confronted with major depressive disorder, docs often try medication after medication to see what sticks. Anything to lessen that trial-and-error approach would save tons of time, not to mention lives.

But that is not what this study is about. Walk with me through the methods and you’ll see what I mean.

Researchers from British Columbia analyzed EEG data from 122 adult patients with major depression who were initiated on escitalopram therapy.

As you know, an EEG outputs a ton of data – multiple electrodes, thousands of measurements. This is actually an ideal place to use machine learning tools to squeeze all that data into a single number, and the authors do an exemplary job of using a well-established machine learning algorithm called a support vector machine to take those gobs of data and turn them into a prediction.

But what exactly are they predicting?

They are predicting whether the patient will have remission of depression in 8 weeks. They are NOT predicting whether escitalopram was good for the patient, and that difference is huge.

This study had no control group. All 122 patients were treated with escitalopram. We therefore have no way to know if the machine learning model identified individuals more likely to achieve remission regardless of therapy (let’s remember that depression spontaneously remits in around 20% of cases) or those who truly benefit from escitalopram.

The “destinies” of a patient with depression with regards to escitalopram

See, every patient with depression has four potential destinies with regards to escitalopram.

Some will have remission with or without the drug. Some will never have remission regardless of treatment. Some will ONLY experience remission if they get the drug. And others, presumably, would fail to experience remission ONLY if they get the drug.

It’s really the last two categories we care about in terms of deciding on treatment, but ironically the first two categories are the easiest to predict – because in the end the biggest predictor of whether you get remission from depression is NOT whether you get a drug, but how severe your depression is in the first place.
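To make the four destinies concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the cohort counts are made up (they are not from the study), and the names `DESTINIES` and `observed_in_single_arm` are hypothetical. It shows what a study in which everyone gets the drug can and cannot see.

```python
from collections import Counter

# Each hypothetical patient has two potential outcomes:
# (remits_without_drug, remits_with_drug). Only one is ever observed.
DESTINIES = {
    (True, True): "remits either way",
    (False, False): "never remits",
    (False, True): "remits only with drug",
    (True, False): "remits only without drug",
}

def observed_in_single_arm(patients):
    """Group destinies by the outcome an all-treated study actually sees."""
    seen = {True: Counter(), False: Counter()}
    for without_drug, with_drug in patients:
        # Everyone is treated, so only the with-drug outcome is observed.
        seen[with_drug][DESTINIES[(without_drug, with_drug)]] += 1
    return seen

# Made-up cohort of 100 patients; the counts are illustrative only.
cohort = (
    [(True, True)] * 20
    + [(False, False)] * 50
    + [(False, True)] * 25
    + [(True, False)] * 5
)

seen = observed_in_single_arm(cohort)
print("Observed remission:", dict(seen[True]))
print("Observed no remission:", dict(seen[False]))
```

In this toy cohort, the observed remitters are a blend of people who would have remitted anyway and people who needed the drug, and nothing in the observed outcome tells them apart.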

This is a huge difference in terms of a prediction problem and one that can actually lead to patient harm.

Let me give an example.

Imagine we built a model predicting who is least likely to have a heart attack among a population receiving simvastatin.

Without a comparator group, we’d find that individuals with lower LDL, more physical activity, and without diabetes would have the best outcomes. If we then argue that these are the types of people who should receive statins, we’d be doing a huge disservice to the people with more severe disease at baseline. Our model doesn’t tell us who should get the drug; it only tells us who was better off in the first place.

We need models that can target therapies to the right patients, regardless of how sick they are at baseline, or else we’ll always choose the least sick to get treatment. Sure, that will make the success rate of therapies look awesome, but it’s not how I want to practice medicine.
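The gap between "best outcome on the drug" and "most benefit from the drug" can be sketched with a toy calculation. The relationships below are assumptions I made up to illustrate the point (severity on a 0 to 1 scale, with the drug assumed to help sicker patients more); none of it is estimated from real data, and the function names are hypothetical.

```python
# Assumed, made-up relationships between baseline severity (0 = mild, 1 = severe)
# and the chance of a good outcome. Nothing here is estimated from real data.

def p_good_outcome_untreated(severity):
    # Sicker patients do worse on their own.
    return 0.5 - 0.4 * severity

def treatment_benefit(severity):
    # Assumption: the drug helps sicker patients more.
    return 0.05 + 0.25 * severity

def p_good_outcome_treated(severity):
    return p_good_outcome_untreated(severity) + treatment_benefit(severity)

severities = [0.0, 0.25, 0.5, 0.75, 1.0]

# What a model trained only on treated patients would rank by.
pick_by_prognosis = max(severities, key=p_good_outcome_treated)
# What we actually want to rank by when deciding whom to treat.
pick_by_benefit = max(severities, key=treatment_benefit)

for s in severities:
    print(f"severity={s:.2f}  treated outcome={p_good_outcome_treated(s):.2f}  "
          f"benefit={treatment_benefit(s):.2f}")
print("Best prognosis on drug:", pick_by_prognosis)
print("Largest benefit from drug:", pick_by_benefit)
```

Under these assumptions, the two rankings point at opposite ends of the severity scale, which is exactly the trap a single-arm prediction model sets.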

OK, back to escitalopram. What this paper shows us is that the authors built a model, based on EEG data, that shows who is likely to have remission of depression. You could in fact argue that the model has nothing to do with escitalopram. The model may predict outcomes equally well among patients on any antidepressant, or even no antidepressant at all. In other words, we’re no closer to the dream of strapping an EEG on someone’s head and knowing what drug to give them than we were before. But studies like this get reported inaccurately ALL THE TIME, suggesting that we have some new tool in our personalized medicine toolbox.

My biggest fear is that these models get commercialized as some sort of “use this to decide who to treat” black box, which, as we now all understand, is biased against those who are sicker at baseline, even if they would respond well to therapy. The second sentence of the conclusion of this paper reads:

“Developed into a proper clinical application, such a pipeline may provide a valuable treatment planning tool”.

Oy. Not really, not unless you want to reserve treatment for the least sick individuals.

Could the researchers prove that their model is not simply identifying less severe depression as opposed to escitalopram-response? Well, they could show how their model correlates with baseline depression scores or other baseline factors – my bet is that we’d find that mostly the model just identifies those with less severe depression at baseline – but that data is not presented.
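The check I’m describing is simple enough to sketch. The numbers below are hypothetical values for six imaginary patients, not data from the paper, and `pearson_r` is just a hand-rolled helper for the example. A strongly negative correlation would suggest the model score is mostly tracking baseline severity.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values for six patients; NOT data from the paper.
baseline_severity = [12, 18, 22, 27, 31, 36]          # a depression rating scale score
model_score = [0.81, 0.74, 0.60, 0.52, 0.35, 0.30]    # predicted chance of remission

r = pearson_r(baseline_severity, model_score)
print(f"Correlation between baseline severity and model score: r = {r:.2f}")
```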

And let’s remember that although it’s very cool to get data about how severe your depression is just from an EEG – I mean that’s Star Trek-y and I love it – we have plenty of tools already available to assess depression severity.

So the next time we see a study, using machine learning or otherwise, that claims to “predict response to therapy” – the very next question we have to ask is “how do we know the model isn’t simply identifying less severe disease at baseline?”

Happy new year.

This commentary first appeared on medscape.com.