The Methods Man

Polygenic Risk Scores Might Be Bullshit

A reminder that accuracy in a group does not equal accuracy for an individual.

I was really struggling to think of a good analogy to explain the glaring problem of polygenic risk scores this week. But I think I have it now. Go with me on this.

An alien spaceship parks itself, Independence Day style, above a local office building.

But unlike the aliens that gave such a hard time to Will Smith and Brent Spiner – these are benevolent, technologically superior guys. They shine a mysterious green light down on the building and then announce, maybe via telepathy, that 6% of the people in that building will have a heart attack in the next year.

They move on to the next building. “Five percent will have a heart attack in the next year”. And the next – 7%.  And the next – 2%.

Let’s assume the aliens are entirely accurate. What do you do with this information?

Most of us would suggest you find out who was in the buildings with the higher percentages. You check their cholesterol, get them to exercise more, do some stress tests, and so on.

But that said, you’d still be spending a lot of money on a bunch of people who were NOT going to have heart attacks. So, a crack team of spies – in my mind this is definitely led by a grizzled Ian McShane – infiltrates the alien ship, steals this predictive ray gun, and starts pointing it – not at buildings – but at people.

This person has a 10% chance of having a heart attack in the next year. This person has a 50% chance. The aliens, seeing this, leave us one final message before flying into the great beyond: “no, you guys are doing it wrong”.

This week, I’m writing about the people and companies that are using an advanced predictive technology – polygenic risk scores – wrong. And a study that shows just how problematic this is.

We all know that genes play a significant role in our health outcomes. Some diseases, like Huntington’s disease or cystic fibrosis, are entirely driven by genetic mutations.

But the vast majority of chronic diseases we face are not solely driven by genetics, though they may be enhanced by genetics. Coronary heart disease is a prime example. There are clearly environmental risk factors, like smoking, that dramatically increase risk. But there are also genetic underpinnings – about half the risk of CHD comes from genetic variation, according to one study.

But in the case of those common diseases, it’s not one gene that leads to increased risk – it’s the aggregate effect of multiple risk genes, each contributing a small amount of risk to the final total.

The promise of polygenic risk scores was based on this fact. Take the genome of an individual, identify all the risk genes, and integrate them into some final number that represents your genetic risk of developing coronary heart disease.

The way you derive a polygenic risk score is you take a big group of people and sequence their genomes. Then, you see who develops the disease of interest – in this case heart disease. If the people who develop heart disease are more likely to have a particular mutation, that mutation goes in the risk score. Risk scores can integrate tens, hundreds, even thousands of individual mutations to create that final score.
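To make "integrate them into some final number" concrete, here is a minimal sketch of how such a score is commonly computed: a weighted sum of how many copies of each risk variant a person carries. The variant names and weights below are made up for illustration and do not come from any published score.

```python
# Hypothetical weights for three made-up variants (not from any real risk score).
# In a real score, each weight reflects how strongly the variant tracked with disease
# in the group used to build the score.
WEIGHTS = {"variant_a": 0.12, "variant_b": 0.05, "variant_c": 0.30}


def polygenic_score(genotype, weights=WEIGHTS):
    """Sum of (copies of the risk variant x weight) across every variant in the score."""
    return sum(weight * genotype.get(variant, 0) for variant, weight in weights.items())


# Each person carries 0, 1, or 2 copies of a given variant.
person = {"variant_a": 2, "variant_b": 0, "variant_c": 1}
score = polygenic_score(person)
print(round(score, 2))  # 0.54
```

The raw number means little on its own. It is usually reported as a percentile relative to other people scored the same way.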

There are literally dozens of polygenic risk scores for coronary heart disease. And there are companies that will calculate yours right now for a reasonable fee.

But here’s the thing. The accuracy of these scores is assessed at the population level.

It’s the alien ray gun thing.  Researchers apply the risk score to a big group of people and say, according to this, 20% should develop CHD.  And then, if indeed 20% develop CHD they say the score is accurate. And that’s true.

But what happens next is the problem. Companies and even doctors have been marketing polygenic risk scores to individuals. And honestly it sounds amazing. We’ll use sophisticated techniques to analyze your genetic code and integrate the information to give you your personal risk of CHD. Or dementia. Or other diseases. There are a lot of people who would want this information.

It turns out, though, that this is where the system breaks down. And it is nicely illustrated by this study, appearing in JAMA.

Source: Abramowitz et al. JAMA 2024.

The authors wanted to see how polygenic risk scores, which are developed to predict disease in a group of people, work when applied to an individual.

They identified 48 previously published polygenic risk scores for coronary heart disease. They applied those scores to over 170,000 individuals across multiple genetic databases.

And, by and large, they worked as advertised – at least across the entire group. The weighted accuracy of all 48 was around 78%. They aren’t perfect, of course. We wouldn’t expect them to be, since CHD is not entirely driven by genetics. But 78% accurate isn’t too bad.

AUC (a weighted accuracy metric) across 48 different CHD polygenic risk scores. Source: Abramowitz et al. JAMA 2024. 
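For readers who want to see what AUC measures, here is a minimal sketch using its standard pairwise definition: take every pairing of one person who developed the disease with one person who did not, and count how often the score ranked the first person higher. The scores below are invented for illustration and are not from the study.

```python
def auc(case_scores, control_scores):
    """Fraction of case/control pairs in which the case has the higher score.

    Ties count as half a win. 0.5 is a coin flip; 1.0 is perfect ranking.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented risk scores: three people who developed CHD, four who did not.
cases = [0.9, 0.6, 0.4]
controls = [0.7, 0.5, 0.3, 0.2]
print(auc(cases, controls))  # 0.75
```

Notice that the number describes how well a score ranks a whole group. It says nothing about whether any one person's score is right.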

But that accuracy is at the population level. At the level of the office building.

At the individual level it was a vastly different story.

And I think it is best illustrated by this plot, which shows the results of 48 different CHD polygenic risk scores for a single person. A note here – it is arranged by the publication date of the risk score, but these were all assessed on a single blood sample at a single point in time in this study participant.

Source: Abramowitz et al. JAMA 2024.

The individual scores are all over the map. Using one risk score gives him a risk that is near the 99th percentile – a ticking time bomb of coronary heart disease – another score gives him a risk at the very bottom of the spectrum – highly reassuring. And a bunch of scores fall somewhere in between. In other words, as a doctor the risk I will discuss with this patient is more strongly determined by which PRS I happen to choose than by his actual genetic risk, whatever that is.

This may seem counterintuitive. All these risk scores were similarly accurate within a population – how can they all give different results to an individual? The answer is simpler than you may think. As long as a given score makes one extra good prediction for each extra bad prediction, its accuracy is not changed.

Let’s imagine we have a population of 40 people.

Risk score model 1 correctly classifies 30 of them for 75% accuracy. Great.

Risk score model 2 also correctly classifies 30 of our 40 individuals for 75% accuracy. It’s just a different 30.

Risk score model 3 also correctly classifies 30 of 40, but another different 30.

I’ve colored this to show you all the different overlaps and what you can see is that though each score has similar accuracy, the individual people have a bunch of different colors, indicating that some scores worked for them, and some didn’t. That’s a real problem.
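The 40-person example can be sketched in a few lines. This assumes the simplest possible arrangement, in which each model gets a different, non-overlapping block of 10 people wrong; the model names and that layout are illustrative, not taken from the study.

```python
N_PEOPLE = 40

# Assumption for illustration: each model misclassifies a different block of 10 people.
wrong = {
    "model_1": set(range(0, 10)),
    "model_2": set(range(10, 20)),
    "model_3": set(range(20, 30)),
}

# Population-level accuracy is identical for all three models.
accuracy = {name: (N_PEOPLE - len(missed)) / N_PEOPLE for name, missed in wrong.items()}

# People who get a wrong answer from at least one of the three models.
misclassified_by_some = set().union(*wrong.values())

# People every model classifies correctly.
correct_for_all = set(range(N_PEOPLE)) - misclassified_by_some

print(accuracy)  # {'model_1': 0.75, 'model_2': 0.75, 'model_3': 0.75}
print(len(misclassified_by_some))  # 30
print(len(correct_for_all))  # 10
```

Under this arrangement, three equally "accurate" models agree on only 10 of the 40 people. For the other 30, the answer depends on which model you happened to pick.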

This has not stopped companies from advertising polygenic risk scores for all sorts of diseases. Companies are even using polygenic risk scores to decide which embryos to implant during IVF – a particularly egregious misuse of this technology that I have written about before.

How do you fix this? I mean, the aliens tried to warn us. This is not how you are supposed to use this ray gun. You are supposed to use it to identify groups of people at higher risk to direct more resources to that group. That’s really all you can do.

It’s also possible that we need to match the risk score to the individual in a better way. That need likely stems from the fact that risk scores tend to work best in the populations in which they were developed, and many of them were developed in people of largely European ancestry.

It is worth noting that if a polygenic risk score had perfect accuracy at the population level, it would also necessarily have perfect accuracy at the individual level. But there aren’t any scores like that. It’s possible that combining various scores may increase the individual accuracy, but that hasn’t been demonstrated yet either.

Look, genetics plays, and will continue to play, a major role in healthcare. At the same time, sequencing entire genomes is a technology that is ripe for hype and thus misuse. Or even abuse. Fundamentally, this JAMA study reminds us that accuracy in a population and accuracy in an individual are not the same. But more deeply, it reminds us that just because a technology is new, or cool, or expensive, doesn’t mean it will work in the clinic.

A version of this commentary first appeared on Medscape.com.