Fraud in Medical Research: Understanding the Carlisle Approach

A "bombshell" paper appearing in the journal Anesthaesia uses a simple statistical test to suggest that a significant proportion of medical studies - some from our most prestigious journals - may be fabrications. But the story is not a clear cut as it seems. For the video version, click here.


Propensity Scores: Observational Data and Wishful Thinking

Propensity score methods attempt to make observational data look like a randomized trial. But there are some big limitations that need to be considered before we jump on the bandwagon. For the video version, click here.


Handling Missing Data like a Politician


That certain individuals have referred to the current presidential election drama as reminiscent of a middle school student council election does a great disservice to student council elections. But politics is about power, and knowledge is power (or so I’m told), so perhaps that’s why this current crop of wannabe potentates is so obsessed with polls, surveys, and other data. And that’s the last time I’ll refer to the current political circus during this post. Because all politics is local, and, as a former middle-school class president myself, I have decided to focus this post on a middle-school presidential election.

This article originally appeared on MedPage Today and can be accessed here.

Full disclosure: This election does not exist. It is a figment of my imagination. Any resemblance to real individuals, living or dead, is mostly coincidental.

*

The place is Boise, Idaho, and at the Pierre Perito school (a public school of some significant repute) two presidential candidates are duking it out for the votes of the Tweenagers. In one corner, Ronald Frump, a newcomer to this sort of thing who has garnered quite a following with blistering attacks on all his rivals, his promise to build a wall around the playground (to keep those rival West Boise kids out), and general swagger. Opposing him is Hilda Canton, who has been serving in the student council since Kindergarten, diligently working within the system to effect change, and only occasionally using her town-wide notoriety to land her plum speaking gigs at the local candy store.

It has been a heated race. Insults have been flung, jimmies have been rustled, and special interests (particularly the pizza-every-day lobby) have been shamelessly pandered to.

As the big day approaches, polling shows a tight race. While Ronald holds a strong lead in Ms. Jensen’s 5th grade homeroom, Hilda is nearly unimpeachable among women ages 13 to 13 and a half. As usual, the election will come down to turnout. Will the recent pink eye outbreak keep 6th-graders home on Election Day?

Well, as the campaigns ramp up to full-blown blustery capacity, the big day arrives. Voting proceeds from morning til dismissal (as you can only vote during your free period), and exit polls confirm what many suspected: it’s going to be close.

The votes are counted. Recounted. Recounted again. The victor? Hilda, by 237 to 224. Ah, the power of the people.

But there’s a problem. Every student who voted signed in. And there are 490 signatures. Pierre Perito Middle School is missing 29 votes.

*

You can imagine the scene. Shouts of fraud, corruption, conspiracy! The Frump team disavows the entire election, insisting that a totally new election be held. The Canton team, of course, states that the results are acceptable… we can’t expect everything to add up perfectly. And besides, Frump would need 22 of those 29 votes to win this election. What are the chances?! If we redo the whole election every time we miss a few votes, well, the whole system will collapse. In the end, the people have spoken. Haven’t they?

Before we delve into the statistics behind this, I want to ask you to think for a second about what you would do in this situation. Do you redo the election? How would you decide? What information would help you figure it out?

Well, it turns out that missingness is not so simple an idea. In fact, there are kinds of missingness. Varieties of missingness. And missingness, and the way missingness is handled, can often tip the scales in biomedical research studies.

Missing Completely at Random (MCAR)

Missing completely at random (MCAR) means that the data that is missing is missing for purely random reasons, which is to say, no reason at all. Put more formally, the missingness is unrelated to any other covariate, measured or not. This is the type of missingness that most (but not all) statistical tests assume, the most difficult to verify, and the one, in my opinion, least relevant to biomedical research.

To take our middle school: an example of MCAR would be if someone took all the votes, shuffled them up, and accidentally dropped 29 on the ground while carrying them to the counting room. Those dropped votes, the missing votes, are missing completely at random. There was no choice of which votes to drop; no group was selected to be dropped on the floor.

As you can imagine, under the MCAR assumption, Frump’s path to victory is an uphill climb.  Let’s do the math:

Looking at the votes we actually counted, 51.4% of the electorate voted for Hilda. Because the dropped votes were dropped at random, chances are about 51.4% of those would have been Hilda votes. Now, in reality, the chance nature of which votes were dropped could skew that a bit, but is it possible that it could have been skewed enough to give Frump the win? Statistics to the rescue!

Frump needs 22 of the 29 votes, around 76%. The chance of getting 76% for Frump when his underlying base probability is 48.6% is given to us by the binomial distribution. Long story short, the chance is around 3 in 1,000. Is it possible that the dropped votes swung the election? Sure. But it’s not very likely.
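If you want to check that number yourself, here’s a minimal sketch in Python (using scipy; the variable names are mine):

```python
# Chance that the 29 dropped ballots flip the election under MCAR.
from scipy.stats import binom

n_missing = 29        # ballots dropped completely at random
p_frump = 224 / 461   # Frump's share of the counted votes (~48.6%)
votes_needed = 22     # Frump votes among the 29 needed to overturn the result

# P(X >= 22) for X ~ Binomial(29, 0.486); sf(k) returns P(X > k), hence k - 1
prob_upset = binom.sf(votes_needed - 1, n_missing, p_frump)
print(f"{prob_upset:.4f}")  # about 0.0026, i.e., roughly 3 in 1,000
```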

This is, frequently, how clinical trials treat patients who are lost to follow-up. They essentially assume that those patients’ outcomes would have been the same as the outcomes of the people who weren’t lost to follow-up. That their missing data is, in other words, missing completely at random. I suggest to you that this assumption is rarely justified.

This came to a head at an FDA panel recently, where the agency was evaluating apixaban (Eliquis), a novel oral anticoagulant. The clinical trial under review had shown not only that the new drug carried a significantly lower rate of stroke than warfarin, but also improved overall mortality (p=0.047). But the margin for that mortality claim was quite thin: if one extra patient in the apixaban group had died and one extra patient in the warfarin group had survived, the difference would no longer be statistically significant. Now, ordinarily, I’d say who cares? We picked a p-value of 0.05 to be significant; we have to abide by it. But FDA reviewer Thomas Marciniak noted in his comments that data problems "destroy our confidence" that the drug reduces death. Ahh, so now we have to believe that the death rates among those missing patients were exactly the same as among non-missing patients. I agree, that’s a stretch. (The FDA, on the other hand, didn’t.)
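To get a feel for just how fragile a borderline p-value is, here’s a toy two-proportion z-test in Python. The counts are invented for the illustration (they are not the actual trial numbers); the point is how little it takes to cross the 0.05 line:

```python
# Shift a single outcome in each arm of a hypothetical trial and watch a
# "significant" mortality difference become non-significant.
from math import sqrt
from scipy.stats import norm

def two_prop_p(d1, n1, d2, n2):
    """Two-sided pooled z-test comparing death proportions d1/n1 and d2/n2."""
    pooled = (d1 + d2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 2 * norm.sf(abs(d1 / n1 - d2 / n2) / se)

# Hypothetical counts: fewer deaths on the new drug than on the comparator
print(two_prop_p(160, 2000, 197, 2000))  # ~0.040 ("significant")
# One extra death on drug, one extra survivor on comparator
print(two_prop_p(161, 2000, 196, 2000))  # ~0.052 (no longer significant)
```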

Missing completely at random is a very high bar to reach.  One notch lower is MCAR's little brother, missing at random.

Missing at Random (MAR)

This is… an unfortunate term. But it’s how this is always described, so you should know it has a technical meaning. Missing at random differs from MCAR in that the missingness is related to a measured covariate.

Back to our election. Let’s say that, because colors help children learn (citation needed), ballots were printed on pink or blue paper, and (because this is Idaho, I guess) boys get blue ballots and girls get pink.  Now assume that some overzealous vice-principal trashes 29 BLUE ballots.

The ballots he chose to trash are random (at least with respect to the votes on them), but their missingness is entirely tied to color. Now what are Frump’s chances?

As you can probably guess, it has something to do with how boys vote. If boys vote overwhelmingly for Frump, the chance that these discarded chits will swing the election might be substantial. For example, if 75% of boys voted for Frump, the chance that the missing ballots would turn the election is about 56%. With a risk that high, you might be forced to do the whole election again. Conversely, if sex has nothing to do with who you vote for, well, then we’re back to the MCAR calculation and we can let the results stand.
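The same binomial logic checks that number, now with the boys’ hypothetical level of support plugged in:

```python
# Chance the 29 trashed boys' ballots flip the election, assuming 75% of
# boys voted for Frump (the hypothetical above, not data).
from scipy.stats import binom

prob_flip = binom.sf(21, 29, 0.75)  # P(X >= 22 of 29) at p = 0.75
print(f"{prob_flip:.2f}")           # about 0.56
```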

Missing Not at Random (MNAR)

This is the worst type of missingness. It’s the kind where the missingness cannot be accounted for by any variable you have on hand. This would be a situation where someone deliberately removed 29 Frump votes from the pile. There is no statistical test that can detect it (unless you find the votes), so it can be very hard to prove.
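For the notation-inclined, all three mechanisms have standard formal definitions (this is Rubin’s taxonomy; R is the indicator that a value is observed, and Y_obs and Y_mis are the observed and missing parts of the data):

$$\begin{aligned}
\text{MCAR:} &\quad P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R)\\
\text{MAR:} &\quad P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}})\\
\text{MNAR:} &\quad P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \neq P(R \mid Y_{\text{obs}})
\end{aligned}$$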

In biomedical research, this last kind, missingness not at random, may be the most common. We can almost never completely explain why certain individuals were lost to follow-up (so missing at random is out). We are left, then, debating whether they dropped out for completely random reasons (lightning strikes, alien abductions, etc.) or for reasons that may matter. In the latter, more realistic case, there is no statistical test that can fix the results.

So we are left either closing our eyes and pretending the data is MCAR, or (and I prefer this route) doing two sensitivity analyses: one assuming that everyone you lost track of experienced the outcome, and one assuming that no one you lost track of did. If you get the same results either way, we can be pretty confident that the study’s conclusions are reliable.
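Here’s a minimal sketch of what those two analyses look like, again in Python with invented counts (the function and variable names are mine):

```python
# Best-case / worst-case sensitivity analysis for a two-arm trial with
# patients lost to follow-up. All counts are hypothetical.
from math import sqrt
from scipy.stats import norm

def two_prop_p(d1, n1, d2, n2):
    """Two-sided pooled z-test comparing event proportions d1/n1 and d2/n2."""
    pooled = (d1 + d2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 2 * norm.sf(abs(d1 / n1 - d2 / n2) / se)

events_t, followed_t, lost_t = 30, 480, 20   # treatment arm
events_c, followed_c, lost_c = 55, 470, 30   # control arm

# Scenario 1: everyone lost to follow-up experienced the outcome
p_all_events = two_prop_p(events_t + lost_t, followed_t + lost_t,
                          events_c + lost_c, followed_c + lost_c)
# Scenario 2: no one lost to follow-up experienced the outcome
p_no_events = two_prop_p(events_t, followed_t + lost_t,
                         events_c, followed_c + lost_c)
print(p_all_events, p_no_events)
```

If both extremes tell the same story, the missing patients can’t be driving the result; if they don’t, the honest conclusion is that the data can’t answer the question.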

Unfortunately, I rarely see these analyses performed. When I do, they pretty much always show that the conclusions are robust. Should we be suspicious of that? I shall not judge.

So the next time you’re reading a study, play this game. The authors get 1 point if they even mention missing data. They get 5 points if they try to analyze why data might be missing. They get 50 points if they do the aforementioned sensitivity analyses. And if those analyses lead them to conclude that their primary results may not be valid, they get 1 million points, and the official Methods Man "You're a researcher with integrity" prize.

In the meantime, get out and vote.

Standard deviation, standard error, what's the difference?


Error. If you're an epidemiologist, you work, live, and breathe it. Some of us loathe it as the source of all our negative studies. Some of us embrace it as a reminder that the universe is, well, just imperfect. And maybe that's OK. But however you feel about error, you best be reporting it in your research studies. And how you report it matters. For a full discussion, check out my latest Methods Man blog post.