Evidence-based medicine lacks solid supporting evidence

Assumptions for generalizing clinical trial data to particular patients rarely withstand scrutiny

The ancient Greek physician Hippocrates advocated that doctors use experience-based intuition but also emphasized the importance of medical theory about the body’s fluids.

Wellcome Library, London (CC BY 4.0)

For millennia, medicine was more art than science.

From at least the time of Hippocrates in ancient Greece, physicians were taught to use their intuition, based on their experience.

“For it is by the same symptoms in all cases that you will know the diseases,” he wrote. “He who would make accurate forecasts as to those who will recover, and those who will die … must understand all the symptoms thoroughly.”

In other words, doctors drew general conclusions from experience to forecast the course of disease in particular patients.

But Hippocratic medicine also incorporated “scientific” theory — the idea that four “humors” (blood, black bile, yellow bile and phlegm) controlled the body’s health. Excess or deficiency of any of the humors made you sick, so treating patients consisted of trying to put the humors back in balance. Bloodletting, used for centuries to treat everything from fevers to seizures, serves as an example of theory-based medicine in action.

Nowadays medical practice is supposedly (more) scientific. But actually, medical theory seems to have taken a backseat to the lessons-from-experience approach. Today’s catchphrase is “evidence-based medicine,” and that “evidence” typically takes the form of results from clinical trials, in which potential treatments are tested on large groups of people. It’s basically just a more systematic approach to Hippocrates’ advice that doctors base diagnosis, treatment and prognosis on experience with previous patients. But instead of applying their own personal clinical experience, doctors rely on generalizing the results of large trials to their particular patients.

You could call this approach the “Risk Generalization-Particularization” model of medical prediction, as Jonathan Fuller and Luis Flores do in a paper to be published in Studies in History and Philosophy of Biological and Biomedical Sciences. (It’s OK to call it “Risk GP” for short, they say.) “Risk GP,” they note, is “the model that many practitioners implicitly rely upon when making evidence-based decisions.”

Risk GP as a model for making medical judgments is the outgrowth of demands for evidence-based medicine, write Fuller, on the medicine faculty at the University of Toronto in Canada, and Flores, a philosopher at King’s College London in England. It “advocates applying the results of population studies over mechanistic reasoning … in diagnosis, prognosis and therapy.” Evidence-based medicine has set a new standard for clinical reasoning, Fuller and Flores declare; it “has become dominant in medical research and education, accepted by leading medical schools and all of the major medical journals.”

So it seems like a good idea to ask whether the “evidence” actually justifies this evidence-based approach. In fact, it doesn’t.

“There are serious problems with the Risk GP Model, especially with its assumptions, which are often difficult to warrant with evidence and will often fail in practice,” Fuller and Flores assert.

In their paper, they outline serious problems with both the generalization and particularization sides of the GP coin. If you treat a patient on the basis of data from clinical trials, you ought to be sure that the patient actually is a member of the population sampled to perform the trial. You’d also like to be sure that the sample of patients in the trial actually did fairly represent the whole population it was sampled from. In practice, these requirements are never fully met. Physicians simply assume that the study populations are “sufficiently similar” to the people being treated (the “target population”). But, as Fuller and Flores point out, that assumption is rarely questioned, and evidence supporting it is lacking.

“Our target might be a population that was ineligible for the trial, such as older patients or patients with other concurrent diseases,” they write. In fact, patients with other diseases or people taking multiple medications — those typically not allowed in trials — are often exactly the people seeking treatment.

“Given the demographics of patients in hospital and community practice, target populations often include these very patients that trials exclude,” Fuller and Flores write.

Generalizing from a trial’s results to the target population is therefore risky. But even if that generalization is fair, applying trial results to treating a particular patient may still be unjustified. A patient may belong to a defined target population but still not respond to a treatment the way the “average” patient in a trial did.

After all, trials report aggregate outcomes. Suppose a drug reduces the incidence of fatal heart attacks in a trial population by 20 percent. In other words, say, 100 people died in the group not getting the drug, while only 80 people died in an equally large group receiving it. But that doesn’t mean the drug will reduce the risk for any given individual by 20 percent. Genetic differences among the patients in a trial may have determined who survived because of the drug. For a patient without the favorable gene, the drug might actually make things worse. No one knows.
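The arithmetic behind that caveat can be sketched in a few lines of Python. The numbers here are purely illustrative assumptions, not data from any trial or from Fuller and Flores’ paper: half the patients are imagined to carry a favorable gene that halves their risk on the drug, while the other half fare slightly worse on it. The subgroup names, shares and risks are all hypothetical.

```python
def relative_risk_reduction(control_risk, treated_risk):
    """Fraction by which the treated risk falls below the control risk."""
    return (control_risk - treated_risk) / control_risk


# Hypothetical subgroups (assumed for illustration only):
# name -> (share of population, risk of death without drug, risk with drug)
subgroups = {
    "carriers": (0.5, 0.10, 0.05),      # drug halves the risk
    "non_carriers": (0.5, 0.10, 0.11),  # drug slightly raises the risk
}

# Population-wide risks are share-weighted averages of the subgroup risks.
control_risk = sum(share * base for share, base, _ in subgroups.values())
treated_risk = sum(share * treated for share, _, treated in subgroups.values())

# The aggregate figure a trial would report.
aggregate_rrr = relative_risk_reduction(control_risk, treated_risk)

# What the drug does within each subgroup.
subgroup_rrr = {
    name: relative_risk_reduction(base, treated)
    for name, (_, base, treated) in subgroups.items()
}

print(f"aggregate risk reduction: {aggregate_rrr:.0%}")
for name, rrr in subgroup_rrr.items():
    print(f"{name}: {rrr:.0%}")
```

Under these assumed numbers the trial as a whole shows a 20 percent reduction (a 10 percent death rate falling to 8 percent), yet no individual patient experiences that figure: carriers see their risk cut in half and non-carriers see it rise. Many other mixtures would produce the same aggregate result, which is why the aggregate alone cannot say what happens to a particular patient.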

Fuller and Flores go into considerable quantitative detail to illustrate the flaws in the Risk GP approach to medical practice. Their underlying point is not that the Risk GP method is always wrong or never useful. It’s just that the assumptions that justify its use are seldom explicitly recognized and are rarely tested.

It’s worth asking, of course, whether such philosophical objections have truly serious implications in the real world. Maybe clinical research, while not resting on a rigorously logical foundation, is generally good enough for most practical purposes. Sadly, a survey of the research relevant to this issue suggests otherwise.

“Evidence from clinical studies … often fails to predict the clinical utility of drugs,” health researchers Huseyin Naci and John Ioannidis write in the current issue of Annual Review of Pharmacology and Toxicology.

Questionable evidence

In their review, Naci and Ioannidis find all sorts of turbulence in the medical evidence stream. Often clinical studies fall short of the rigorous methodology demanded by “gold standard” clinical trials, in which patients are assigned to treatment groups at random and nobody knows which group is which. And even randomized studies have “important limitations,” the researchers write.

Apart from methodological weaknesses, clinical research also suffers from biases stemming from regulatory and commercial considerations. Drug companies typically test new products against placebos rather than head-to-head against other drugs, so doctors don’t get good evidence about which drug is the better choice. And restrictions on what patients are admitted to trials (as Fuller and Flores noted) make the test groups very unlike the people doctors actually treat. “As a result, drugs are approved on the basis of studies of very narrow clinical populations but are subsequently used much more broadly in clinical practice,” Naci and Ioannidis point out.

Their assessment documents all sorts of other problems. Tests of drug effects often rely on short-term surrogate indicators (say, change in cholesterol level), for instance, rather than eventual meaningful outcomes (say, heart attacks). Surrogates often overstate a drug’s effectiveness and imply beneficial effects twice as often as studies recording actual clinical outcomes.

Medical evidence is also skewed by secrecy. Studies with good news about a drug are more likely to be published than studies with bad news, and bad news that does get reported may be delayed for years. Evidence synthesis, as in meta-analyses of multiple studies, therefore seldom really paints the whole picture. Studies suggest that meta-analyses exaggerate treatment effects, Naci and Ioannidis report.
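A toy calculation shows how selective publication alone could inflate a pooled estimate. Everything in this sketch is an assumption made for illustration: the ten effect estimates are invented, the publication rule is a crude threshold, and the pooling is a plain unweighted average rather than the weighted methods real meta-analyses use.

```python
from statistics import mean

# Hypothetical effect estimates from ten small trials of one drug
# (positive = benefit, negative = harm). Invented numbers, not real data.
all_estimates = [-0.30, -0.20, -0.10, -0.05, 0.00,
                 0.05, 0.10, 0.20, 0.30, 0.40]

# Assumed publication rule: only trials showing a clear benefit are published.
PUBLICATION_THRESHOLD = 0.10
published = [e for e in all_estimates if e >= PUBLICATION_THRESHOLD]

# Simple unweighted pooling, standing in for a meta-analysis.
pooled_all = mean(all_estimates)
pooled_published = mean(published)

print(f"trials run: {len(all_estimates)}, trials published: {len(published)}")
print(f"pooled effect, all trials: {pooled_all:.2f}")
print(f"pooled effect, published trials only: {pooled_published:.2f}")
```

With these made-up figures, pooling every trial gives an effect close to zero, while pooling only the four published trials gives a much larger one. The drug has not changed; only the visible slice of the evidence has.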

All these factors, many the offspring of regulatory requirements and profit-making pressure, render the evidence base for modern medicine of questionable value. “Driven largely by commercial interests, many clinical studies generate more noise than meaningful evidence,” Naci and Ioannidis conclude.

Thus Fuller and Flores’ concern about the Risk GP approach’s assumptions is joined by real-world pressures that exacerbate evidence-based medicine’s shortcomings, suggesting the need for different, or at least additional, approaches. In some situations, a theory-based approach might work better.

Nobody advocates a return to the four humors and wholesale bloodletting. But Fuller and Flores do argue for more flexibility in choosing a basis for predicting treatment outcomes. A mechanistic understanding of how diseases operate on the biochemical level offers one alternative.

This approach is not exactly absent from medicine today. Much progress has been made over the last century or so in identifying the biochemical basis for many sorts of medical maladies. But apart from a few specific cases (such as some genetic links relevant to choosing breast cancer treatments) its advantages have not been widely realized.

Ideally, mechanistic methods grounded in medical theory and biochemical knowledge would improve on decisions based solely on generalization and particularization. And, as Fuller and Flores note, other models for making predictions also exist, such as a doctor’s personal experience with patients or even with one individual patient.

Many doctors do apply different approaches on a case-by-case basis. No doubt some particular doctors have superb intuition for when to rely on clinical trials and when to go with their gut. But modern medical philosophy seems to have strayed from the Hippocratic ideal of combining theory with experience.  

Medicine today would be better off if it became standard practice to assess multiple models for making predictions and “match the model to the circumstances,” Fuller and Flores say. “When it comes to medical prediction, many models are surely better than one.”

Tom Siegfried is a contributing correspondent. He was editor in chief of Science News from 2007 to 2012 and managing editor from 2014 to 2017.