The End of Biomedical Journals
There is Madness in Their Methods
Mikel Aickin
University of Arizona
03sep05
In The Structure of Scientific Revolutions, Thomas Kuhn
portrayed normal science as slipping into a moribund condition, in
which it could no longer provide acceptable answers to its own
questions. Then some kind of shift would come along to replace the
dominant outlook with an improvement, which was in turn destined to
become the new version of normal science, perpetuating the whole
cycle. In Kuhn’s version, it was the discovery of new puzzles,
consisting of observations that could not be satisfactorily
explained in the current paradigm, that led to an apparent shift.
He did not, however, consider a situation in which the methods of
normal science might simply degenerate, producing the same kind of
crisis, and possibly the same kind of resolution.
There is unsettling evidence that we are now in the midst of a
methodologic degeneration in biomedical science. This appears to be
occurring in, of all places, our fundamental approach to inference:
using observation and evidence to decide how to act or believe.
That it might be happening in medical research makes it of more
than just academic interest.
One of the few benefits in a degeneration of conventional
methods is that the normal scientists are unlikely to recognize
that it is happening, and so the process will not only be made
public, but actually touted as excellent science. So it is with a
remarkable article published on 27aug05 in the Lancet regarding
homeopathy.(1) The editors of the Lancet are evidently proud of
their publication, since they use it as the basis for a call to end
homeopathy. Does this article justify the editorial, or is it in
fact a betrayal of the very principles that the Lancet claims to
stand for? Let us see how this specific article fares in the light
of the conventional criteria that are applied to articles in
clinical trials and biomedical science generally.
Treatment Comparisons.
A clinical trial has investigated two therapies for a given
condition. On a scale in which larger numbers are better, and zero
stands for no effect, the effect of therapy A is 0.54 (SDE=0.196),
while for therapy B it is 0.13 (SDE=0.154). The researchers conclude
that therapy A is effective (p=0.006) while therapy B is not
(p=0.40).
The article is, of course, not accepted for publication. The
reason is that the whole point of having two groups in a study is
to compare them with each other. The difference between the
treatment effects in the two groups is 0.41 (SDE=0.249) with
p=0.10. By the conventional criterion for making such comparisons,
this result is not “statistically significant”. It means that there
is no basis for saying that the two therapies have different
effects. The study is null.
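The arithmetic behind this comparison can be checked with a minimal sketch. It assumes the two effect estimates are independent and approximately normal, so that the standard error of the difference is the root sum of squares of the two SDEs; the function names are illustrative and are not taken from any published analysis.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def compare_effects(est_a: float, se_a: float, est_b: float, se_b: float):
    """Difference of two independent estimates, its standard error, and p-value.

    Assumption: the estimates are independent, so their variances add.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, two_sided_p(diff / se_diff)

# Each therapy tested against zero, then the two compared with each other.
diff, se_diff, p_diff = compare_effects(0.54, 0.196, 0.13, 0.154)
print(f"A vs 0: p = {two_sided_p(0.54 / 0.196):.3f}")
print(f"B vs 0: p = {two_sided_p(0.13 / 0.154):.3f}")
print(f"A - B = {diff:.2f} (SDE = {se_diff:.3f}), p = {p_diff:.2f}")
```

Under these assumptions the sketch reproduces, to rounding, the figures quoted above: each therapy looks different when tested separately against zero, yet the direct comparison does not reach the conventional 0.05 threshold.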
The data come from the abstract of the Lancet article. 0.54 is
the negative log of the odds ratio (0.58) from conventional
studies, and 0.13 is the same transformation of the odds ratio
(0.88) in homeopathic studies. In the abstract, and in the comments
elsewhere in the issue, the faulty analysis is treated as if it were
correct: therapy A (conventional medicine) is indeed effective,
while therapy B (homeopathy) is not.
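The transformation from odds ratio to effect scale is easy to verify. The following is a minimal sketch; the helper name is hypothetical, and it assumes the natural logarithm, which is the choice that matches the figures quoted.

```python
import math

def neg_log_odds_ratio(odds_ratio: float) -> float:
    """Map an odds ratio to the scale used above: larger is better, zero is no effect."""
    return -math.log(odds_ratio)

# Odds ratios as quoted from the abstract.
for label, odds_ratio in [("conventional", 0.58), ("homeopathy", 0.88)]:
    print(f"{label}: {neg_log_odds_ratio(odds_ratio):.2f}")
```

An odds ratio of 1 maps to 0, and odds ratios below 1 (favoring treatment) map to positive values, which is the convention used in the hypothetical trial.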
Differential Compliance.
Another study has randomized 110 patients each to two therapy
groups. The therapies are hard to maintain, and so only 21 patients
comply in one group, while an even more disappointing 9 comply in
the other group. The difference is “statistically significant” with
p=0.018. The authors are surprised when their article is rejected,
on the grounds that such a low rate of compliance, combined with a
differential between the two groups, casts the results in serious
doubt. The study has failed.
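The quoted p-value can be approximated with a short sketch. The text does not say which test was used; this assumes a pooled two-proportion z-test (equivalent to a chi-square test without continuity correction), and the function name is illustrative.

```python
import math

def two_proportion_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumption: large-sample normal approximation, no continuity correction.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

# 21 of 110 compliers in one group versus 9 of 110 in the other.
z, p = two_proportion_test(21, 110, 9, 110)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A continuity-corrected or exact test would give a somewhat larger p-value, so the agreement with the quoted figure depends on the choice of test assumed here.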
The numbers are from the abstract of the Lancet article. There
were 21 “high-quality” homeopathic studies, and 9 “high-quality”
conventional studies. The conclusion is clear; there has been a
“statistically significant” demonstration that homeopathy articles
are of higher quality than comparable conventional medical articles
on the same topics. Unfortunately, this invalidates the rest of the
paper. (As a footnote, it was only recently that the supposed poor
quality of CAM research was being cited as the reason for a false
excess of positive CAM studies. Now that the quality results are in
the opposite direction, this argument is evidently no longer
valid.)
Intent-to-treat.
Yet another study also enrolls 110 pair-matched patients in each
of two groups. One group has 8 evaluables while the other has only
6. The article is rejected on the grounds that once patients are
entered into the study, they must be analyzed in their original
group. This means, among other things, that if they did not
contribute endpoint data, then some imputation scheme must be used.
The results as presented are faulty not only because more than
ninety percent of the data are missing, but because there is no
guarantee that the patients actually analyzed are matched (that is,
the pair matching was destroyed by the missing data, a point passed
over by the authors). The process of selection that produced
“evaluables” is not above question.
The data come from the abstract of the Lancet article. The odds
ratios cited above are based on 8 homeopathic and 6 conventional
articles (not 110 of each, as implied elsewhere in the article and
in the Lancet editorial). The loss of pairing was ignored, of
course. The validity of the measures used to include articles is
not adequately justified, despite the fact that the results might
well be almost entirely driven by them.
Post-study power computations.
A study without a control group reports an apparent treatment
effect of 0.13 (SDE=0.154). This is properly reported as not
“statistically significant”. The article is only accepted subject
to revision, since a negative study with a small sample size should
provide a power computation (this is not, as often and erroneously
thought, to justify the study in the first place, but to determine
whether the results are worth anything at all). A conventional
calculation gives a detectable effect of 0.462 (power 85%). The
editors decide that this is too large to be reasonable, and reject
the article.
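A detectable-effect calculation of this kind can be sketched as follows. The text gives only the power (85%); the two-sided significance level of 0.05 and the normal approximation are assumptions made here for illustration, as is the function name.

```python
from statistics import NormalDist

def detectable_effect(se: float, alpha: float = 0.05, power: float = 0.85) -> float:
    """Smallest true effect detectable with the given power.

    Assumptions: normally distributed estimate with standard error `se`,
    two-sided test at level `alpha`.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) * se

print(f"detectable effect: {detectable_effect(0.154):.3f}")
```

Under these assumptions the result is about 0.46, close to the 0.462 quoted above; the small discrepancy presumably reflects rounding of the normal quantiles. Put differently, the study could reliably detect only an effect roughly three standard errors from zero.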
The data come from the abstract of the Lancet article. A
negative result is reported (homeopathy is no better than placebo)
with a minuscule sample size, and no power calculation.
Control of confounding.
A group of epidemiologists conduct an observational study of
seven risk factors on a disease outcome. The issue is to determine
whether the risk factors are the same in two groups of people. The
data presentation consists of a series of univariate odds ratios,
one for each risk factor, with p-values to test a null association.
The article is rejected for two reasons. First, since the purpose
was to compare risk factors across the two groups, the comparisons
with null effects are not germane, and the obvious comparisons
between the groups should be made. But more importantly, there is a
known confounder that should have been controlled in the analysis
(that is, there should have been a multivariate analysis), and
moreover the risk factors that were analyzed are intercorrelated,
so that again multivariate analyses should have been carried
out.
The data are from Table 3 of the Lancet article. One could take
quality as the confounder, or perhaps one of the other factors.
There is, of course, no reason to dichotomize quality, and since
this generally results in misclassification bias, there is reason
not to. Obviously the analysis does not compare conventional
medicine with homeopathy, but rather compares each to the null. An
appropriate analysis would not only make the comparison between
therapy groups, but would take the pairing into account.
Meta-analyses.
There are, therefore, five areas in which the Lancet article
does not meet the minimum, conventional criteria for publication in
biomedicine. This is, however, not the most serious problem with
the article. For this, we need to go back to recall the original
aim of a meta-analysis, or overview. It is to assemble all of the
obtainable, relevant literature on studies done for the purpose of
comparing different therapies for a given condition. The original
reasons for developing the concept were to collect scattered
literature into one place, to apply uniform criteria for study
selection and analysis, and to come to a conclusion about the best
therapeutic approach, or to say that the evidence was not yet
conclusive. Somehow this precise and useful form has degraded into
an unrecognizable hash, in which any papers on any topic can be
bundled together in an investigation of questions of unlimited
ambiguity. A classic paper along this path has already been
published in the Annals of Internal Medicine.(2) Here the
authors studied a single therapy (vitamin E supplementation),
breaking the first rule of meta-analysis, for multiple conditions
(breaking the second rule) in studies not designed to test the
therapy (breaking the third rule). There is evidence that they were
not sufficiently careful about the form of the various vitamin E
treatments, violating a fourth rule.(3) This study in effect
concocted perhaps the most biased sample of human beings one could
find in the biomedical literature, and then made the truly bizarre
assertion that its results applied to everyone. One can only
presume that the lack of a negative reaction to the Annals article
paved the way for the Lancet article.
There are other examples, of a similar order of strangeness, but
I will only mention the therapeutic touch article published in
Journal of the American Medical Association.(4) This article
was on research carried out by a nine-year-old girl, under the
direction of her mother, an ardent opponent of therapeutic touch.
The methodology was debunked in an article in Alternative
Therapies(5), showing that it contained appalling,
irremediable flaws. After the original article was published, the
JAMA editor was criticized for poor judgment in accepting a
low-quality article to make a political statement. This could be
seen as an unjustifiably beneficent interpretation, however,
because no one seems to have noticed the very real possibility that
the article, poor though it was, actually did meet JAMA’s scientific
standards.
Malpractice.
If you make a few simple assumptions, you can roughly compute
the number of possible instances of malpractice that a physician
might commit in a lifetime of practice. It is not a particularly
large number. Now consider a journal that publishes papers which
mislead health professionals and ordinary people about the
effectiveness of medical practices. Surely one article, not to mention one
researcher, can have a harmful effect through research malpractice
that dwarfs the meager capacity of a single physician. The
malpractice risk of a typical journal must be even larger.
If we are to see a continued degradation of methods in
biomedical research, supported by “leading” journals, then perhaps
it is time to think about the End of Biomedical Journals, as we
know them. In the US at least, it would not seem unthinkable that
the National Institutes of Health, through the National Library of
Medicine, could undertake a web-based project to publish all
funded, and much of the unfunded research, in all areas of
biomedicine. The need for the current unregulated system would
vanish. Editors and referees would continue to be needed, but they
would operate under rational regulations, and would not be in a
position to endanger the public health on the basis of personal
whims. Incidentally, the job of meta-analysis would be infinitely
easier, since the hunt for relevant articles would be all but
accomplished. And, as we know because of PubMed, the technology is
available, and the NLM knows how to apply it.
To return to Thomas Kuhn, we certainly have many biomedical
puzzles that are worth working on, and which have not been
addressed by normal biomedical science. Some of us are engaged in
an experiment to see whether we can fashion research tools that
will help us to understand more, by extending existing methods when
feasible, and developing new ones when appropriate. For us, it is
particularly discouraging to see normal biomedical scientists
perverting their own tools for the evident purpose of attacking
unconventional therapies.
References
1. Shang A, Huwiler-Müntener K, Nartey L, Jüni P, Dörig S,
Sterne JAC, Pewsner D, Egger M. Are the clinical effects of
homoeopathy placebo effects? Comparative study of
placebo-controlled trials of homoeopathy and allopathy.
Lancet 2005;366:726-32
2. Miller ER, Pastor-Barriuso R, Dalal D, Riemersma RA,
Appel LJ, Guallar E. Meta-analysis: High-dosage vitamin E
supplementation may increase all-cause mortality. Annals of
Internal Medicine 2005;142(1):37-46
3. Neustadt J, Pizzorno J. Vitamin E and All-Cause
Mortality. Integrative Medicine 2005;4(1):14-17
4. Rosa L, Rosa E, Sarner L, Barrett S. A close look at
therapeutic touch. JAMA 1998;279(13):1005-1010
5. Cox T. A nurse-statistician reanalyzes data from the Rosa
therapeutic touch study. Alternative Therapies 2003;9(1):58-65