Forbes
Larry Husten
November 25, 2012
Editor’s note: Below are two responses to Robert
Schneider’s defense of
his Transcendental Meditation paper, which Schneider wrote in
response to my earlier
article about the publication of his paper. In the first
part I respond to some of the general issues raised by Schneider. The second part,
from Sanjay Kaul, addresses the statistical issues discussed by Schneider.
I’m grateful for Kaul’s highly technical analysis of the
statistical issues raised by Schneider, but I don’t think this case really
requires a terribly high level of technical expertise. Common sense actually
works pretty well in this case. A trial with barely 200 patients cannot be
expected to provide broad answers about the health benefits of a novel
intervention. As Kaul and others have stated on many other occasions, “extraordinary
claims require extraordinary evidence,” and it is quite clear that the evidence
in this trial is not extraordinary, at least in any positive sense.
Questions About Trial Reliability And Data– In his
response Schneider tries to skate away from the inevitable questions raised
about this paper when Archives of Internal Medicine chose to withdraw the
paper only 12 minutes before its scheduled publication time. Schneider can
pretend that this incident never occurred, but outside readers cannot help
but wonder what sparked this extraordinary incident, and will not be satisfied
until the details are fully explained.
There are additional red flags about the trial. Schneider
told WebMD that
since the Archives incident “the data was re-analyzed. Also, new data
was added and the study underwent an independent review.”
This is an extraordinary claim, because a clinical trial
cannot be “new and improved” unless there were serious flaws with the earlier
version. What exactly does it mean to say that a paper published in 2012
about a trial completed in 2007 is “new and improved”? (According to ClinicalTrials.Gov the
study was completed in July 2007, while June 2007 was the “final data
collection date” for the primary endpoint.)
The 5-year delay between the 2007 completion date and the
publication of the data is highly suspicious. What exactly caused this delay?
The paper hints at one possible source of delay: as Kaul notes below, the
investigators refer to the primary endpoint as a “DSMB-approved endpoint.” This
suggests that the primary endpoint was changed at some point in the trial. As
Kaul points out, it is not the job of the DSMB to either choose or approve
primary endpoints. Since the trial was not registered until 2011 with
ClinicalTrials.Gov it is impossible to sort this issue out unless the
investigators choose to release the initial trial protocol and statistical
plan.
Schneider’s response also fails to explain why there is a
difference in the number of primary endpoint events between the Archives paper
and the Circulation: Cardiovascular Quality & Outcomes paper, since
the collection date for the primary outcome measure is listed as June 2007 on ClinicalTrials.Gov.
I see no reason why this discrepancy shouldn’t be explained.
Although the difference is only 1 event, it inevitably raises questions about
the reliability of the data.
Trial Interpretation– Finally, I am deeply concerned
about the way this trial will be used, or misused, to “sell” the brand of
Transcendental Meditation in the broadest possible population, ie, everyone.
Though the study was limited to African-Americans with heart disease,
here’s what Schneider
told the Daily Mail:
‘Transcendental meditation may reduce heart disease risks for
both healthy people and those with diagnosed heart conditions. The research on
transcendental meditation and cardiovascular disease is established well enough
that doctors may safely and routinely prescribe stress reduction for their
patients with this easy to implement, standardised and practical programme.’
Meditation may of course be beneficial, but it will
never be a cure for heart disease, and it won’t replace other treatments. But
here’s what Schneider
told WebMD:
“What this is saying is that mind-body interventions can have
an effect as big as conventional medications, such as statins,” says Schneider.
It shouldn’t be necessary to say, but the evidence base for
statins is several orders of magnitude greater than the evidence base for
meditation. Further, there have been no studies comparing meditation to
statins. Any claim that meditation is equivalent to statins is preposterous.
To be clear, I have nothing against meditation. Generic
meditation is cheap, safe, and even possibly effective. Branded Transcendental
Meditation, on the other hand, is a cult, and it is out to get your money.
An initial TM program costs $1500, and the costs increase the deeper you get
pulled into the cult. Here’s what Schneider
told Healthday:
“One of the reasons we did the study is because insurance and
Medicare calls for citing evidence for what’s to be reimbursed,” Schneider
said. “This study will lead toward reimbursement. That’s the whole idea.”
Here’s the real source of my discomfort with this trial. For
true believers like Schneider, fighting heart disease is important only insofar
as it can be employed to further the interests of TM. Scientific standards and
medical progress are unimportant in the larger scheme of promoting TM.
Read the comments left
by Michael Jackson and Chrissy on my earlier post to learn more
about the dangers of TM. Or do your own research on the internet.
Here’s Sanjay Kaul’s response:
Power calculation
By convention, the difference that the study is powered to detect
(delta) varies inversely with the seriousness of the outcome, i.e., larger
delta for ‘softer’ outcomes and smaller delta for ‘harder’ outcomes. This does
not appear to be the case in the current study.
For the first phase of the trial, the power calculation was based on a 36% risk reduction in death, nonfatal MI, nonfatal stroke, rehospitalization or revascularization (the original primary endpoint).
Then, for the 2nd phase of the trial, the power calculation is based on a 50% reduction in a narrower but harder outcome of death, nonfatal MI, nonfatal stroke (the revised primary endpoint). I find it curious that the authors justify their choice of the revised primary endpoint as ‘DSMB-approved endpoint’! Since when is the DSMB charged with choosing or approving trial endpoints?
Incidentally, the Proschan-Hunsberger method refers to
conditional, not unconditional, power. To compute conditional power, the
investigators had to have looked at data by arm. Thus, some penalty should be
paid for the ‘interim look’ in the form of requiring a larger z-score (lower p
value) to claim statistical significance. They did not appear to do this.
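To make Kaul's point concrete, here is a generic sketch of a conditional power calculation under the "current trend" assumption, using the standard B-value formulation. This is an illustration with hypothetical numbers, not the trial's actual computation; the investigators' interim data and drift assumptions are not public.

```python
import math
from statistics import NormalDist

def conditional_power(z_interim: float, t: float, alpha: float = 0.025) -> float:
    """Conditional power under the 'current trend' assumption.

    z_interim: interim z-score at information fraction t (0 < t < 1).
    Uses the B-value B(t) = z_interim * sqrt(t); the drift is estimated
    from the observed trend as theta = B(t) / t. The increment B(1) - B(t)
    is then normal with mean theta * (1 - t) and variance (1 - t).
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)       # one-sided critical value
    b = z_interim * math.sqrt(t)          # B-value at the interim look
    theta = b / t                         # drift estimated from the current trend
    shortfall = z_alpha - b - theta * (1 - t)
    return 1 - nd.cdf(shortfall / math.sqrt(1 - t))

# Hypothetical illustration: halfway through the trial, interim z = 1.5
cp = conditional_power(1.5, t=0.5)
```

The key point is that `z_interim` can only come from an unblinded look at the data by treatment arm, which is exactly why such a look ordinarily carries a statistical penalty at the final analysis.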
Strength of evidence
The conventional frequentist approach relies heavily on the p value, which tends to overstate the strength of association. Complementary approaches such as Bayesian inference are available that utilize the Bayes factor, a more desirable metric for quantifying the strength of evidence than the p value. For instance, the Bayes factor associated with a p value of 0.03 (observed in the trial) is about 10, which means that starting from a prior null probability of 50%, there remains roughly a 10% posterior probability that the null is true based on the trial results, more than 3-fold higher than that implied by a p value of 0.03. So the evidence falls into the category of at most ‘moderate’ strength against the null.
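Kaul's figures can be checked with the minimum Bayes factor bound exp(−z²/2), one common way (among several) of converting a p value into a best-case Bayes factor against the null. A sketch of that arithmetic, assuming a two-sided p value:

```python
import math
from statistics import NormalDist

p = 0.03                                # two-sided p value observed in the trial
z = NormalDist().inv_cdf(1 - p / 2)     # corresponding z-score, roughly 2.17
min_bf = math.exp(-z * z / 2)           # minimum Bayes factor (null vs. alternative), about 1/10
# With a 50% prior on the null, prior odds are 1:1, so posterior odds = min_bf.
posterior_null = min_bf / (1 + min_bf)  # roughly a 10% chance the null is true
```

Note that this is the most favorable Bayes factor for the alternative; any realistic prior on the effect size would leave the null even more probable.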
Another way of assessing the strength of evidence is to
quantify the probability of repeating a statistically significant result, the
so-called ‘replication probability’. The replication probability associated
with a p value of 0.03 is about 58%, which is unlikely to pass muster with any regulatory agency.
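Under one common definition of replication probability, the observed effect is treated as the true effect, and one asks how often an identically designed study would again reach two-sided p < 0.05. A sketch of that calculation:

```python
from statistics import NormalDist

nd = NormalDist()
p_obs = 0.03
z_obs = nd.inv_cdf(1 - p_obs / 2)    # z for the observed two-sided p, about 2.17
z_crit = nd.inv_cdf(1 - 0.05 / 2)    # two-sided 0.05 criterion, about 1.96
# Assuming the true effect equals the observed one, the replicate's z-score
# is distributed N(z_obs, 1), so the chance of again crossing the criterion is:
rep_prob = 1 - nd.cdf(z_crit - z_obs)  # about 0.58
```

The same formula gives a replication probability of about 50% for a result that just reaches p = 0.05, matching the figure Kaul cites for the single-trial case.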
The FDA regulatory standard for drug approval is ‘substantial evidence’ of effectiveness based on ‘adequate and well-controlled investigations’, which translates into 2 trials, each with a p value of <0.05. At the heart of this standard (or any scientific endeavor) is replication. The replication probability for 1 trial with a p value <0.05 is only about 50%; the replication probability for 2 trials with p values <0.05 is about 90%. In 1997 the rules were changed to allow approval on the basis of a statistically persuasive result obtained in 1 trial, i.e., a p value <0.001 for a mortality or serious irreversible morbidity endpoint. A p value of 0.001 is equivalent to 2 trials each with a 1-sided p value of 0.025 (0.025 x 0.025 = 0.000625, or about 0.001). Thus, the current trial results do not comport with ‘substantial’ or ‘robust’ evidence.
Distribution of endpoints
It seems highly unusual that 80% of the primary events were
fatal. If true, it means that the subjects were dying either from non-MI-, non-stroke-related events such as sudden cardiac death or heart-failure death (as in patients with advanced heart failure), or from non-cardiovascular events not accounted for by the adjudication process.
Adjusted analyses
Although many have discussed how adjusting for baseline
covariates in the analysis of RCTs can improve the power of analyses of
treatment effect and account for any imbalances in baseline covariates, the
debate on whether this practice should be carried out remains unresolved. Many
recommend that the analysis should be undertaken only if the methods of
analysis and choice of covariates are pre-specified in the protocol or
statistical analysis plan. This is not easily discernible without registration
of clinical trials.