1. Introduction
Randomized clinical trials (RCTs) of treatments for pain have a long and distinguished
history. The earliest clinical trials not only identified analgesic medications and
their efficacious dosages but also contributed to the development of clinical trial
research designs and methods that came to be used throughout medicine. The ground-breaking
investigators who designed and conducted these early studies recognized that various
sources of bias must be addressed,
68,69,78,105
and appreciation of the fundamental roles of study design and statistical principles
became widespread as experience conducting RCTs grew.
In this article, we first present analyses of a sample of chronic pain trials that
show a decline in treatment effect estimates over the past few decades and discuss
the implications of these results for determining sample sizes for future chronic
pain trials. We then review explanations for the failure of RCTs to demonstrate the
efficacy of truly efficacious treatments and address the role of excessive placebo
group improvement. Finally, we consider various approaches that have the potential
to improve the informativeness of clinical trials and their assay sensitivity, that
is, their ability to distinguish an effective treatment from a less effective or ineffective
treatment.
2. “The greatest teacher, failure is”: falsely negative and inconclusive clinical trial results
It has been recognized for at least 2 decades that clinical trials of psychiatric
medications often fail to show a statistically significant difference between an active
medication and placebo.
29,53,63,74,82
Although some of these RCTs might have investigated treatments that truly lack efficacy,
many were for medications that had demonstrated efficacy in multiple previous RCTs
and had been approved by regulatory agencies around the world (eg, selective serotonin
reuptake inhibitors for depression). Similarly, many RCTs of treatments for chronic
pain have failed to demonstrate efficacy.
19,22,31
Some of these results also might reflect a true lack of efficacy—either in general
or for the specific dosage studied—but some RCTs have failed to show efficacy of medications
at dosages that had demonstrated efficacy in previous trials, had been approved by
multiple regulatory agencies, and are generally considered first-line treatments.
19,22,31
It is common to refer to clinical trial results that fail to show the efficacy of
truly efficacious treatments as “false negatives.” However, the failure of a clinical
trial to reject the null hypothesis of no difference between an active treatment and
placebo at a prespecified level of statistical significance does not necessarily indicate
that the active treatment lacks efficacy.
86
Such nonsignificant study results can be accompanied by confidence intervals that
are consistent with the possibility of a clinically meaningful treatment effect. When
there is such an outcome, the results of the trial should be considered “inconclusive”
rather than “negative.”
39
A failure to reject the null hypothesis can also be a result of chance alone: the type II
error probability is the probability of failing to reject the null hypothesis of no
difference between treatment groups when a difference truly exists.
Table 1 presents a list of potential explanations for the failure of clinical trials
of truly efficacious treatments to show their efficacy (see also Ref. 86). We focus
on the roles of statistical power, excessive improvement in placebo groups, and various
study methods and patient characteristics in contributing to falsely negative and
inconclusive clinical trial outcomes. An additional explanation for such clinical
trial results is the possibility that existing outcome measures have limited responsiveness
to detect treatment effects. Most chronic pain RCTs have used numerical or visual
analogue scales of pain intensity as primary outcome measures,
101
but other measures that could serve as primary outcomes—for example, ratings of pain
relief, global improvement, or disease-specific pain-related symptoms—might have greater
responsiveness.
18,44,97,98,102,109
Furthermore, chronic pain RCTs have typically not been designed to study patients
selected on the basis of genotypes or phenotypes targeted by “precision” or “personalized”
pain treatments. Although we believe that the development of improved clinical outcome
assessments and of mechanism-based treatments
16,25,100
may make important contributions to the identification of pain treatments with greater
efficacy or safety, further discussion of these issues is beyond the scope of this
article.
Table 1
Why can clinical trials of truly efficacious treatments fail to show their efficacy?
1. Chance
2. Placebo group patients improved “too much”
3. The optimal patients and phenotypes were not studied
4. Existing outcome measures have limited responsiveness to treatment effects
5. Temporal changes in characteristics of patients enrolling in trials
6. Temporal changes in types of clinical sites conducting trials
7. Research subject misbehavior
8. Research site unintentional bias and misconduct
9. Inadequate sample sizes
3. Treatment effects and sample size determination
Twenty years ago, Moore et al.
79
concluded on the basis of a series of simulations that “size is everything” if the
samples of patients enrolled in RCTs are to have adequate statistical power to provide
credible estimates of the efficacy of acute pain treatments. Recent meta-analyses of
assay sensitivity and placebo group changes in RCTs of chronic neuropathic pain have
found that treatment effects have decreased and placebo group changes have increased
over the past several decades, perhaps especially in the United States.
Tuttle et al.
110
concluded that from 1990 to 2013, placebo group changes increased while active treatment
group changes remained relatively stable; as a consequence, “treatment advantage”
vs placebo decreased substantially. Figure 1 presents the results of a second recent
meta-analysis, on the basis of which Finnerup et al.
32
concluded that, from 1982 to 2017, there was an increase in mean numbers-needed-to-treat
(NNTs) that was associated with increases in placebo group change, study duration,
and sample size (note that we refer to active and placebo group “changes” rather than
“responses” because the term “responses” fails to encompass regression to the mean,
spontaneous improvement, and other nonspecific sources of improvement or worsening
that are not actual responses to active or placebo treatments).
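Because much of the evidence discussed here is expressed as NNTs, it may help to recall that the NNT is simply the reciprocal of the difference in responder proportions between the active and placebo groups. The minimal Python sketch below assumes a binary responder definition (eg, at least 50% pain reduction); the function name and the responder rates are hypothetical and are not taken from the cited meta-analyses.

```python
import math

def number_needed_to_treat(p_active: float, p_placebo: float) -> float:
    """NNT = 1 / (absolute difference in responder proportions).

    p_active, p_placebo: proportions of responders (hypothetical definition,
    eg, >= 50% pain reduction) in the active and placebo groups.
    Returns math.inf when the active group does no better than placebo.
    """
    risk_difference = p_active - p_placebo
    if risk_difference <= 0:
        return math.inf
    return 1.0 / risk_difference

# Hypothetical example: 40% vs 20% responders.
print(number_needed_to_treat(0.40, 0.20))  # 5.0
# Same active rate but a larger placebo group change gives a higher NNT.
print(number_needed_to_treat(0.40, 0.30))
```

The second call illustrates the pattern described in the text: with active group change held fixed, growth in placebo group change alone is enough to raise the NNT.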
Figure 1.
Combined number-needed-to-treat (NNT) per year from a meta-analysis of randomized
clinical trials of pharmacologic treatments for chronic neuropathic pain.
32
Although the results of these meta-analyses are generally consistent with what has
been observed for RCTs in major depression
13,53,106,119
and other therapeutic areas,
5,52
only treatments for chronic neuropathic pain were examined, and few comparable analyses have
examined other chronic pain conditions.
24
Nevertheless, the results suggest that factors such as increasing placebo group change
and changes in study methods may be limiting or reducing estimates of the effects
of chronic pain treatments, which would necessitate larger sample sizes for adequate
statistical power to detect minimally clinically important effects.
When planning a clinical trial, appropriate sample size determination is necessary
to avoid exposing more patients than necessary to a potentially nonefficacious or
harmful treatment, while also including a sufficient number of participants to demonstrate
a true treatment effect, if one exists.
26,77
Tuttle et al.
110
presented differences between medications and placebo in the percentage decrease in
pain intensity from baseline, and Finnerup et al.
32
presented NNTs. Such data, however, are of limited value for determining sample sizes
for analyses of continuous pain outcomes, for example, analysis of covariance adjusting
for baseline pain, which is a common primary efficacy analysis used in confirmatory
RCTs of chronic pain treatments.
21
In addition to type I and type II error probabilities—typically prespecified as 5%
and 10% to 20%, respectively—sample size calculations for continuous variables require
specification of the magnitude of the treatment effect and the variability of the
outcome measure. A well-accepted approach to sample size determination for such a
primary efficacy analysis involves the standardized effect size (SES),
26
which for a parallel group RCT is the difference between the mean change from baseline
in the active group and that in the placebo group, divided by the pooled SD.
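As a concrete illustration of this definition, the SES can be computed from group-level summary statistics. This minimal Python sketch assumes each arm reports its mean change from baseline (with positive values meaning improvement), the SD of that change, and its sample size; the function name and the example numbers are hypothetical.

```python
import math

def standardized_effect_size(mean_active, sd_active, n_active,
                             mean_placebo, sd_placebo, n_placebo):
    """(Active mean change - placebo mean change) / pooled SD."""
    # Pooled variance weights each arm's variance by its degrees of freedom.
    pooled_var = ((n_active - 1) * sd_active ** 2 +
                  (n_placebo - 1) * sd_placebo ** 2) / (n_active + n_placebo - 2)
    return (mean_active - mean_placebo) / math.sqrt(pooled_var)

# Hypothetical trial: mean NRS reduction 2.4 (SD 2.0) vs 1.8 (SD 2.0), 150 per arm.
print(round(standardized_effect_size(2.4, 2.0, 150, 1.8, 2.0, 150), 2))  # 0.3
```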
3.1. Methods and results
We examined whether SESs of published neuropathic and non-neuropathic chronic pain
trials have decreased over the past several decades by performing a secondary analysis
of data from a recent meta-analysis of RCTs of efficacious medications conducted from
1980 to 2016 for low back pain, fibromyalgia, osteoarthritis pain, painful diabetic
peripheral neuropathy, and postherpetic neuralgia.
102
The purpose of the initial meta-analysis was to compare the responsiveness of ratings
of average pain intensity (API) and worst pain intensity (WPI), and in the current
analysis, we explored the trajectories of API and WPI SESs over time. Twenty-three
articles were identified for inclusion, with publication dates from 1999 to 2013.
SESs were extracted or calculated using other reported data, and positive values indicate
that the treatment reduced API or WPI more than placebo.
102
Mixed-effects meta-regression was used to test the significance of the relationship
between time and both API SES and WPI SES. Preliminary analysis suggested that the
relationships between time and both API SES and WPI SES were not linear. We therefore
fit quadratic models regressing API SES and WPI SES on time and the square of time,
where time is the number of years from 1999. Four articles included 2 active treatments
compared with the same placebo arm. A robust variance estimator was used to account
for correlations among the dependent effect size estimates in these 4 articles. All
analyses were conducted using R version 3.5.1 with the robust.se function for robust
variance estimation.
46,47
Table 2 presents the parameter estimates for time and the square of time for the API
SES and WPI SES models. Figure 2 shows that API SES and WPI SES both increased slightly
for a short time, but on average, the slopes decreased for every additional year after
1999. These results are consistent with the results of the meta-analyses of neuropathic
pain trials
32,110
and demonstrate that the average benefit of efficacious analgesic medications shown
in recent RCTs is modest. It is unknown whether the SESs for API and WPI will level
off at approximately 0.30 or whether there will be a continued downward slope that
will result in even lower SESs.
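To show how the quadratic terms translate into a trajectory, the fitted API curve can be evaluated directly from the point estimates in Table 2 (intercept 0.379, time 0.036, time squared −0.003). This Python sketch is illustrative only: the published coefficients are rounded to 3 decimals, so the values it produces are approximate and will not exactly reproduce Figure 2.

```python
def fitted_api_ses(t, intercept=0.379, b_time=0.036, b_time_sq=-0.003):
    """Quadratic meta-regression prediction for API SES; t = years since 1999."""
    return intercept + b_time * t + b_time_sq * t ** 2

# The slope, b_time + 2 * b_time_sq * t, falls by about 0.006 per year;
# the curve peaks where the slope is zero: t = -b_time / (2 * b_time_sq).
peak_t = -0.036 / (2 * -0.003)  # about 6 years after 1999, ie, 2005

for year in (1999, 2005, 2013):
    t = year - 1999
    print(year, round(fitted_api_ses(t), 3))
```

With these rounded estimates the predicted API SES for 2013 is about 0.295, in line with the value of approximately 0.30 discussed in the text.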
Table 2
Parameter estimates for the pain intensity models.
                Average pain intensity SES                Worst pain intensity SES
                Estimate (95% CI)             P           Estimate (95% CI)             P
Intercept       0.379 (0.308 to 0.451)        <0.0001     0.401 (0.344 to 0.457)        <0.0001
Time            0.036 (−0.002 to 0.074)       0.07        0.027 (−0.007 to 0.060)       0.13
Time²           −0.003 (−0.007 to −0.0003)    0.04        −0.003 (−0.006 to −0.000)     0.06
Time is the number of years since 1999.
CI, confidence interval; SES, standardized effect size.
Figure 2.
Standardized effect sizes for average and worst pain intensity in randomized clinical
trials of chronic pain treatments from 1999 to 2013.
3.2. Implications
The results of our analysis do not address the causes of the decline in the SESs found
in RCTs of efficacious medications for chronic pain. It is possible to speculate that
this decline is due to efforts by the scientific community and government regulators
to increase the rigor of clinical trial design, execution, and analysis through methods
such as comprehensive prespecification of study methodology and analysis, limiting
multiple hypothesis testing unless proper statistical adjustments are used, and principled
methods to accommodate missing data.
41,42,50,51,99,103
Declines in SESs may also result from greater availability of pain treatments over
time, which could reduce the pool of eligible patients and increase the percentage
of study participants who have refractory pain.
22
Given the evidence that expectations are a major source of placebo effects, it is
also possible that placebo group changes increase as evidence for a treatment's efficacy
accumulates and becomes publicly available.
2
One important limitation of the present analyses is that they are based on published
trials of 5 chronic pain conditions that reported both API and WPI. Although our results
provide some information about the temporal trajectories of SESs from chronic pain
trials, analyses that examine SESs for different chronic pain conditions or that include
a larger sample of RCTs might produce different results; indeed, because clinical
trials with nonsignificant results are less likely to be published, meta-analyses
that include unpublished studies might show even greater declines. In addition, because
the clinical trials we examined were limited to studies of efficacious medications
for chronic pain, analyses of clinical trials of devices (eg, spinal cord stimulators)
or of other nonpharmacologic treatments (eg, cognitive-behavioral therapy and physical
therapy) might also produce different results. For example, it has been observed that
treatment effect estimates from RCTs of psychosocial treatments for depression are
generally greater than those from trials of antidepressant medications; this observation
may be explained by attenuation of the antidepressant treatment effect in trials in
which a medication is compared with placebo and both groups are receiving intensive
clinical management, which can be “substantially more therapeutic for patients with
depression than doing nothing.”
90
The mean SES of approximately 0.30 for the most recent published chronic pain trials
mirrors the mean SESs reported in meta-analyses of efficacious antidepressants for
major depression.
43,61,62
Antidepressant trials share with analgesic RCTs several methodologic characteristics
that might contribute to decreased assay sensitivity, including subjective outcomes,
considerable placebo group improvements, and appreciable missing data.
41,61,110
Given the consistent meta-analysis results, it is crucial that analgesic and antidepressant
RCTs be designed with realistic treatment effect estimates. To detect an SES of 0.30
with 80% power (α = 0.05, 2-tailed) in a parallel group trial, at least 175 patients
per group would need to be randomized. An SES of 0.30 can be considered a modest treatment
effect, and its clinical importance will depend on the risks and benefits of the treatment
and its clinical context.
15,20
Such SESs reflect not only the specific effects of the treatments (eg, the pharmacologic
activity of a medication) but also any methodologic characteristics of the clinical
trials that decrease their assay sensitivity.
19,22
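The figure of 175 patients per group follows from the standard normal-approximation formula for comparing 2 means, n per group = 2(z₁₋α/₂ + z₁₋β)² / SES². The minimal Python sketch below uses only the standard library. It ignores dropout and the small-sample t-distribution correction, both of which would increase the number actually randomized; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(ses, alpha=0.05, power=0.80):
    """Patients per arm for a 2-tailed, 1:1 parallel group comparison of means."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / ses ** 2)

print(n_per_group(0.30))              # 175
print(n_per_group(0.30, power=0.90))  # 234
print(n_per_group(0.50))              # 63
```

Because the required sample size scales with 1/SES², a decline from an assumed SES of 0.50 to a realistic 0.30 nearly triples the number of patients needed.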
In designing chronic pain RCTs, an SES of 0.30 can serve as a benchmark that could
be considered when performing sample size determinations. This approach addresses
both the modest apparent efficacy of existing treatments and any limitations of the
clinical trial methods that have been used to study them. It is important to acknowledge,
however, that it is usually recommended that sample size determination be based on
specifying an effect size that would be of minimal clinical importance to patients,
clinicians, and other stakeholders. Given the often poor tolerability and risks of
many existing treatments, doing so might be challenging because even a minimal treatment
effect could be considered meaningful for a novel treatment that is well tolerated
and safe.
15,20
4. Three eras of analgesic clinical trials
The observation that clinical trials of medications with well-established efficacy
are sometimes unable to demonstrate that efficacy provided the impetus for ongoing
efforts to explain such results by examining associations between the research methods
and patient characteristics of RCTs and their assay sensitivity. As can be seen from
Figure 1, 3 eras of analgesic clinical trials can be identified from the NNTs associated
with pharmacologic treatments for neuropathic pain.
32
The first era—from the early 1980s through the early 1990s—has the lowest NNTs (ie,
greatest treatment vs placebo differences) and consists primarily of relatively small
cross-over trials conducted by investigators such as Mitchell Max, Michael Rowbotham
and Howard Fields, Søren Sindrup, and Peter Watson. These studies were typically conducted
at a single clinical site with patients who were either personally known by the researchers
or carefully assessed by clinician investigators with substantial expertise. The second
era—from the mid-1990s to the mid-2000s—reflects the involvement of pharmaceutical
companies in developing drugs for chronic pain. The early clinical trials of gabapentin,
duloxetine, and pregabalin were conducted at multiple sites but often included investigators
at academic medical centers with experience treating or researching the specific pain
condition being studied. The third era—from the late 2000s to the present—has the
highest NNTs and includes multinational RCTs with large sample sizes using primarily
for-profit clinical research centers that conduct clinical trials across a wide range
of therapeutic areas.
The decrease in treatment effects reflected in these increasing NNTs could be a result
of changes over time in research methods, study sites, and/or the patients enrolled
in the trials.
32
Meta-analyses of RCTs of chronic neuropathic
23,32
and musculoskeletal pain
24
have found that greater trial assay sensitivity was associated with shorter trial
durations and smaller sample sizes. It is possible, however, that smaller trials
that are negative or inconclusive are less likely to be published, and such publication
bias might contribute to the results of these meta-analyses. Nevertheless, on the
basis of data such as these, it has been suggested that larger and longer trials are
not necessarily better at demonstrating whether a treatment is truly efficacious.
72,88
The decreased treatment effects observed over the past several decades could also
reflect the pharmaceutical industry's conducting an increasing number of appropriately
powered RCTs with the longer durations that regulators require for examining the
durability of treatment effects.
In addition, analyses of RCTs of depression
72
and Parkinson disease
45
have suggested that effect sizes might be smaller for patients who are enrolled later
in the trial than for those enrolled earlier, perhaps due to the enrollment of patients
who do not fulfill eligibility criteria because of pressure on sites to complete enrollment
requirements. Also, in longer trials (for example, durations of 12 weeks or more
rather than 5 to 8 weeks), placebo group improvement may increase more than active
group improvement because of, as discussed in the next section, a greater number of
study visits
90
and an increased opportunity for patients to develop supportive relationships with
study staff.
87,91
It is also possible that over the course of these 3 eras of analgesic trials, the
quality of RCT procedures and data, including patient clinical evaluations and outcome
assessments, became more variable as greater numbers of study sites participated.
74
In addition, there has been increasing recognition of the potential roles of unintentional
and intentional investigator bias
64,67,81
and frank research misconduct
27
in contributing to negative, inconclusive, and invalid study results. It has also
become apparent that surprisingly large percentages of the participants enrolled in
clinical trials are either professional subjects who are fabricating a clinical condition—and
may be participating in more than one clinical trial at different sites, so-called
“duplicate patients”—or are patients who intentionally falsify key eligibility criteria
to be randomized.
10,11,76,96
Information provided on social media
71
and clinical trial websites can facilitate enrollment of such unqualified participants,
and methods to identify professional subjects and mitigate patient misbehavior are
now being developed, including the creation of research subject registries.
76,96
5. Placebo group changes and their interpretation
Meta-analyses of RCTs have found meaningful relationships between placebo
group changes and study methods and patient characteristics. Paralleling the results
discussed above for treatment effects, greater placebo group changes in neuropathic
pain trials were associated with longer trial durations and larger sample sizes.
19,32,110
In a larger number of meta-analyses of major depression trials, greater placebo group
changes were associated with larger numbers of study sites, larger samples, greater
frequency of study visits, longer trials, lower probability of receiving placebo,
and higher patient expectations for improvement.
29,35,84,87,92,111,118
A robust finding that has emerged from multiple analyses of both pain and psychiatric
treatments is an association between greater magnitudes of placebo group change and
negative or inconclusive clinical trial outcomes, as evaluated, for example, by statistical
significance, risk ratios, and NNTs.
32,52,53,59,110
In considering such relationships, it is important to recognize that random variation
in the magnitude of placebo group change across a set of RCTs will by itself produce
an association between placebo group changes and treatment effect estimates, because
each treatment effect estimate is the difference between an active treatment and that
same placebo group. As Senn
93
observed many years ago, a “negative correlation between odds ratios and placebo rates
in clinical trials does not of itself indicate the presence of a phenomenon of interest.
Such an effect is to be expected on statistical grounds alone and there is thus no
need to search for medical explanations.”
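Senn's point can be checked with a short simulation. In this Python sketch, every hypothetical trial has the same true placebo change and the same true treatment effect, so there is no real relationship between the two; only sampling error in each arm's observed mean varies from trial to trial. Because the same noisy placebo mean is subtracted from the active mean, the observed placebo change and the estimated treatment effect are nonetheless negatively correlated. All parameter values and function names are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

def simulate_trials(n_trials=2000, n_per_arm=100, true_placebo=1.5,
                    true_effect=0.6, sd=2.0, seed=1):
    """Observed placebo changes and effect estimates from identical trials."""
    rng = random.Random(seed)
    se = sd / n_per_arm ** 0.5  # standard error of each arm's mean change
    placebo_changes, effects = [], []
    for _ in range(n_trials):
        placebo_mean = rng.gauss(true_placebo, se)
        active_mean = rng.gauss(true_placebo + true_effect, se)
        placebo_changes.append(placebo_mean)
        effects.append(active_mean - placebo_mean)  # treatment effect estimate
    return placebo_changes, effects

def pearson_r(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

placebo_changes, effects = simulate_trials()
# With equal standard errors in both arms, the expected correlation is
# -1/sqrt(2), about -0.71, despite no true placebo-effect relationship.
print(round(pearson_r(placebo_changes, effects), 2))
```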
Despite the statistical basis of associations between placebo group changes and treatment
effect estimates, these associations can also reflect characteristics of the clinical
trials that potentially reduce assay sensitivity. For example, it is uncommon for
mean pain intensity to fall below 3 or 4 on a 0 to 10 numerical rating
scale. Such a “floor” of symptom reduction may represent an unresponsive core of refractory
pain that if reached by patients in the placebo group would make it difficult to show
any further pain reduction from an efficacious treatment. If this floor effect occurs,
it could account, at least in part, for the associations between greater magnitudes
of placebo group change and decreased treatment effects that have been reported. Assuming
that there is such a floor effect, the separation between an efficacious treatment
and placebo in an RCT might be greater if nonspecific sources of improvement in both
treatment groups—such as placebo effects and regression to the mean—could be reduced,
which could make it less likely that the placebo group would reach the floor.
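The floor argument can be made concrete with a deliberately simplified, deterministic toy model. The Python sketch below assumes, hypothetically, a mean baseline NRS score of 6.5, a floor of 3.5 below which mean pain does not fall, and a specific drug effect of 1.0 point that adds to whatever nonspecific improvement occurs in both groups. These numbers are illustrative assumptions, not estimates from any trial.

```python
def observed_difference(placebo_change, specific_effect=1.0,
                        baseline=6.5, floor=3.5):
    """Drug-placebo difference in mean end-of-trial pain (NRS points).

    Both groups share the same nonspecific improvement (placebo_change);
    the active group also receives specific_effect. Mean pain is assumed
    (hypothetically) not to fall below `floor`.
    """
    placebo_end = max(baseline - placebo_change, floor)
    active_end = max(baseline - placebo_change - specific_effect, floor)
    return placebo_end - active_end

for change in (1.0, 2.0, 2.5, 3.0):
    print(change, observed_difference(change))
```

Under these assumptions, the observed difference stays at 1.0 until placebo group change reaches 2.0 points, then shrinks to 0.5 and finally to 0.0, even though the specific effect never changes.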
Another explanation for associations between placebo group changes and treatment effect
estimates involves the presumption of additivity in placebo-controlled clinical trials.
It is generally assumed that the specific effects of an active treatment provide an
additive benefit to the nonspecific effects associated with treatment in the placebo
group, which include placebo effects and regression to the mean. As noted by Kaptchuk,
58
this premise takes “for granted that the active drug response results partly from
a placebo effect and that the placebo effect buried in the active arm is identical
to the placebo effect of the dummy treatment.” But it is possible that response to
the active treatment supplants at least part of the placebo group response, in which
case the specific effects of the active treatment and the non-specific effects of
trial participation, including placebo treatment, would be subadditive, that is, some
of the nonspecific effects that occur in the placebo group would not occur in the
active treatment group.
6,66,73
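The contrast between the additive and subadditive assumptions can be written as a one-line model. In this hypothetical Python sketch, the placebo group change equals the nonspecific effect N, and the active group change equals the specific effect S plus only a fraction (1 − s) of N, where s is an assumed "supplanted" fraction (s = 0 is the conventional additive model). The apparent treatment effect is then S − sN, so under subadditivity anything that increases nonspecific improvement reduces the apparent benefit.

```python
def apparent_effect(specific, nonspecific, supplanted=0.0):
    """Active-minus-placebo change under a simple (sub)additivity model.

    supplanted = 0: additive, active change = specific + nonspecific.
    supplanted = s in (0, 1]: only (1 - s) of the nonspecific change also
    occurs in the active group (hypothetical parameterization).
    """
    active_change = specific + (1.0 - supplanted) * nonspecific
    placebo_change = nonspecific
    return active_change - placebo_change

# Hypothetical values in NRS points, specific effect fixed at 1.0.
print(apparent_effect(1.0, 2.0))                   # additive: 1.0
print(apparent_effect(1.0, 2.0, supplanted=0.25))  # 0.5
print(apparent_effect(1.0, 3.0, supplanted=0.25))  # 0.25
```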
An example of such subadditivity is provided by Roose et al.,
90
who noted that therapeutic contact with study staff—who have been reported to vary
greatly in what they consider appropriate interactions with study participants
14
—may be “a potent contributor to symptomatic improvement in patients with depression,
particularly patients in the placebo arm” of antidepressant RCTs. In several trials,
the number of study visits was more strongly associated with improvement in the placebo
groups than in the antidepressant groups. It was concluded that “increasing the number
of study visits significantly increases placebo response while leaving medication
response generally unaffected”; for example, in 12-week trials, having 6 rather than
10 study visits was associated with a drug-placebo difference in response rates of
12.2% with 6 visits vs 0.4% with 10 visits.
90
Such differential effects on active and placebo group changes, if indeed causal, could
reduce the apparent benefit of an efficacious treatment when compared with placebo.
Although this subadditivity would decrease the assay sensitivity of any trials in
which it occurs, it does provide a basis for hypothesizing that assay sensitivity
can be increased if study procedures such as excluding certain patients
28
or training study participants
98,108
have differential effects on active and placebo group changes.
6. “Always in motion is the future”: emerging evidence-based approaches to the design of pain clinical trials
Size does matter when determining the number of participants needed for an RCT to
provide adequate statistical power to identify minimally clinically important effects
26
and to estimate their magnitude.
79
Nevertheless, it is important to recognize that various strategies for increasing
the assay sensitivity of RCTs and decreasing the probability of inconclusive results
should also be considered.
22
6.1. General methodologic considerations
As recently emphasized in the International Council for Harmonisation E9 (R1) addendum
on estimands in clinical trials, the preeminent consideration in designing clinical
trials is to identify the scientific question of interest and the estimand, “a precise
description of the treatment effect reflecting the clinical question posed by the
trial objective.”
49
The choice of estimand determines the clinical trial design and the statistical analysis
plan, including methods for accommodating intercurrent events and missing data and
the selection and interpretation of sensitivity analyses.
4,49,85
Discussion of the complex conceptual and statistical issues involved in determining
estimands and prespecifying principled statistical analyses for their estimation is
beyond the scope of this article; however, we believe it is important to emphasize
that biostatisticians with expertise in clinical trials should be involved from the
earliest consideration of conducting a clinical trial and continuing through its design,
execution, analysis, interpretation, and reporting.
The evidence that knowledge of clinical trial eligibility criteria can lead to intentional
and unintentional biases among study staff and potential participants has provided
a basis for recommending that key aspects of the protocol that do not involve safety
should be concealed from all study staff and patients.
21,48,89
Blinding staff and patients to eligibility criteria could reduce the numbers of patients
who are randomized but who do not actually fulfill these criteria because of inflated
or falsified baseline assessments; use of electronic diaries and case report forms
has made implementation of such blinding relatively straightforward. In addition,
blinding study staff and patients to allocation ratios when patients are more likely
to be randomized to active vs placebo treatment (eg, dose finding and active comparator
trials) could also prevent the increases in placebo group improvements that have been
found in trials in which patients know that their chance of receiving placebo is less
than their chance of receiving an active treatment and, presumably as a result, have
greater expectations for improvement.
83
Blinding patients and staff to the allocation ratio requires considerable attention
to the language used in consent forms and patient materials and also involves explaining
to ethics committees the anticipated benefits on assay sensitivity that might result.
An important feature of clinical trials that is receiving increased attention as a
source of poor data quality and of failures to demonstrate the efficacy of truly efficacious
treatments is poor treatment adherence. Poor medication adherence can decrease estimates
of efficacy and confound assessments of safety,
3,9
but it has typically been assessed using pill counts, which are known to be inaccurate.
Although there are now a variety of more sophisticated methods for assessing medication
adherence that have greater validity,
3,96
they have rarely been used in chronic pain RCTs.
Minimizing placebo group changes also has the potential to enhance assay sensitivity.
For example, in neuropathic pain RCTs, the time to onset of pain reduction in placebo
groups has been shown to be longer than that associated with analgesic medications.
110
This is consistent with the observation that longer trials tend to have a progressive
increase in placebo group changes
19,88
and suggests that shorter treatment durations may be preferable for proof-of-concept
trials; of course, RCTs with longer durations would still be necessary to evaluate
the durability of any benefits. In addition, when recruiting potential participants
for a clinical trial evaluating a new treatment, placebo effects should be minimized
by neutrally describing the treatment rather than enhancing participant expectations
about its efficacy.
92,115,120
Placebo group changes might also be reduced by limiting the number of study visits
and standardizing interactions between study staff and participants.
14
Importantly, whether such techniques reduce retention and thereby increase the amount
of missing data should also be considered. Developing methods to mitigate unrealistic
patient expectations is consistent with the obligation to ensure that patients understand
the difference between participating in a clinical trial and receiving clinical care;
any such standardized protocols intended to diminish placebo group improvement would
ideally be evaluated in RCTs designed to examine their effectiveness and any unintended
negative consequences.
6.2. Patient characteristics
Various inclusion and exclusion criteria seem to be associated with increased assay
sensitivity; for example, greater baseline pain intensity and prohibition of concomitant
analgesic medications were found to be associated with greater assay sensitivity in
clinical trials of chronic neuropathic
23,32
and musculoskeletal pain.
24
In addition, analyses of individual patient data showed that the subgroup of patients
with excessive variability of pain ratings at baseline had reduced separation between
the active treatment and placebo.
28,107
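The cited analyses define "excessive variability" in their own ways. The following Python sketch shows only one way such a screen could be operationalized. It assumes daily 0 to 10 NRS diary ratings collected during a baseline period and uses a purely hypothetical SD threshold of 2.0 points, which is not a validated cutoff.

```python
import statistics

def flag_high_variability(daily_ratings, sd_threshold=2.0):
    """True if the SD of baseline diary ratings exceeds the threshold.

    daily_ratings: 0-10 NRS scores from the baseline period.
    sd_threshold: hypothetical cutoff, for illustration only.
    """
    if len(daily_ratings) < 2:
        raise ValueError("need at least 2 ratings to estimate variability")
    return statistics.stdev(daily_ratings) > sd_threshold

stable = [6, 6, 7, 6, 6, 7, 6]   # hypothetical patient with consistent ratings
erratic = [2, 9, 3, 8, 1, 9, 4]  # hypothetical patient with erratic ratings
print(flag_high_variability(stable))   # False
print(flag_high_variability(erratic))  # True
```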
One approach to preventing the randomization of patients who do not fulfill eligibility
criteria is to implement a central adjudication process, in which trial eligibility
criteria are reviewed for each potential study patient.
33,75
This approach has the potential to increase the response to efficacious treatments
by eliminating individuals who are unlikely to respond because they do not have the
condition for which the treatment is indicated. Independent adjudication of eligibility
criteria may also decrease placebo group changes by eliminating professional subjects
and others who might be more likely to report improvement.
33,76,96
6.3. Research designs
There are several clinical trial designs that have the potential to increase assay
sensitivity and the efficiency of identifying efficacious pain treatments (Table 3).
One relatively straightforward approach is to conduct an interim blinded sample size
re-estimation to ensure that the variability of the primary outcome measure had not been
underestimated in the initial sample size determination.
26
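The following sketch shows one way a blinded re-estimation could work for a continuous outcome, using the usual normal-approximation sample size formula for a two-arm trial. It assumes 1:1 allocation and a fixed design-stage treatment difference (delta), and it subtracts delta²/4 from the blinded one-sample variance to offset the inflation caused by pooling the two arms. The function names and the rule of never decreasing the planned sample size are illustrative choices, not a prescribed method.

```python
from math import ceil, sqrt
from statistics import NormalDist, variance


def n_per_group(sd, delta, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm for a two-arm parallel trial."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd ** 2 / delta ** 2)


def blinded_reestimate(pooled_outcomes, delta, planned_n, alpha=0.05, power=0.90):
    """Re-estimate the per-arm sample size from blinded interim data.

    pooled_outcomes: interim outcomes from all patients, treatment labels unknown.
    delta: the treatment difference assumed at the design stage.
    """
    # The one-sample variance of blinded data is inflated by about delta**2 / 4
    # under 1:1 allocation, so that amount is subtracted (floored at a tiny value).
    s2 = max(variance(pooled_outcomes) - delta ** 2 / 4, 1e-12)
    # Only increases are allowed here, a common but not universal convention.
    return max(planned_n, n_per_group(sqrt(s2), delta, alpha, power))


print(n_per_group(sd=2.0, delta=1.0))  # 85 patients per arm at the design stage
```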
Interim futility analyses can also increase the efficiency of identifying efficacious
treatments by determining whether a treatment is very unlikely to be statistically
significantly different from the control treatment at the scheduled end of the trial.
56,104
Although use of such interim analyses in chronic pain RCTs has rarely been reported,
they are routinely implemented in other therapeutic areas, and it has been recommended
that they be considered in the design of clinical trials of pain treatments.
21,22,37
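A common way to operationalize an interim futility analysis is through conditional power, the probability of a statistically significant final result given the interim data. The sketch below computes conditional power under the "current trend" assumption using the standard Brownian-motion approximation. The 10% stopping threshold is a hypothetical value, and conditional power could instead be computed under the effect assumed at the design stage.

```python
from math import sqrt
from statistics import NormalDist


def conditional_power(z_interim, info_frac, alpha=0.05):
    """Conditional power under the current trend (Brownian-motion approximation).

    z_interim: interim z-statistic, positive values favoring the active treatment.
    info_frac: fraction of the planned statistical information observed so far.
    """
    if not 0 < info_frac < 1:
        raise ValueError("info_frac must lie strictly between 0 and 1")
    crit = NormalDist().inv_cdf(1 - alpha / 2)      # final two-sided critical value
    b_t = z_interim * sqrt(info_frac)               # B(t) = Z(t) * sqrt(t)
    drift = b_t / info_frac                         # current-trend drift estimate
    expected_final = b_t + drift * (1 - info_frac)  # E[B(1) | B(t), drift]
    return 1 - NormalDist().cdf((crit - expected_final) / sqrt(1 - info_frac))


def stop_for_futility(z_interim, info_frac, threshold=0.10, alpha=0.05):
    """Hypothetical rule: stop if conditional power falls below `threshold`."""
    return conditional_power(z_interim, info_frac, alpha) < threshold


print(stop_for_futility(0.0, 0.5))  # True: no trend at the halfway point
print(stop_for_futility(2.5, 0.5))  # False: conditional power is high
```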
Table 3
Clinical trial designs that can improve the efficiency and informativeness of clinical
trials of pain treatments.
1. Interim blinded sample size re-estimation
2. Interim futility analyses
3. Cross-over and multiple N-of-1 designs
4. Designs that might have greater assay sensitivity (eg, EERW, SPCD, and TED)
5. Adaptive designs
6. Master protocols, including basket, umbrella, and platform designs
EERW, enriched enrollment randomized withdrawal; SPCD, sequential parallel comparison
design; TED, two-way enriched design.
Cross-over designs can be used to reduce sample size requirements when studying pain
conditions that are expected to remain stable throughout the trial duration and treatments
that have relatively fast onset and offset of their pharmacodynamic effects.
26,40,94
However, cross-over trials also have several potential limitations, including carry-over
effects, in which the effect of an active treatment in the first period may carry
over to a placebo condition in the next period and reduce the second-period treatment-placebo
difference. Various methods for addressing these effects have been proposed, but the
best approach is to design the trial to minimize potential carry-over effects and
any other causes of treatment-by-period interaction.
26,94
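The sample size saving can be seen in a simple normal-approximation calculation. Assuming no carry-over and equal outcome variance in both periods, each patient's within-patient difference in an AB/BA cross-over has variance 2σ²(1 - ρ), where ρ is the correlation between a patient's outcomes in the two periods, so the total number of patients needed is roughly (1 - ρ)/2 times that of a parallel-group trial. The function names and inputs below are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def _z_sum(alpha, power):
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def total_n_parallel(sd, delta, alpha=0.05, power=0.90):
    """Total patients for a two-arm parallel-group trial (1:1 allocation)."""
    per_arm = ceil(2 * _z_sum(alpha, power) ** 2 * sd ** 2 / delta ** 2)
    return 2 * per_arm


def total_n_crossover(sd, delta, rho, alpha=0.05, power=0.90):
    """Total patients for a 2x2 (AB/BA) cross-over trial.

    rho: within-patient correlation between the two periods' outcomes.
    Assumes no carry-over; each within-patient difference has variance
    2 * sd**2 * (1 - rho).
    """
    n = ceil(2 * _z_sum(alpha, power) ** 2 * sd ** 2 * (1 - rho) / delta ** 2)
    return n + n % 2  # round up to an even total so the AB and BA sequences are equal


print(total_n_parallel(2.0, 1.0), total_n_crossover(2.0, 1.0, rho=0.5))  # 170 44
```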
When a cross-over trial randomizes patients to at least 2 periods with an active treatment
and 2 periods with placebo—also referred to as an N-of-1 design when used in clinical
practice
65
—it becomes possible to examine whether there is evidence of treatment-by-patient
interaction.
17,38
Significant treatment-by-patient interaction indicates that there is heterogeneity
of treatment effects among patients, that is, different patients truly respond differently
to the treatment. Multiperiod cross-over trials, therefore, have the potential to
identify those pain conditions and treatments for which efforts to determine genotypic
and phenotypic predictors of treatment response could be worthwhile.
95
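One way to test for treatment-by-patient interaction, assuming each patient completes the same number of paired cycles (one active and one placebo period per cycle) with no carry-over or period effects, is a one-way analysis of variance of the within-cycle active minus placebo differences: if every patient had the same true effect, between-patient variation in mean differences would be no larger than within-patient variation predicts. This is a sketch of that approach with hypothetical names; mixed-effects models with a random treatment-by-patient term are a common alternative.

```python
from statistics import mean


def treatment_by_patient_f(pair_diffs):
    """One-way ANOVA F test for treatment-by-patient interaction.

    pair_diffs: dict mapping each patient ID to a list of (active - placebo)
                differences, one per treatment pair; every patient must have
                the same number (>= 2) of pairs.
    Returns (F, df_between, df_within).
    """
    groups = list(pair_diffs.values())
    m, k = len(groups), len(groups[0])
    if m < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 patients, each with the same number (>= 2) of pairs")
    grand = mean(d for g in groups for d in g)
    # Between-patient variation in each patient's mean treatment effect
    ss_between = k * sum((mean(g) - grand) ** 2 for g in groups)
    # Within-patient variation of the pair differences around each patient's mean
    ss_within = sum((d - mean(g)) ** 2 for g in groups for d in g)
    if ss_within == 0:
        raise ValueError("within-patient variance is zero; F is undefined")
    df1, df2 = m - 1, m * (k - 1)
    return (ss_between / df1) / (ss_within / df2), df1, df2


f_stat, df1, df2 = treatment_by_patient_f(
    {"A": [3.0, 2.5], "B": [0.5, 0.0], "C": [1.5, 2.0]}
)
print(round(f_stat, 2), df1, df2)  # 25.33 2 3
```

A p-value can then be obtained from the F distribution with (df1, df2) degrees of freedom, for example with scipy.stats.f.sf(f_stat, df1, df2) if SciPy is available.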
Enrichment designs may increase clinical trial assay sensitivity by randomizing those
patients who are expected to be more likely to respond to treatment and not withdraw
because of adverse events.
114
The most common type of enrichment design used in studying chronic pain treatments
has been termed “enriched enrollment randomized withdrawal.”
60,80
In this design, an initial enrichment phase in which patients receive the active treatment
is followed by a double-blind phase in which patients who have tolerated the treatment
and reported an improvement in pain intensity are randomized to continued active treatment
or to placebo. The results of published trials suggest that the assay sensitivity
of these trials may be greater than the assay sensitivity of standard parallel group
trials, but the evidence is not conclusive.
34,60,80
The sequential parallel-comparison design (SPCD) was developed to reduce placebo group
improvements and thereby increase assay sensitivity in RCTs of antidepressant medications.
29,30
In the most common version, patients are first randomized to active treatment and
placebo groups, typically with more participants allocated to placebo. Patients in
the placebo group who do not improve in this phase are then rerandomized to either
the active treatment or placebo. The efficacy analysis typically includes all first-phase
data and second-phase data only from the placebo group patients who did not improve
in the first phase. Because some patients contribute outcome data from both phases
and there is typically a reduced magnitude of change in the placebo group in the second
phase, SPCD trials can reduce required sample sizes.
12,29,54
The potential of this design for increasing the assay sensitivity of RCTs of chronic
pain treatments has been discussed.
37
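To make the SPCD analysis concrete, the sketch below computes the weighted combination of the phase 1 and phase 2 active-placebo differences that is commonly used to summarize SPCD results. The weight w would be prespecified (0.5 is only a placeholder), outcomes are assumed to be improvement scores, and only the point estimate is shown: because the phase 2 patients also contributed phase 1 placebo data, a valid standard error must account for the correlation between the two phase estimates, which this sketch does not attempt.

```python
from statistics import mean


def spcd_weighted_effect(p1_active, p1_placebo, p2_active, p2_placebo, w=0.5):
    """Weighted SPCD treatment effect estimate (point estimate only).

    p1_*: phase 1 outcomes (improvement, higher = better) for all randomized patients.
    p2_*: phase 2 outcomes for phase 1 placebo nonresponders rerandomized in phase 2.
    w:    prespecified weight on the phase 1 effect (0.5 is only a placeholder).
    """
    if not 0 <= w <= 1:
        raise ValueError("w must lie in [0, 1]")
    d1 = mean(p1_active) - mean(p1_placebo)  # phase 1 active-placebo difference
    d2 = mean(p2_active) - mean(p2_placebo)  # phase 2 difference among nonresponders
    return w * d1 + (1 - w) * d2


print(round(spcd_weighted_effect([3, 4], [1, 2], [2, 2], [1, 1], w=0.6), 2))  # 1.6
```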
A two-way enriched design that is an extension of SPCD has also been described.
55
In this design, after randomization to either active or placebo treatment, patients
in the active treatment group who improved and patients in the placebo group who did
not improve are rerandomized to active or placebo treatment. The data from this second
phase make it possible to test whether a treatment “that is significantly superior
to placebo in achieving short-term efficacy will also be superior to placebo in the
maintenance of efficacy.”
55
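The rerandomization step of the two-way enriched design described above can be sketched as follows. The responder cutoff, the data layout, and the simple 1:1 split within each cohort are illustrative assumptions; an actual trial would use its own prespecified randomization scheme.

```python
import random


def ted_rerandomize(phase1, cutoff, seed=None):
    """Build the two phase 2 cohorts of a two-way enriched design and randomize each 1:1.

    phase1: dict mapping patient ID -> (phase 1 arm, improvement score).
    cutoff: hypothetical improvement threshold that defines a responder.
    Returns dict mapping patient ID -> (cohort, phase 2 arm).
    """
    rng = random.Random(seed)
    cohorts = {
        # Tests maintenance of efficacy among patients who improved on active treatment
        "active_responders": sorted(
            pid for pid, (arm, imp) in phase1.items() if arm == "active" and imp >= cutoff
        ),
        # As in SPCD, placebo nonresponders are rerandomized to active or placebo
        "placebo_nonresponders": sorted(
            pid for pid, (arm, imp) in phase1.items() if arm == "placebo" and imp < cutoff
        ),
    }
    assignment = {}
    for cohort, ids in cohorts.items():
        rng.shuffle(ids)
        half = len(ids) // 2
        for i, pid in enumerate(ids):
            assignment[pid] = (cohort, "active" if i < half else "placebo")
    return assignment


phase1 = {
    "A1": ("active", 3.0), "A2": ("active", 0.5), "A3": ("active", 2.5),
    "P1": ("placebo", 0.0), "P2": ("placebo", 2.0), "P3": ("placebo", 0.5),
}
print(ted_rerandomize(phase1, cutoff=2.0, seed=3))
```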
Adaptive clinical trial designs can be used for exploratory studies as well as for
confirmatory trials, and their objectives have included (1) dose finding; (2) bridging
phases 1 and 2 or phases 2 and 3 with seamless designs (eg, using dose-finding data
to transition to a confirmatory trial); (3) response adaptive randomization to increase
the percentage of patients randomized to treatments with promising interim data; and
(4) interim sample-size re-estimation and futility analyses, as discussed above.
7,26,112
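For objective (3), one widely discussed approach to response adaptive randomization sets allocation probabilities according to the posterior probability that each arm is best (a Thompson-sampling-style rule). The sketch below assumes a binary responder outcome (for example, a prespecified percentage reduction in pain) with independent Beta(1, 1) priors and estimates those probabilities by Monte Carlo; the function name, arm labels, and counts are hypothetical.

```python
import random


def rar_allocation_probs(successes, failures, n_draws=10_000, seed=None):
    """Thompson-style allocation probabilities for binary responder outcomes.

    successes / failures: dicts mapping arm name -> interim responder / nonresponder
    counts. Each arm gets an independent Beta(1, 1) prior, so its posterior is
    Beta(1 + successes, 1 + failures). Returns the Monte Carlo estimate of the
    posterior probability that each arm has the highest responder rate.
    """
    rng = random.Random(seed)
    wins = dict.fromkeys(successes, 0)
    for _ in range(n_draws):
        draws = {arm: rng.betavariate(1 + successes[arm], 1 + failures[arm])
                 for arm in successes}
        wins[max(draws, key=draws.get)] += 1
    return {arm: count / n_draws for arm, count in wins.items()}


probs = rar_allocation_probs({"low": 5, "mid": 10, "high": 20},
                             {"low": 25, "mid": 20, "high": 10}, seed=1)
# Draw the next patient's arm using the adaptive probabilities
next_arm = random.Random(2).choices(list(probs), weights=list(probs.values()))[0]
print(probs, next_arm)
```

In practice, such rules are often modified, for instance by fixing the placebo allocation, tempering the probabilities, or requiring a burn-in period before adaptation begins, to limit extreme allocations driven by early noise.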
The benefits of adaptive designs can include smaller sample sizes, shorter durations,
and an increased likelihood of achieving trial objectives. However, their operational
challenges include the extensive simulation studies often required for study planning,
as well as the management of medication supply and the monitoring of sites, data, and
analyses.
36
Although it has been suggested that adaptive dose-finding designs can play an important
role in early analgesic drug development,
57
there have been very few published RCTs of pain treatments that have used adaptive
designs.
There has recently been considerable attention to the potential of master protocols
to increase the efficiency of drug development by using “a single infrastructure,
trial design, and protocol to simultaneously evaluate multiple drugs and/or disease
populations in multiple substudies.”
113
There are 3 different types of master protocols: (1) umbrella trials, in which multiple
treatments are studied for a single disease; (2) basket trials, in which a single
treatment is studied in multiple diseases or multiple subtypes of a single disease;
and (3) platform trials, in which multiple treatments are studied for a single disease,
as in umbrella trials, but in a perpetually continuing manner, and often with sharing
of common control patients, treatments entering and exiting the platform on the basis
of prespecified decision algorithms, and early stopping for success or failure.
116
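The bookkeeping behind a platform trial with a shared control arm can be sketched in a few lines. In this toy Python class, each new arm is compared only with control patients enrolled after that arm entered (concurrent controls), and arms exit for success or futility according to a placeholder rule based on the raw mean difference. All names, margins, and the decision rule are hypothetical; actual platform trials prespecify formal criteria, often based on posterior probabilities or group sequential boundaries.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class PlatformTrial:
    """Toy platform trial with a shared placebo arm (all rules are hypothetical)."""
    control: list = field(default_factory=list)    # (enrollment order, outcome)
    arms: dict = field(default_factory=dict)       # arm -> [(enrollment order, outcome)]
    opened_at: dict = field(default_factory=dict)  # arm -> enrollment count when it entered
    closed: dict = field(default_factory=dict)     # arm -> "success" or "futility"
    n_enrolled: int = 0

    def add_arm(self, name):
        """A new treatment enters the platform."""
        self.arms[name] = []
        self.opened_at[name] = self.n_enrolled

    def record(self, arm, outcome):
        """Record one patient's outcome (higher = more improvement)."""
        if arm in self.closed:
            raise ValueError(f"arm {arm!r} has exited the platform")
        entry = (self.n_enrolled, outcome)
        if arm == "control":
            self.control.append(entry)
        else:
            self.arms[arm].append(entry)  # KeyError if the arm never entered
        self.n_enrolled += 1

    def effect(self, name):
        """Arm mean minus the mean of concurrently enrolled shared controls."""
        start = self.opened_at[name]
        concurrent = [y for order, y in self.control if order >= start]
        return mean(y for _, y in self.arms[name]) - mean(concurrent)

    def decide(self, name, min_n, success_margin, futility_margin):
        """Placeholder decision rule; real platforms prespecify formal criteria."""
        if name not in self.closed and len(self.arms[name]) >= min_n:
            est = self.effect(name)
            if est >= success_margin:
                self.closed[name] = "success"
            elif est <= futility_margin:
                self.closed[name] = "futility"
        return self.closed.get(name, "continue")


trial = PlatformTrial()
trial.record("control", 1.0)  # enrolled before drug_a entered, so not concurrent
trial.add_arm("drug_a")
for y in [3.0, 4.0, 3.0]:
    trial.record("drug_a", y)
for y in [1.0, 2.0]:
    trial.record("control", y)
print(trial.decide("drug_a", min_n=3, success_margin=1.5, futility_margin=0.0))  # success
```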
The most frequent use of master protocols has been in oncology, in which different
designs have been used to study novel drugs and drug combinations, often in biomarker-defined
subgroups of patients. Master protocols could have particular value for novel treatments
that potentially have efficacy in one or more different pain conditions given the
prevailing expectation that predictive biomarkers will be developed that can identify
subgroups of patients who respond more robustly to treatment.
1,117
7. Discussion
Sample size does matter for ensuring that clinical trials of pain treatments have
adequate assay sensitivity, but it is not everything. In designing, conducting, and
analyzing RCTs, a large number of additional methodologic issues and advances should
also be considered. Unfortunately, very few studies have formally examined whether
modifying study methods increases assay sensitivity or decreases placebo group changes
in RCTs of efficacious pain treatments. Providing preliminary support for the value
of patient training, the results of recent studies in which patients were randomized
to training or no training showed that training can improve the accuracy of pain ratings
98,108
; in addition, placebo group changes were reduced and there were numerically greater
effect sizes in trained vs untrained patients in one of these studies, a clinical
trial in painful diabetic peripheral neuropathy.
108
The ultimate objective of the research discussed in this article is to develop an
evidence-based approach to the design of clinical trials,
19
and prospective RCTs must be conducted to test methods that are hypothesized to increase
assay sensitivity. Nevertheless, on the basis of available evidence as well as general
considerations involving study execution and data quality, recommendations have been
presented for improving the design of acute
8
and chronic
21,37
pain trials and for increasing their assay sensitivity.
22
Adopting such recommendations and giving careful consideration to optimizing study
design has the potential to increase the assay sensitivity and informativeness of
RCTs of pain treatments. The results of the clinical trials conducted over the next
decade will hopefully demonstrate whether these approaches give rise to a fourth era
of analgesic clinical trials, one in which meaningful increases in treatment effects
will occur.
8. Summary
There is no better summary of our perspective on the current state of pain treatment
than the one provided by Paul Leber
70
for psychiatric medications. Based on his wide-ranging experiences as director of
the U.S. Food and Drug Administration's Division of Neuropharmacologic Drug Products,
Leber maintained that “given how little we actually understand about the behaviors
and affects we seek to manage through pharmacological interventions…we are exceedingly
fortunate to possess the number of modestly effective drugs that we do.”
Conflict of interest statement
S.M. Smith has received in the past 36 months a research grant from the Richard W.
and Mae Stone Goode Foundation. For a complete list of lifetime disclosures for M.
Fava, please see: https://mghcme.org/faculty/faculty-detail/maurizio_fava. M.P. Jensen
has received in the past 36 months research grants from the U.S. National Institutes
of Health, the U.S. Department of Education, the Administration for Community Living,
the Patient-Centered Outcomes Research Institute, the National Multiple Sclerosis Society,
the International Association for the Study of Pain, and the Washington State Spinal
Injury Consortium, and compensation for consulting from Goalistics. O. Mbowe has no
disclosures for the past 36 months. M.P. McDermott has been supported in the past
36 months by research grants from the U.S. National Institutes of Health, U.S. Food
and Drug Administration, NYSTEM, SMA Foundation, Cure SMA, and PTC Therapeutics, has
received compensation for consulting from Neuropore Therapies and Voyager Therapeutics,
and has served on Data and Safety Monitoring Boards for U.S. National Institutes of
Health, Novartis Pharmaceuticals Corporation, AstraZeneca, Eli Lilly, aTyr Pharma,
Catabasis Pharmaceuticals, Vaccinex, Cynapsus Therapeutics, and Voyager Therapeutics.
In the past 36 months, D.C. Turk has received research grants and contracts from U.S.
Food and Drug Administration and U.S. National Institutes of Health, and compensation
for consulting on clinical trial and patient preferences from AccelRx, Eli Lilly,
GlaxoSmithKline, Nektar, Novartis, and Pfizer. R.H. Dworkin has received in the past
36 months research grants and contracts from U.S. Food and Drug Administration and
U.S. National Institutes of Health, and compensation for serving on advisory boards
or consulting on clinical trial methods from Abide, Acadia, Adynxx, Analgesic Solutions,
Aptinyx, Aquinox, Asahi Kasei, Astellas, AstraZeneca, Biogen, Biohaven, Boston Scientific,
Braeburn, Celgene, Centrexion, Chromocell, Clexio, Concert, Decibel, Dong-A, Editas,
Eli Lilly, Eupraxia, Glenmark, Grace, Hope, Immune, Lotus Clinical Research, Mainstay,
Merck, Neumentum, Neurana, NeuroBo, Novaremed, Novartis, Olatec, Pfizer, Phosphagenics,
Quark, Reckitt Benckiser, Regenacy (also equity), Relmada, Sanifit, Scilex, Semnur,
Sollis, Teva, Theranexus, Trevena, Vertex, and Vizuri.