Factors That Can Affect the External Validity of Randomised Controlled Trials

Rothwell, Peter M.

doi:10.1371/journal.pctr.0010009

ScienceOpen: research and publishing network

For Publishers

For Researchers

Blog
About

Search
Advanced search

views

recommends

Record: found
Abstract: found
Article: not found

Factors That Can Affect the External Validity of Randomised Controlled Trials

other

Author(s): Peter M Rothwell

Publication date (Electronic): 19 May 2006

Journal: PLoS Clinical Trials

Publisher: Public Library of Science

Read this article at

ScienceOpenPublisher PMC

Bookmark

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Randomised controlled trials (RCTs) must be internally valid (i.e., design and conduct must eliminate the possibility of bias), but to be clinically useful, the result must also be relevant to a definable group of patients in a particular clinical setting (i.e., they must be externally valid). Lack of external validity is the most frequent criticism by clinicians of RCTs, systematic reviews, and guidelines, and is one explanation for the widespread underuse in routine practice of many treatments that have been shown to be beneficial in trials and are recommended in guidelines [1]. Yet medical journals, funding agencies, ethics committees, the pharmaceutical industry, and governmental regulators seem to give external validity a low priority. Admittedly, whereas the determinants of internal validity are intuitive and can generally be worked out from first principles, understanding of the determinants of the external validity of an RCT requires clinical rather than statistical expertise, and often depends on a detailed understanding of the particular clinical condition under study and its management in routine clinical practice. However, reliable judgments about the external validity of RCTs are essential if treatments are to be used correctly in as many patients as possible in routine clinical practice. The results of RCTs or systematic reviews will never be relevant to all patients and all settings, but they should be designed and reported in a way that allows clinicians to judge to whom the results can reasonably be applied. Table 1 lists some of the important potential determinants of external validity, each of which is reviewed briefly below. Many of the considerations will only be relevant in certain types of trials, for certain interventions, or in certain clinical settings, but they can each sometimes undermine external validity. Moreover, the list is not exhaustive and requires more detailed annotation and explanation than is possible in this short review. Some of the issues that determine external validity are relevant to the distinction between pragmatic trials and explanatory trials [2], but it would be wrong to assume that pragmatic trials necessarily have greater external validity than explanatory trials. For example, broad eligibility criteria, limited collection of baseline data, and inclusion of centres with a range of expertise and differing patient populations have many advantages, but they can also make it very difficult to generalise the overall average effect of treatment to a particular clinical setting. The Setting of the Trial A detailed understanding of the setting in which a trial is performed, including any peculiarities of the health-care system in particular countries, can be essential in judging external validity. The potential impact of differences between health-care systems is illustrated by the analysis of the results of the European Carotid Surgery Trial (ECST) [3], an RCT of endarterectomy for recently symptomatic carotid stenosis, in Figure 1. National differences in the speed with which patients were investigated, with a median delay from last symptoms to randomisation of greater than two months in the United Kingdom (slow centres) compared with three weeks in Belgium and Holland (fast centres), resulted in very different treatment effects in these different health-care systems—due to the shortness of the time window for effective prevention of stroke. Similar differences in performance between health-care systems will exist for other conditions, and there is, of course, the broader issue of how trials done in the developed world apply in the developing world. Moreover, other differences between countries in the methods of diagnosis and management of disease—which can be substantial—or important racial differences in pathology and natural history of disease also affect the external validity of RCTs. A good example is the heterogeneity of results of trials of bacilli calmette guerin vaccination in prevention of tuberculosis, with a progressive loss of efficacy (p < 0.0001) and with decreasing latitude [4]. How centres and clinicians were selected to participate in trials is seldom reported, but can also have important implications for external validity. For example, the Asymptomatic Carotid Artery Study (ACAS) trial of endarterectomy for asymptomatic carotid stenosis only accepted surgeons with an excellent safety record, rejecting 40% of applicants initially, and subsequently barring from further participation those who had adverse operative outcomes in the trial. The benefit from surgery in ACAS was due in major part to the consequently low operative risk [5]. A meta-analysis of 46 surgical case series that published operative risks during the five years after ACAS found operative mortality to be eight times higher and the risk of stroke and death to be about three times higher [1]. Trials should not include centres that do not have the competence to treat patients safely, but selection should not be so exclusive that the results cannot be generalised to routine clinical practice. Selection and Exclusion of Patients Concern is often expressed about highly selective trial eligibility criteria, but there are often several earlier stages of selection that are rarely recorded or reported but which can be more problematic. For example, consider a trial of a new blood pressure–lowering drug, which like most such trials is performed in a hospital clinic. Fewer than 10% of patients with hypertension are managed in hospital clinics, and this group will differ from those managed in primary care. Moreover, only one of the ten physicians who see hypertensive patients in this particular hospital is taking part in the trial, and this physician mainly sees young patients with resistant hypertension. Thus, even before any consideration of eligibility or exclusion criteria, potential recruits are already very unrepresentative of patients in the local community. It is essential therefore that, where possible, trials record and report the pathways to recruitment. Patients are then further selected according to trial eligibility criteria. Some RCTs exclude women and many exclude the elderly and/or patients with common comorbidities. One review of 214 drug trials in acute myocardial infarction (MI) found that over 60% excluded patients aged over 75 years [6], despite the fact that over 50% of MIs occur in this older age group. A review of 41 United States National Institutes of Health RCTs found an average exclusion rate of 73% [7], but rates can be much higher. One study of the eligibility criteria of an acute stroke treatment trial found that of the small proportion of patients admitted to hospital in time to be suitable for treatment, 96% were ineligible based on the various other exclusion criteria [8]. One centre in another acute stroke trial had to screen 192 patients over two years to find an eligible patient [9]. Yet, highly selective recruitment is not inevitable. The GISSI-1 trial of thrombolysis for acute MI, for example, recruited 90% of patients admitted within 12 hours of the event with a definite diagnosis and no contraindications [10]. Strict eligibility criteria can limit the external validity of RCTs, but physicians should at least be able to select similar patients for treatment in routine practice. Unfortunately, however, reporting of trial eligibility criteria is frequently inadequate. A review of trials leading to clinical alerts by the US National Institutes of Health revealed that of an average of 31 eligibility criteria, only 63% were published in the main trial report and only 19% in the clinical alert [11]. Inadequate reporting is also a major problem in secondary publications, such as systematic reviews and clinical guidelines, where the need for a succinct message does not usually allow detailed consideration of the eligibility and exclusion criteria or other determinants of external validity. Prerandomisation run-in periods are also often used to select or exclude patients. In a placebo run-in, all eligible patients receive placebo, and those who are poorly compliant are excluded. There can be good reasons for doing this, but high rates of exclusion will reduce external validity. Active treatment run-in periods in which patients who have adverse events or show signs that treatment may be ineffective are excluded are more likely to undermine external validity. For example, two RCTs of carvedilol, a vasodilatory beta-blocker, in chronic heart failure excluded 6% and 9% of eligible patients in treatment run-in periods—mainly because of worsening heart failure and other adverse events, some of which were fatal [1]. In both trials, the complication rates in the subsequent randomised phase were much lower than in the run-in phase. Trials also sometimes actively recruit patients who are likely to respond well to treatment (often termed “enrichment”). For example, some trials of antipsychotic drugs have selectively recruited patients who have previously had a good response to antipsychotics [1]. Other trials have excluded nonresponders in a run-in phase. One RCT of a cholinesterase inhibitor, tacrine, in Alzheimer disease recruited 632 patients to a six-week “enrichment” phase in which they were randomised to different doses of tacrine versus placebo [12]. After a washout period, only the 215 (34%) patients who had a measured improvement on tacrine in the “enrichment” phase were randomised to tacrine (at their best dose) versus placebo in the main phase of the trial. External validity is clearly undermined here. Characteristics of Randomised Patients Even in large pragmatic trials with very few exclusion criteria, recruitment of less than 10% of potentially eligible patients in participating centres is common. Those patients who are recruited generally differ from those who are eligible but not recruited in terms of age, sex, race, severity of disease, educational status, social class, and place of residence. The outcome in patients included in RCTs is also usually better than those not in trials, often markedly so, not because of better treatment but because of a better baseline prognosis. Trial reports usually include the baseline clinical characteristics of randomised patients, so it is argued that clinicians can assess external validity by comparison with their patients. However, recorded baseline clinical characteristics often say very little about the real makeup of the trial population, and can sometimes be misleading. For example, Table 2 shows the baseline clinical characteristics of patients randomised to warfarin in two RCTs of secondary prevention of stroke [1]. In one trial, patients were in atrial fibrillation, and in the other they were in sinus rhythm, but the characteristics of the two cohorts were otherwise fairly similar. However, the risk of intracranial haemorrhage on warfarin was 19 times higher (p < 0.0001) in Stroke Prevention in Reversible Ischaemia Trial (SPIRIT) than in the European Atrial Fibrillation Trial (EAFT), even after adjustment for differences in baseline clinical characteristics and the intensity of anticoagulation [13]. In judging external validity, an understanding of how patients were referred, investigated, and diagnosed (i.e., their pathway to recruitment), as well as how they were subsequently selected and excluded, is often much more informative than a list of baseline characteristics. The Intervention, Control Treatment, and Pre-trial or Nontrial Management External validity can also be affected if trials have protocols that differ from usual clinical practice. For example, prior to randomisation in the RCTs of endarterectomy for symptomatic carotid stenosis, patients had to be diagnosed by a neurologist and have conventional arterial angiography, neither of which are routine in many centres. The trial intervention itself may also differ from that used in current practice, such as in the formulation and bioavailability of a drug, or the type of anaesthetic used for an operation. The same can be true of the treatment in the control group in a trial, which may use a particularly low dose of the comparator drug, or fall short of best current practice in some other way. External validity can also be undermined by too stringent limitations on the use of nontrial treatments. Any prohibition of nontrial treatments should be reported in the main trial publications, along with details of relevant nontrial treatments that were used. The timing of many interventions is also critical and should be reported when relevant. Outcome Measures and Follow-Up The external validity of an RCT also depends on whether the outcomes were clinically relevant. Many trials use “surrogate” outcomes, usually biological or imaging markers that are thought to be indirect measures of the effect of treatment on clinical outcomes. However, as well as being of questionable clinical relevance, surrogate outcomes are often misleading. There are many examples of treatments that have had a major beneficial effect on a surrogate outcome, which had previously been shown to be correlated with a relevant clinical outcome in observational studies, but where the treatments have proved ineffective or harmful in subsequent large RCTs that used these same clinical outcomes [1]. Complex scales, often made up of arbitrary combinations of symptoms and clinical signs, are also problematic. A review of 196 RCTs in rheumatoid arthritis identified more than 70 different outcome scales [14]. More worryingly, a review of 2,000 RCTs in schizophrenia identified 640 scales—many of which were devised for the particular RCT and had no supporting data on validity or reliability, but which were more likely to show statistically significant treatment effects than established scales [15]. Moreover, the clinical meaning of apparent treatment effects (e.g., a 2.7-point mean reduction in a 100-point outcome scale made up of various symptoms and signs) is usually impossible to discern. Simple clinical outcomes usually have most external validity, but, even then, only if they reflect the priorities of patients. For example, patients with epilepsy are much more interested in the proportion of individuals rendered free of seizures in RCTs of anticonvulsants than they are in changes in mean seizure frequency. Identifying who actually measured the outcome can also be important. For example, the recorded operative risk of stroke due to carotid endarterectomy is highly dependent on whether patients were assessed by a surgeon or a neurologist [16]. Many trials combine events in their primary outcome measure. This can produce a useful measure of the overall effect of treatment on all the relevant outcomes, and it usually affords greater statistical power, but the outcome that is most important to a particular patient may be affected differently by treatment than the combined outcome. Composite outcomes also sometimes combine events of very different severity, and treatment effects can be driven by the least important outcome, which is often the most frequent. Equally problematic is the composite of definite clinical events and episodes of hospitalisation. The fact that a patient is in an RCT will probably affect the likelihood of hospitalisation, and it will certainly vary between different health-care systems. Another major problem for the external validity of RCTs is an inadequate duration of treatment and/or follow-up. For example, although patients with refractory epilepsy or migraine require treatment for many years, most RCTs of new drugs look at the effect of treatment for only a few weeks. Whether initial response is a good predictor of long-term benefit is unknown. The same problem has been identified in RCTs in schizophrenia, with fewer than 50% of trials having greater than six-week follow-up, and only 20% following patients for longer than six months [17]. The contrast between beneficial effects of treatments in short-term RCTs and the less encouraging experience of long-term treatment in clinical practice has also been highlighted by clinicians treating patients with rheumatoid arthritis [18]. Adverse Effects of Treatment Reporting of adverse effects of treatment in RCTs and systematic reviews is often poor. In a review of 192 pharmaceutical trials, less then a third had adequate reporting of adverse clinical events or laboratory toxicology [19]. Treatment discontinuation rates provide some guide to tolerability, but pharmaceutical trials often use eligibility criteria and run-in periods to exclude patients who might be prone to adverse effects. Clinicians are usually most concerned about external validity of RCTs of potentially dangerous treatments. Complications of medial interventions are a leading cause of death in developed countries. Risks can be overestimated in RCTs, particularly during the introduction of new treatments when trials are often done in patients with very severe disease, but stringent selection of patients, confinement to specialist centres, and intensive safety monitoring usually lead to lower risks than routine clinical practice. RCTs of warfarin in nonrheumatic atrial fibrillation are good examples. All trials reported benefit with warfarin, but complication rates were much lower than in routine practice, and consequent doubts about external validity are partly to blame for major underprescribing of warfarin, particularly in the elderly [1]. CONCLUSIONS Some trials have excellent external validity, but many do not, particularly some of those performed by the pharmaceutical industry. Yet researchers, funding agencies, ethics committees, medical journals, and governmental regulators all neglect proper consideration of external validity. Judgment is left to clinicians, but reporting of the determinants of external validity in trial publications, and particularly in secondary reports and clinical guidelines, is rarely adequate and much relevant information is never published. RCTs cannot be expected to produce results that are directly relevant to all patients and all settings, but to be externally valid they should at least be designed and reported in a way that allows clinicians to judge to whom they can reasonably be applied. A consensus is required on how the design and reporting of trials could be improved in order to achieve this aim. Agreement on a list of the most important issues that should be considered by clinicians and researchers would be a helpful first step.

Related collections

Most cited references 17

Record: found
Abstract: found
Article: not found

Endarterectomy for asymptomatic carotid artery stenosis. Executive Committee for the Asymptomatic Carotid Atherosclerosis Study.

JF Toole, M Goldstein, JR Marler … (1995)

To determine whether the addition of carotid endarterectomy to aggressive medical management can reduce the incidence of cerebral infarction in patients with asymptomatic carotid artery stenosis. Prospective, randomized, multicenter trial. Thirty-nine clinical sites across the United States and Canada. Between December 1987 and December 1993, a total of 1662 patients with asymptomatic carotid artery stenosis of 60% or greater reduction in diameter were randomized; follow-up data are available on 1659. At baseline, recognized risk factors for stroke were similar between the two treatment groups. Daily aspirin administration and medical risk factor management for all patients; carotid endarterectomy for patients randomized to receive surgery. Initially, transient ischemic attack or cerebral infarction occurring in the distribution of the study artery and any transient ischemic attack, stroke, or death occurring in the perioperative period. In March 1993, the primary outcome measures were changed to cerebral infarction occurring in the distribution of the study artery or any stroke or death occurring in the perioperative period. After a median follow-up of 2.7 years, with 4657 patient-years of observation, the aggregate risk over 5 years for ipsilateral stroke and any perioperative stroke or death was estimated to be 5.1% for surgical patients and 11.0% for patients treated medically (aggregate risk reduction of 53% [95% confidence interval, 22% to 72%]). Patients with asymptomatic carotid artery stenosis of 60% or greater reduction in diameter and whose general health makes them good candidates for elective surgery will have a reduced 5-year risk of ipsilateral stroke if carotid endarterectomy performed with less than 3% perioperative morbidity and mortality is added to aggressive management of modifiable risk factors.

0 comments Cited 253 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: not found
Article: not found

Variation in protection by BCG: implications of and for heterologous immunity.

P E Fine (1995)

0 comments Cited 220 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Unpublished rating scales: a major source of bias in randomised controlled trials of treatments for schizophrenia.

B M Fenton, Andrew C. Adams, Diana M Marshall … (2000)

A recent review suggested an association between using unpublished scales in clinical trials and finding significant results. To determine whether such an association existed in schizophrenia trials. Three hundred trials were randomly selected from the Cochrane Schizophrenia Group's Register. All comparisons between treatment groups and control groups using rating scales were identified. The publication status of each scale was determined and claims of a significant treatment effect were recorded. Trials were more likely to report that a treatment was superior to control when an unpublished scale was used to make the comparison (relative risk 1.37 (95% CI 1.12-1.68)). This effect increased when a 'gold-standard' definition of treatment superiority was applied (RR 1.94 (95% CI 1.35-2.79)). In non-pharmacological trials, one-third of 'gold-standard' claims of treatment superiority would not have been made if published scales had been used. Unpublished scales are a source of bias in schizophrenia trials.

0 comments Cited 142 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (nlm-ta): PLoS Clin Trials

Journal ID (publisher-id): pctr

Title: PLoS Clinical Trials

Publisher: Public Library of Science (San Francisco, USA )

ISSN (Electronic): 1555-5887

Publication date Collection: May 2006

Publication date (Electronic): 19 May 2006

Volume: 1

Issue: 1

Electronic Location Identifier: e9

Author notes

Peter M. Rothwell is Professor of Clinical Neurology, Stroke Prevention Research Unit, University Department of Clinical Neurology, Radcliffe Infirmary, Oxford, United Kingdom. E-mail: peter.rothwell@ 123456clneuro.ox.ac.uk

Article

Publisher ID: 06-PLCT-ES-00022R1 Serial Item and Contribution ID: plct-01-01-09

DOI: 10.1371/journal.pctr.0010009

PMC ID: 1488890

PubMed ID: 16871331

SO-VID: 696801df-4ca8-4bde-a845-5b528cf5048c

Copyright © Copyright: © 2006 Peter M. Rothwell. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Page count

Pages: 5

Custom metadata

citation Rothwell PM (2006) Factors that can affect the external validity of randomised controlled trials. PLoS Clin Trials 1(1): e9. DOI: 10.1371/journal.pctr.0010009

trial-phase Not Applicable

ScienceOpen disciplines: Medicine

Data availability:

ScienceOpen disciplines: Medicine

Comments

Comment on this article

scite_

Cited by 128

See all cited by

Most referenced authors 491

See all reference authors

- Version 1

Factors That Can Affect the External Validity of Randomised Controlled Trials

Read this article at

Abstract

Related collections

Karger: Digital Health

Most cited references 17

Endarterectomy for asymptomatic carotid artery stenosis. Executive Committee for the Asymptomatic Carotid Atherosclerosis Study.

Variation in protection by BCG: implications of and for heterologous immunity.

Unpublished rating scales: a major source of bias in randomised controlled trials of treatments for schizophrenia.

Author and article information

Journal

Author notes

Article

History

Page count

Categories

Custom metadata

Comments

Comment on this article

Similar content 15

Cited by 128

Most referenced authors 491