
The New England Journal of Medicine

Special Report

Statistics in Medicine — Reporting of Subgroup
Analyses in Clinical Trials
Rui Wang, M.S., Stephen W. Lagakos, Ph.D., James H. Ware, Ph.D., David J. Hunter, M.B., B.S.,
and Jeffrey M. Drazen, M.D.

n engl j med 357;21  www.nejm.org  november 22, 2007 2189
Downloaded from nejm.org on August 3, 2016. For personal use only. No other uses without permission.
Copyright © 2007 Massachusetts Medical Society. All rights reserved.

Medical research relies on clinical trials to assess therapeutic benefits. Because of the effort and cost involved in these studies, investigators frequently use analyses of subgroups of study participants to extract as much information as possible. Such analyses, which assess the heterogeneity of treatment effects in subgroups of patients, may provide useful information for the care of patients and for future research. However, subgroup analyses also introduce analytic challenges and can lead to overstated and misleading results.1-7 This report outlines the challenges associated with conducting and reporting subgroup analyses, and it sets forth guidelines for their use in the Journal. Although this report focuses on the reporting of clinical trials, many of the issues discussed also apply to observational studies.

SUBGROUP ANALYSES AND RELATED CONCEPTS

Subgroup Analysis
By “subgroup analysis,” we mean any evaluation of treatment effects for a specific end point in subgroups of patients defined by baseline characteristics. The end point may be a measure of treatment efficacy or safety. For a given end point, the treatment effect (a comparison between the treatment groups) is typically measured by a relative risk, odds ratio, or arithmetic difference. The research question usually posed is this: Do the treatment effects vary among the levels of a baseline factor?

A subgroup analysis is sometimes undertaken to assess treatment effects for a specific patient characteristic, and this assessment is often listed as a primary or secondary study objective. For example, Sacks et al.8 conducted a placebo-controlled trial that examined the reduction in the incidence of coronary events with pravastatin in a diverse population of persons who had survived a myocardial infarction. In subgroup analyses, the investigators further examined whether the efficacy of pravastatin relative to placebo in preventing coronary events varied according to the patients’ baseline low-density lipoprotein (LDL) levels.

Subgroup analyses are also undertaken to investigate the consistency of the trial conclusions among different subpopulations defined by each of multiple baseline characteristics of the patients. For example, Jackson et al.9 reported the outcomes of a study in which 36,282 postmenopausal women 50 to 79 years of age were randomly assigned to receive 1000 mg of elemental calcium with 400 IU of vitamin D3 daily or placebo. Fractures, the primary outcome, were ascertained over an average follow-up period of 7.0 years, and bone density was a secondary outcome. Overall, no treatment effect was found for the primary outcome; that is, the active treatment was not shown to prevent fractures. The investigators then analyzed the effect of calcium plus vitamin D supplementation relative to placebo on the risk of each of four fracture outcomes, checking for consistency in subgroups defined by 15 characteristics of the participants.

Heterogeneity and Statistical Interactions
Heterogeneity of treatment effects across the levels of a baseline variable refers to the circumstance in which the treatment effects vary across the levels of the baseline characteristic. Heterogeneity is sometimes further classified as either quantitative or qualitative. In quantitative heterogeneity, one treatment is always better than the other, but to varying degrees. In qualitative heterogeneity, one treatment is better than the other for one subgroup of patients and worse than the other for another subgroup of patients. Such variation, also called “effect modification,” is typically expressed in a statistical model as one or more interaction terms between the treatment group and the baseline variable. The presence or absence of interaction is specific to the measure of the treatment effect. For example, a treatment that halves the risk in every subgroup shows no interaction on the relative-risk scale, but it does show one on the risk-difference scale whenever the baseline risks differ.

The appropriate statistical method for assessing the heterogeneity of treatment effects among the levels of a baseline variable begins with a statistical test for interaction.10-13 For example, Sacks et al.8 demonstrated heterogeneity in pravastatin efficacy by reporting a statistically significant (P = 0.03) test for the interaction between treatment and baseline LDL level, with the relative risk as the measure of the treatment effect. Many trials lack the power to detect heterogeneity in treatment effect. Thus, the inability to find significant interactions does not show that the treatment effect seen overall necessarily applies to all subjects. A common mistake is to claim heterogeneity on the basis of separate tests of treatment effects within each of the levels of the baseline variable.6,7,14 For example, testing the hypothesis that there is no treatment effect in women, and then testing it separately in men, does not address the question of whether treatment differences vary according to sex. Another common error is to claim heterogeneity on the basis of the observed treatment-effect sizes within each subgroup while ignoring the uncertainty of these estimates.

Multiplicity
It is common practice to conduct a subgroup analysis for each of several (and often many) baseline characteristics, for each of several end points, or for both. For example, Jackson and colleagues9 analyzed the effect of calcium plus vitamin D supplementation relative to placebo on the risk of each of four fracture outcomes for 15 participant characteristics, which resulted in a total of 60 subgroup analyses.

When multiple subgroup analyses are performed, the probability of a false positive finding can be substantial.7 For example, if the null hypothesis is true for each of 10 independent tests for interaction at the 0.05 significance level, the chance of at least one false positive result exceeds 40% (1 − 0.95^10 ≈ 0.40). Thus, one must be cautious in interpreting such results. Several methods for addressing multiplicity rely on more stringent criteria for statistical significance than the customary P<0.05.7,15 A less formal approach is to note the number of nominally significant interaction tests that would be expected to occur by chance alone. For example, after noting that 60 subgroup analyses were planned, Jackson et al.9 pointed out that “Up to three statistically significant interaction tests (P<0.05) would be expected on the basis of chance alone” (60 × 0.05 = 3). They then incorporated this consideration in their interpretation of the results.

Prespecified Analysis versus Post hoc Analysis
A prespecified subgroup analysis is one that is planned and documented before any examination of the data, preferably in the study protocol. The specification includes the end point, the baseline characteristic, and the statistical method used to test for an interaction. For example, the Heart Outcomes Prevention Evaluation 2 investigators16 conducted a study involving 5522 patients with vascular disease or diabetes to assess the effect of homocysteine lowering with folic acid and B vitamins on the risk of a major cardiovascular event. The primary outcome was a composite of death from cardiovascular causes, myocardial infarction, and stroke. In the Methods section of their article, the authors noted that “Prespecified subgroup analyses involving Cox models were used to evaluate outcomes in patients from regions with folate fortification of food and regions without folate fortification, according to the baseline plasma homocysteine level and the baseline serum creatinine level.” Post hoc analyses are those in which the hypotheses being tested are not specified before any examination of the data. Such analyses are of particular concern because it is often unclear how many were undertaken and whether some were motivated by inspection of the data. However, both prespecified and post hoc subgroup analyses are subject to inflated false positive rates arising from multiple testing. Investigators should avoid the tendency to prespecify many subgroup analyses in the mistaken belief that these analyses are free of the multiplicity problem.
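The distinction drawn in “Heterogeneity and Statistical Interactions” can be made concrete with a small calculation. It contrasts separate within-subgroup tests with a single test for interaction. The following minimal Python sketch makes these assumptions:

- The end point is binary.
- The relative risk is the measure of treatment effect.
- There are two disjoint subgroups.
- The test is a large-sample Wald test on the log scale. This is only one of several possible interaction tests (references 10 through 13 describe others).

The counts in SUBGROUPS and all function names are hypothetical and are not taken from any trial discussed in this report.

```python
import math

# Hypothetical counts for one binary end point in two subgroups
# (illustrative numbers only, not data from any trial cited in this report):
# (events_treated, n_treated, events_control, n_control)
SUBGROUPS = {
    "women": (30, 500, 50, 500),
    "men": (35, 500, 45, 500),
}


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def log_relative_risk(a: int, n1: int, c: int, n0: int) -> tuple[float, float]:
    """Log relative risk (treated vs. control) and its large-sample standard error."""
    log_rr = math.log((a / n1) / (c / n0))
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return log_rr, se


def within_subgroup_tests(groups: dict) -> dict:
    """Separate tests of 'no treatment effect' in each subgroup.

    These ask 'is there an effect in this subgroup?', NOT
    'does the effect differ between subgroups?'.
    """
    results = {}
    for name, counts in groups.items():
        log_rr, se = log_relative_risk(*counts)
        results[name] = (math.exp(log_rr), two_sided_p(log_rr / se))
    return results


def interaction_test(g1: tuple, g2: tuple) -> float:
    """Wald test that the log relative risk is equal in two disjoint subgroups."""
    b1, se1 = log_relative_risk(*g1)
    b2, se2 = log_relative_risk(*g2)
    # Disjoint subgroups give independent estimates, so the variances add.
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return two_sided_p(z)


if __name__ == "__main__":
    for name, (rr, p) in within_subgroup_tests(SUBGROUPS).items():
        print(f"{name}: relative risk {rr:.2f}, P = {p:.3f}")
    p_int = interaction_test(SUBGROUPS["women"], SUBGROUPS["men"])
    print(f"interaction (women vs. men): P = {p_int:.3f}")
```

The made-up counts give the following results:

- Women: P is below 0.05.
- Men: P is not below 0.05.
- Interaction test: P is about 0.4.

Read side by side, the two separate tests seem to suggest a sex difference. The interaction test, which addresses the question actually asked, provides no evidence that the treatment effect differs by sex.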
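The multiplicity arithmetic in the “Multiplicity” section can be checked with a few lines of Python. The sketch assumes that every null hypothesis is true and, for the first function only, that the tests are independent, as in the 10-test example. The Bonferroni threshold is included only as one common example of a “more stringent criterion.” The text does not say which formal method investigators should use. The function names are illustrative.

```python
# Multiplicity arithmetic for k interaction tests, each at level alpha,
# assuming every null hypothesis is true.


def prob_at_least_one_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one P < alpha among k INDEPENDENT true-null tests."""
    return 1.0 - (1.0 - alpha) ** k


def expected_false_positives(k: int, alpha: float = 0.05) -> float:
    """Expected number of nominally significant tests (holds with or without independence)."""
    return k * alpha


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test level that keeps the chance of any false positive at or below alpha."""
    return alpha / k


if __name__ == "__main__":
    print(f"10 tests: P(at least one false positive) = {prob_at_least_one_false_positive(10):.3f}")
    print(f"60 tests: expected false positives = {expected_false_positives(60):.1f}")
    print(f"60 tests: Bonferroni per-test threshold = {bonferroni_threshold(60):.5f}")
```

The functions return these values:

- prob_at_least_one_false_positive(10) is about 0.401, which is the “exceeds 40%” figure.
- expected_false_positives(60) is 3.0, which matches the “up to three” quoted from Jackson et al.
- bonferroni_threshold(60) is about 0.00083.

Subgroup tests within one trial are usually correlated, so the first formula is only an approximation in practice. The expected count and the Bonferroni bound do not require independence.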

SUBGROUP ANALYSES IN THE JOURNAL: ASSESSMENT OF REPORTING PRACTICES

As part of internal quality-control activities at the Journal, we assessed the completeness and quality of subgroup analyses reported in the Journal during the period from July 1, 2005, through June 30, 2006. A detailed description of the study methods can be found in the Supplementary Appendix, available with the full text of this article at www.nejm.org. In this report, we describe the clarity and completeness of subgroup-analysis reporting, evaluate the authors’ interpretation and justification of the results of subgroup analyses, and recommend guidelines for reporting subgroup analyses.

Among the original articles published in the Journal during the period from July 1, 2005, through June 30, 2006, a total of 95 articles reported primary outcome results from randomized clinical trials. Of these 95 articles, 93 reported results from one clinical trial, and the remaining 2 articles reported results from two trials. Thus, results from 97 trials were reported, and subgroup analyses were reported for 59 of them (61%). Table 1 summarizes the characteristics of the trials. Larger trials and multicenter trials were significantly more likely to report subgroup analyses than smaller trials and single-center trials, respectively. In multivariate logistic-regression models, trials were ranked into quartiles according to the number of participants enrolled. As compared with the quartile with the fewest participants, the odds ratios for reporting subgroup analyses were 1.38 (95% confidence interval [CI], 0.45 to 4.20) for the second quartile, 1.98 (95% CI, 0.62 to 6.24) for the third quartile, and 8.90 (95% CI, 2.10 to 37.78) for the fourth quartile (P = 0.02, trend test). The odds ratio for reporting subgroup analyses in multicenter trials as compared with single-center trials was 4.33 (95% CI, 1.56 to 12.16).

Table 1. Characteristics and Predictors of Reporting Subgroup Analyses in 97 Clinical Trials.*
                                               Trials Reporting      P Value†
                                               Subgroup Analyses     Univariate   Multivariate
Variable                                       No./Total No. (%)     Odds Ratio   Odds Ratio
No. of subjects                                                      0.002†       0.02†
  ≤218                                         11/25 (44)
  219–429                                      13/25 (52)
  430–1012                                     14/23 (61)
  >1012                                        21/24 (88)
Superiority trial                                                    0.25         0.89
  Yes                                          53/84 (63)
  No                                           6/13 (46)
Trial sites                                                          0.005        0.05
  Single-center                                7/21 (33)
  Multicenter                                  52/76 (68)
Type of disease studied                                              0.18         0.37
  Cardiovascular                               16/20 (80)
  Infectious                                   2/7 (29)
  Oncologic                                    9/11 (82)
  Respiratory                                  7/10 (70)
  Pediatric                                    5/10 (50)
  Psychiatric or neurologic                    6/10 (60)
  Metabolic, endocrine, or gastrointestinal    5/10 (50)
  Gynecologic                                  3/6 (50)
  Other                                        6/13 (46)
Statistically significant primary end point                          0.24         0.38
  Yes                                          35/62 (56)
  No                                           24/35 (69)
* A total of 59 trials reported subgroup analyses.
† P values were determined with the use of trend tests.

Among the 59 trials that reported subgroup analyses, these analyses were mentioned in the Methods section for 21 trials (36%), in the Results section for 57 trials (97%), and in the Discussion section for 37 trials (63%). Subgroup analyses were reported in both the text and a figure or table for 39 trials (66%). Other characteristics of the reports are shown in Figure 1.

In general, we were unable to determine how many subgroup analyses had been conducted. Instead, we attempted to count the number of subgroup analyses reported in each article and found that this number was unclear in nine articles (15%). For example, Lees et al.17 reported that “We explored analyses of numerous other subgroups to assess the effect of baseline prognostic factors or coexisting conditions on the treatment effect but found no evidence of nominal significance for any biologically likely factor.” For four of these nine articles, we were able to determine that at least eight subgroup analyses were reported. In 40 trials (68%), it was unclear whether any of the subgroup analyses were prespecified or post hoc, and in 3 others (5%) it was unclear whether some were prespecified or post hoc. Interaction tests were reported to have been used to assess the heterogeneity of treatment effects for all subgroup analyses in only 16 trials (27%), and for some, but not all, subgroup analyses in 11 trials (19%).

Figure 1. Reporting of Subgroup Analyses from 59 Clinical Trials.
Panels (number of trials in each category): No. of Subgroup Analyses; Clear about Prespecified or Post Hoc; Interaction Test Reported; Information Reported within Subgroups; Heterogeneity and Multiplicity; Subgroup Analyses Reported in Abstract. The specific reporting characteristics examined in this quality-improvement exercise are indicated in each panel. CI denotes confidence interval.

We assessed whether information was provided about treatment effects within the levels of each subgroup variable (Fig. 1). In 25 trials (42%), information about treatment effects was reported consistently for all of the reported subgroup analyses, and in 13 trials (22%), nothing was reported.

Investigators in 15 trials (25%), all using superiority designs,10 claimed heterogeneity of treatment effects between at least one subject subgroup and the overall study population (see Table 1 of the Supplementary Appendix). For 4 of these 15 trials, this claim was based on a nominally significant interaction test, and for 4 others it was based on within-subgroup comparisons only. In the remaining seven trials, significant results of interaction tests were reported for some but not all subgroup analyses. Among the 15 trials in which heterogeneity was claimed, the investigators offered caution about multiplicity for 2 (13%) and noted the heterogeneity in the Abstract for 4 (27%).

ANALYSIS OF OUR FINDINGS AND GUIDELINES FOR REPORTING SUBGROUPS

In the 1-year period studied, the reporting of subgroup analyses was neither uniform nor complete. Because the design of future clinical trials can depend on the results of subgroup analyses, uniformity in reporting would strengthen the foundation on which such research is built. Uniform reporting will also be valuable in the interval between the recognition of a potential subgroup effect and the availability of adequate data on which to base clinical decisions.

Problems in the reporting of subgroup analyses are not new.1-6,18 Assmann et al.2 reported shortcomings of subgroup analyses in a review of the results of 50 trials published in 1997 in four leading medical journals. More recently, Hernández et al.4 reviewed the results of 63 cardiovascular trials published in 2002 and 2004 and noted the same problems. To improve the quality of reports of parallel-group randomized trials, the Consolidated Standards of Reporting Trials (CONSORT) statement was proposed in the mid-1990s and revised in 2001.19 There has been considerable discussion of the potential problems associated with subgroup analysis, together with recommendations on when and how subgroup analyses should be conducted and reported.19,20 Even so, our analysis of recent articles shows that problems and ambiguities persist in articles published in the Journal. For example, in about two thirds of the published trials, it was unclear whether any of the reported subgroup analyses were prespecified or post hoc. In more than half of the trials, it was unclear whether interaction tests were used, and in about one third of the trials, within-level results were not presented in a consistent way.

When properly planned, reported, and interpreted, subgroup analyses can provide valuable information. With the availability of Web supplements, the opportunity exists to present more detailed information about the results of a trial. The purpose of the guidelines (see box) is to encourage clearer and more complete reporting of subgroup analyses. In some settings, a trial is conducted with a subgroup analysis as one of the primary objectives. These guidelines apply directly to the reporting of subgroup analyses in the primary publication of a clinical trial when the subgroup analyses are not among the primary objectives. In other settings, including observational studies, we encourage complete and thorough reporting of the subgroup analyses in the spirit of the guidelines listed.

Guidelines for Reporting Subgroup Analysis.
In the Abstract:
Present subgroup results in the Abstract only if the subgroup analyses were based on a primary study outcome, if they were prespecified, and if they were interpreted in light of the totality of prespecified subgroup analyses undertaken.
In the Methods section:
Indicate the number of prespecified subgroup analyses that were performed and the number of prespecified subgroup analyses that are reported. Distinguish a specific subgroup analysis of special interest, such as that in the article by Sacks et al.,8 from the multiple subgroup analyses typically done to assess the consistency of a treatment effect among various patient characteristics, such as those in the article by Jackson et al.9 For each reported analysis, indicate the end point that was assessed and the statistical method that was used to assess the heterogeneity of treatment differences.
Indicate the number of post hoc subgroup analyses that were performed and the number of post hoc subgroup analyses that are reported. For each reported analysis, indicate the end point that was assessed and the statistical method used to assess the heterogeneity of treatment differences. Detailed descriptions may require a supplementary appendix.
Indicate the potential effect on type I errors (false positives) due to multiple subgroup analyses and how this effect is addressed. If formal adjustments for multiplicity were used, describe them; if no formal adjustment was made, indicate the magnitude of the problem informally, as done by Jackson et al.9
In the Results section:
When possible, base analyses of the heterogeneity of treatment effects on tests for interaction, and present them along with effect estimates (including confidence intervals) within each level of each baseline covariate analyzed. A forest plot21,22 is an effective method for presenting this information.
In the Discussion section:
Avoid overinterpretation of subgroup differences. Be properly cautious in appraising their credibility, acknowledge the limitations, and provide supporting or contradictory data from other studies, if any.

The editors and statistical consultants of the Journal consider these guidelines to be important in the reporting of subgroup analyses. The goal is to provide transparency in the statistical methods used in order to increase the clarity and completeness of the information reported. As always, these are guidelines and not rules; additions and exemptions can be made as long as there is a clear case for such action.

No potential conflict of interest relevant to this article was reported.

We thank Doug Altman, John Bailar, Colin Begg, Mohan Beltangady, Marc Buyse, David DeMets, Stephen Evans, Thomas Fleming, David Harrington, Joe Heyse, David Hoaglin, Michael Hughes, John Ioannidis, Curtis Meinert, James Neaton, Robert O’Neill, Ross Prentice, Stuart Pocock, Robert Temple, Janet Wittes, and Marvin Zelen for their helpful comments.

1. Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 1991;266:93-8.
2. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000;355:1064-9.
3. Pocock SJ, Assmann SF, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med 2002;21:2917-30.
4. Hernández A, Boersma E, Murray G, Habbema J, Steyerberg E. Subgroup analyses in therapeutic cardiovascular clinical trials: are most of them misleading? Am Heart J 2006;151:257-64.
5. Parker AB, Naylor CD. Subgroups, treatment effects, and baseline risks: some lessons from major cardiovascular trials. Am Heart J 2000;139:952-61.
6. Rothwell PM. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. Lancet 2005;365:176-86.
7. Lagakos SW. The challenge of subgroup analyses--reporting without distorting. N Engl J Med 2006;354:1667-9. [Erratum, N Engl J Med 2006;355:533.]
8. Sacks FM, Pfeffer MA, Moye LA, et al. The effect of pravastatin on coronary events after myocardial infarction in patients with average cholesterol levels. N Engl J Med 1996;335:1001-9.
9. Jackson RD, LaCroix AZ, Gass M, et al. Calcium plus vitamin D supplementation and the risk of fractures. N Engl J Med 2006;354:669-83. [Erratum, N Engl J Med 2006;354:1102.]
10. Pocock SJ. Clinical trials: a practical approach. Chichester, England: John Wiley, 1983.
11. Halperin M, Ware JH, Byar DP, et al. Testing for interaction in an I×J×K contingency table. Biometrika 1977;64:271-5.
12. Simon R. Patient subsets and variation in therapeutic efficacy. Br J Clin Pharmacol 1982;14:473-82.
13. Gail M, Simon R. Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 1985;41:361-72.
14. Brookes ST, Whitely E, Egger M, Smith GD, Mulheran PA, Peters T. Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. J Clin Epidemiol 2004;57:229-36.
15. Bailar JC III, Mosteller F, eds. Medical uses of statistics. 2nd ed. Waltham, MA: NEJM Books, 1992.
16. Lonn E, Yusuf S, Arnold MJ, et al. Homocysteine lowering with folic acid and B vitamins in vascular disease. N Engl J Med 2006;354:1567-77. [Erratum, N Engl J Med 2006;355:746.]
17. Lees KR, Zivin JA, Ashwood T, et al. NXY-059 for acute ischemic stroke. N Engl J Med 2006;354:588-600.
18. Al-Marzouki S, Roberts I, Marshall T, Evans S. The effect of scientific misconduct on the results of clinical trials: a Delphi survey. Contemp Clin Trials 2005;26:331-7.
19. Moher D, Schulz KF, Altman DG, et al. The CONSORT Statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. (Accessed November 1, 2007, at http://www.consort-statement.org/.)
20. International Conference on Harmonisation (ICH). Guidance for industry: E9 statistical principles for clinical trials. Rockville, MD: Food and Drug Administration, September 1998. (Accessed November 1, 2007, at http://www.fda.gov/cder/guidance/ICH_E9-fnl.PDF.)
21. Cuzick J. Forest plots and the interpretation of subgroups. Lancet 2005;365:1308.
22. Wactawski-Wende J, Kotchen JM, Anderson GL, et al. Calcium plus vitamin D supplementation and the risk of colorectal cancer. N Engl J Med 2006;354:684-96.

Copyright © 2007 Massachusetts Medical Society.
