
Validation of predictive regression models

Ewout W. Steyerberg, PhD


Clinical epidemiologist

Frank E. Harrell, PhD


Biostatistician

Personal background
Ewout Steyerberg: Erasmus MC, Rotterdam, the Netherlands

Frank Harrell: Health Evaluation Sciences, University of Virginia, Charlottesville, VA, USA

Validation of predictions from regression models is of paramount importance

Learning objectives: knowledge of


common types of regression models
fundamental assumptions of regression models
performance criteria of predictive models
principles of different types of validation

Performance objectives
To be able to explain why validation is necessary for predictive models
To be able to judge the adequacy of a validation procedure

Predictive models provide quantitative estimates of an outcome, e.g.


Quality of life one year after surgery

Death at 30 days after surgery


Long term survival

Predictive models are often based on regression analysis


y ~ a + sum(bi*xi)
y: outcome variable

a: intercept
bi: regression coefficient i

xi: predictor variable i


i = 1 to k predictors, with k usually 2 to 20
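The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `linear_predictor` and all numeric values are hypothetical and do not come from any fitted model.

```python
def linear_predictor(intercept, coefficients, predictors):
    """Return a + sum(b_i * x_i) for one patient."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical values, for illustration only
a = 1.0            # intercept
b = [0.5, -2.0]    # regression coefficients b1, b2
x = [4.0, 1.0]     # predictor values x1, x2 for one patient

y_hat = linear_predictor(a, b, x)  # 1.0 + 0.5*4.0 + (-2.0)*1.0
print(y_hat)
```

For linear regression this value is the prediction itself; for logistic and Cox regression it is transformed further before it is read as a risk.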

3 examples of regression
Quality of life one year after surgery:

continuous outcome, linear regression


Death at 30 days after surgery:

binary outcome, logistic regression


Long term survival:

time-to-outcome, Cox regression

Predictive models make assumptions


Distribution
Linearity of continuous variables

Additivity of effects

Example: a simple logistic regression model


30-day mortality ~ a + b1*sex + b2*age
Assumptions:

Distribution of 30-day mortality is binomial


Age has a linear effect

The effects of sex and age can be added
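As a minimal sketch of how such a model turns patient characteristics into a predicted probability, assuming the usual logit link: the coefficients below are hypothetical placeholders, not estimates from the example data, and the 0/1 coding of sex is likewise an assumption.

```python
import math


def predicted_mortality(a, b1, b2, sex, age):
    """Predicted probability from logit(p) = a + b1*sex + b2*age."""
    lp = a + b1 * sex + b2 * age          # linear predictor: effects are added
    return 1.0 / (1.0 + math.exp(-lp))    # inverse logit maps lp to (0, 1)


# Hypothetical coefficients, NOT taken from the source
A, B1, B2 = -6.0, 0.4, 0.05

p = predicted_mortality(A, B1, B2, sex=1, age=70)
print(round(p, 3))
```

Linearity shows up as the single term `b2 * age`; additivity shows up as the absence of any `sex * age` product.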

Assessing model assumptions


Examine model residuals

Perform specific tests


add nonlinear terms, e.g. age + age^2

add interaction terms, e.g. sex*age
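Written out for the simple mortality example, the extended model with both kinds of terms might look as follows. The symbols b3 and b4 are introduced here for illustration and are not part of the original slides.

```latex
\operatorname{logit} P(\text{30-day mortality})
  = a + b_1\,\text{sex} + b_2\,\text{age}
      + b_3\,\text{age}^2 + b_4\,(\text{sex} \times \text{age})
```

A test of b3 = 0 then addresses linearity of the age effect, and a test of b4 = 0 addresses additivity of sex and age.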

Model assumptions and predictions


Better predictions if assumptions are met
Some violation inherent in empirical data

Evaluate predictions in new data

Evaluation of predictions
Calibration:
average of predictions correct?
low and high predictions correct?
Discrimination:
distinguish low risk from high risk patients?
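Both criteria can be computed directly from predicted probabilities and observed outcomes. The sketch below is an assumption about one reasonable way to do this: discrimination as the area under the ROC curve (counted over all event / non-event pairs) and the "average correct?" part of calibration as mean predicted risk minus observed event rate. Function names and data are hypothetical.

```python
def c_statistic(predictions, outcomes):
    """Area under the ROC curve: share of (event, non-event) pairs in which
    the event received the higher prediction; ties count one half."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    nonevents = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def mean_calibration_error(predictions, outcomes):
    """Mean predicted risk minus observed event rate (0 = correct on average)."""
    n = len(outcomes)
    return sum(predictions) / n - sum(outcomes) / n


# Hypothetical predictions and outcomes for four patients
preds = [0.10, 0.40, 0.35, 0.80]
died = [0, 0, 1, 1]

print(c_statistic(preds, died))
print(mean_calibration_error(preds, died))
```

Whether low and high predictions are each correct needs a calibration plot or a comparison within risk groups, which this sketch does not cover.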

Example: predicted probabilities


[Figure: calibration plot of actual 30-day mortality (y-axis, 0.0 to 0.4) against predicted probability of 30-day mortality (x-axis, 0.0 to 0.4). Area under ROC: 0.77. Calibration: OK.]

3 types of validation
Apparent: performance on sample used to develop model
Internal: performance on population underlying the sample
External: performance on related but slightly different population

Apparent validity
Easy to calculate

Results in optimistic performance estimates

Apparent estimates optimistic since same data used for:


Definition of model structure: e.g. selection and coding of variables
Estimation of model parameters: e.g. regression coefficients
Evaluation of model performance: e.g. calibration and discrimination

Internal validity
More difficult to calculate

Test model in new data, randomly drawn from the underlying population

Why internal validation?


Honest estimate of performance should be obtained, at least for a population similar to the development sample

Internally validated performance sets an upper limit to what may be expected in other settings (external validity)

External validity
Moderately easy to calculate when new data are available
Test model in new data, different from development population

Why external validation?


Various factors may differ from development population, including:
different selection of patients
different definitions of variables
different diagnostic or therapeutic procedures

Internal validation techniques


Split-sample:
development / validation
Cross-validation:
alternating development / validation
extreme: n-1 develop / 1 validate (jack-knife)
Bootstrap
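The alternating development / validation scheme of cross-validation can be sketched as an index generator. The function name `k_fold_splits` and its parameters are hypothetical; setting k equal to n gives the n-1 develop / 1 validate extreme.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (development, validation) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k roughly equal parts
    for i in range(k):
        validation = folds[i]
        development = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield development, validation


# Each patient is used for validation exactly once and for development k-1 times
for development, validation in k_fold_splits(n=10, k=5):
    print(len(development), "develop /", len(validation), "validate")
```

A split-sample approach corresponds to taking just one of these splits, so part of the data is never used for model development.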

Bootstrap is the preferred internal validation technique


bootstrap sample for model development: n patients drawn with replacement
original sample for validation: n patients
difference: optimism
efficiency: development and validation on n patients
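The procedure above can be sketched in plain Python. Everything here is an assumption made for illustration: the data are simulated, the Newton-Raphson fitting routine is a bare-bones stand-in for a proper statistics package, and performance is measured by the area under the ROC curve only.

```python
import math
import random


def expit(z):
    z = max(-30.0, min(30.0, z))                  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("singular matrix")
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logistic(X, y, iterations=15):
    """Newton-Raphson fit; each row of X starts with 1.0 for the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            p = expit(sum(b * x for b, x in zip(beta, row)))
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * row[i] * row[j]
        try:
            step = solve(hess, grad)
        except ValueError:
            break                                 # keep the last estimate
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def scores(beta, X):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def auc(s, y):
    ev = [v for v, yi in zip(s, y) if yi == 1]
    ne = [v for v, yi in zip(s, y) if yi == 0]
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e in ev for n in ne)
    return wins / (len(ev) * len(ne))


def bootstrap_optimism(X, y, n_boot=50, seed=1):
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(scores(fit_logistic(X, y), X), y)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # n drawn with replacement
        bX, by = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(by)) < 2:
            continue                                  # AUC undefined, skip
        beta = fit_logistic(bX, by)                   # develop in bootstrap sample
        diffs.append(auc(scores(beta, bX), by)        # performance in bootstrap
                     - auc(scores(beta, X), y))       # minus test in original
    optimism = sum(diffs) / len(diffs)
    return apparent, optimism, apparent - optimism


# Simulated patients (hypothetical): intercept, sex (0/1), age in decades from 65
rng = random.Random(42)
X, deaths = [], []
for _ in range(150):
    sex, age = rng.randint(0, 1), (rng.uniform(40, 90) - 65) / 10
    X.append([1.0, sex, age])
    deaths.append(1 if rng.random() < expit(-1.5 + 0.4 * sex + 0.6 * age) else 0)

apparent, optimism, corrected = bootstrap_optimism(X, deaths)
print(f"apparent {apparent:.3f}, optimism {optimism:.3f}, corrected {corrected:.3f}")
```

In a real analysis every modeling step, including any variable selection, would be repeated inside each bootstrap sample; this sketch refits only the coefficients of a fixed two-predictor model.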

Example: bootstrap results for logistic regression model


30-day mortality ~ a + b1*sex + b2*age
Apparent area under the ROC curve: 0.77

Mean area of 200 bootstrap samples: 0.772


Mean area of 200 tests in original: 0.762

Optimism in apparent performance: 0.01


Optimism-corrected area: 0.76
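The arithmetic behind these figures, written out (the symbol names are introduced here for illustration):

```latex
\begin{aligned}
\text{optimism} &= \overline{\text{AUC}}_{\text{bootstrap}} - \overline{\text{AUC}}_{\text{test in original}}
                 = 0.772 - 0.762 = 0.010 \\
\text{AUC}_{\text{corrected}} &= \text{AUC}_{\text{apparent}} - \text{optimism}
                 = 0.77 - 0.01 = 0.76
\end{aligned}
```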

External validation techniques


Temporal validation: same investigators, validate in recent years
Spatial validation (other place): same investigators, cross-validate in centers
Fully external: other investigators, other centers

Example: external validity of logistic regression model


30-day mortality ~ a + b1*sex + b2*age
Apparent area in 785 patients: 0.77

Tested in 20,318 other patients: 0.74


Tested by other investigators: ?

Example: external validation


[Figure: calibration plot of actual 30-day mortality (y-axis, 0.0 to 0.4) against predicted probability of 30-day mortality (x-axis, 0.0 to 0.4) in the external validation data. Area under ROC: 0.74. Calibration: reasonable.]

Summary
Apparent validity gives an optimistic estimate of model performance
Internal validity may be estimated by bootstrapping
External validity should be determined in other populations

Key references
tutorial and book on multivariable models
(Harrell 1996: Stat Med 15:361-87; Harrell: Regression Modeling Strategies, Springer 2001)

empirical evaluations of strategies
(Steyerberg 2000: Stat Med 19:1059-79)

internal validation
(Steyerberg 2001: JCE 54:774-81)

external validation
(Justice 1999: Ann Intern Med 130:515-24; Altman 2000: Stat Med 19:453-73)

Links
Interactive textbook on predictive modeling
http://www.neri.org/symptom/mockup/Chapter_8/

Harrell's Regression Modeling Strategies
http://hesweb1.med.virginia.edu/biostat/rms/
