17 vistas

Cargado por daniel

The Cost of Dichotomising Continuous Variables

- Using Multivariable Linear Regression Technique
- GDP Economic Indicator
- Chapter 4
- ch11
- 1.Comp.project_A Statistical Analysis on the Influence of Materials & Manufacturing Condition on the Octane Rating
- MLR Introduction
- x-ray fluorescence technique
- 22_HW_Assignment_Biostat.docx
- AN ANALYSIS OF PRICE BEHAVIOUR OF RICE
- Timing of Blunt Force Injuries in Long Bones the Effects of the Environment, PMI Length and Human Surrogate Model.
- Chapter+8
- Comparacion AADEE_AVL Version Ingles New
- WOODSetal_1986_statistics.pdf
- Lecture 6
- 4. Eng-Modelling for Time-Dhaval M. Parikh
- Basic Statistics
- Method and Principel of Statistical Analysis
- Modelling the Correlations of E-waste Quantity With Economic Increase
- MScSyllabus
- Report Statistical Technique in Decision Making (GROUP BPT) - Correlation & Linear Regression123 (1)

Está en la página 1de 1

Statistics Notes

The cost of dichotomising continuous variables

Douglas G Altman, Patrick Royston

Cancer Research

UK/NHS Centre

for Statistics in

Medicine, Wolfson

College, Oxford

OX2 6UD

Douglas G Altman

professor of statistics

in medicine

MRC Clinical Trials

Unit, London

NW1 2DA

Patrick Royston

professor of statistics

Correspondence to:

Professor Altman

doug.altman@

cancer.org.uk

BMJ 2006;332:1080

1080

**Measurements of continuous variables are made in all
**

branches of medicine, aiding in the diagnosis and

treatment of patients. In clinical practice it is helpful to

label individuals as having or not having an attribute,

such as being “hypertensive” or “obese” or having

”high cholesterol,” depending on the value of a

continuous variable.

Categorisation of continuous variables is also common in clinical research, but here such simplicity is

gained at some cost. Though grouping may help data

presentation, notably in tables, categorisation is unnecessary for statistical analysis and it has some serious

drawbacks. Here we consider the impact of converting

continuous data to two groups (dichotomising), as this

is the most common approach in clinical research.1

What are the perceived advantages of forcing all

individuals into two groups? A common argument is

that it greatly simplifies the statistical analysis and leads

to easy interpretation and presentation of results. A

binary split—for example, at the median—leads to a

comparison of groups of individuals with high or low

values of the measurement, leading in the simplest case

to a t test or 2 test and an estimate of the difference

between the groups (with its confidence interval). There

is, however, no good reason in general to suppose that

there is an underlying dichotomy, and if one exists there

is no reason why it should be at the median.2

Dichotomising leads to several problems. Firstly,

much information is lost, so the statistical power to

detect a relation between the variable and patient outcome is reduced. Indeed, dichotomising a variable at

the median reduces power by the same amount as

would discarding a third of the data.2 3 Deliberately discarding data is surely inadvisable when research

studies already tend to be too small. Dichotomisation

may also increase the risk of a positive result being a

false positive.4 Secondly, one may seriously underestimate the extent of variation in outcome between

groups, such as the risk of some event, and

considerable variability may be subsumed within each

group. Individuals close to but on opposite sides of the

cutpoint are characterised as being very different

rather than very similar. Thirdly, using two groups

conceals any non-linearity in the relation between the

variable and outcome. Presumably, many who dichotomise are unaware of the implications.

If dichotomisation is used where should the

cutpoint be? For a few variables there are recognised

cutpoints, such as > 25 kg/m2 to define “overweight”

based on body mass index. For some variables, such as

age, it is usual to take a round number, usually a multiple of five or 10. The cutpoint used in previous studies may be adopted. In the absence of a prior cutpoint

the most common approach is to take the sample

median. However, using the sample median implies

that various cutpoints will be used in different studies

so that their results cannot easily be compared,

seriously hampering meta-analysis of observational

studies.5 Nevertheless, all these approaches are

**preferable to performing several analyses and
**

choosing that which gives the most convincing result.

Use of this so called “optimal” cutpoint (usually that

giving the minimum P value) runs a high risk of a spuriously significant result; the difference in the outcome

variable between the groups will be overestimated,

perhaps considerably; and the confidence interval will

be too narrow. This strategy should never be used.6 7

When regression is being used to adjust for the effect

of a confounding variable, dichotomisation will run the

risk that a substantial part of the confounding remains.4 7

Dichotomisation is not much used in epidemiological

studies, where the use of several categories is preferred.

Using multiple categories (to create an “ordinal”

variable) is generally preferable to dichotomising. With

four or five groups the loss of information can be quite

small, but there are complexities in analysis.

Instead of categorising continuous variables, we prefer to keep them continuous. We could then use

linear regression rather than a two sample t test, for

example. If we were concerned that a linear regression

would not truly represent the relation between the

outcome and predictor variable, we could explore

whether some transformation (such as a log transformation) would be helpful.7 8 As an example, in a regression

analysis to develop a prognostic model for patients with

primary biliary cirrhosis, a carefully developed model

with bilirubin as a continuous explanatory variable

explained 31% more of the variability in the data than

when bilirubin distribution was split at the median.7

Competing interests: None declared.

1

2

3

4

5

6

7

8

**Del Priore G, Zandieh P, Lee MJ. Treatment of continuous data as
**

categoric variables in obstetrics and gynecology. Obstet Gynecol

1997;89:351-4.

MacCallum RC, Zhang S, Preacher KJ, Rucker DD. On the practice of

dichotomization of quantitative variables. Psychol Meth 2002;7:19-40.

Cohen J. The cost of dichotomization. Appl Psychol Meas 1983;7:249-53.

Austin PC, Brunner LJ. Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses.

Stat Med 2004;23:1159-78.

Buettner P, Garbe C, Guggenmoos-Holzmann I. Problems in defining

cutoff points of continuous prognostic factors: example of tumor

thickness in primary cutaneous melanoma. J Clin Epidemiol

1997;50:1201-10.

Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using

“optimal” cutpoints in the evaluation of prognostic factors. J Natl Cancer

Inst 1994;86:829-35.

Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous

predictors in multiple regression: a bad idea. Stat Med 2006;25:127-41.

Royston P, Sauerbrei W. Building multivariable regression models with

continuous covariates in clinical epidemiology–with an emphasis on

fractional polynomials. Methods Inf Med 2005;44:561-71.

Endpiece

Reading and reflecting

Reading without reflecting is like eating without

digesting.

Edmund Burke

Kamal Samanta, retired general practitioner,

Denby Dale, West Yorkshire

BMJ VOLUME 332

6 MAY 2006

bmj.com

- Using Multivariable Linear Regression TechniqueCargado porpemmasan
- GDP Economic IndicatorCargado pordipabali chowdhury
- Chapter 4Cargado porbhuvaneshkmrs
- ch11Cargado porChirag Patel
- 1.Comp.project_A Statistical Analysis on the Influence of Materials & Manufacturing Condition on the Octane RatingCargado porFahmid Tousif Khan
- MLR IntroductionCargado porAndrew Watson
- x-ray fluorescence techniqueCargado poraladinsane
- 22_HW_Assignment_Biostat.docxCargado porjhon
- AN ANALYSIS OF PRICE BEHAVIOUR OF RICECargado porTJPRC Publications
- Timing of Blunt Force Injuries in Long Bones the Effects of the Environment, PMI Length and Human Surrogate Model.Cargado porKrešimir Severin
- Chapter+8Cargado pornickcupolo
- Comparacion AADEE_AVL Version Ingles NewCargado porShereen Salah
- WOODSetal_1986_statistics.pdfCargado porMoniqueAmaral
- Lecture 6Cargado porkegnata
- 4. Eng-Modelling for Time-Dhaval M. ParikhCargado porImpact Journals
- Basic StatisticsCargado porNarendra Ashar
- Method and Principel of Statistical AnalysisCargado porPrashanta Pokhrel
- Modelling the Correlations of E-waste Quantity With Economic IncreaseCargado porGilma Vega
- MScSyllabusCargado porAtanu Mallik
- Report Statistical Technique in Decision Making (GROUP BPT) - Correlation & Linear Regression123 (1)Cargado porAlieffiac
- How to Interpret Regression Analysis Results: P-values & Coefficients?Cargado porStats Work Statswork
- 20110715123735-INFRASTRUCTURAL-MANAGEMENT-2011-2012Cargado porPuneet Sood
- Least Square RegressionCargado porSamiullah Qureshi
- RegressionCargado porRohit Arora
- Multiple Regression SPECIALISTICACargado poralchemist_bg

- EL MENSAJE DEL SAGRADO CORAZÓN DE JESÚS A SOR JOSEFA MENÉMDEZCargado porCésar Reyes
- Neural Fuzzy Systems LectureCargado pordaniel
- Modelo en Lógica Difusa de PercepciónCargado pordaniel
- Nomic Probability and the Foundations of InductionCargado pordaniel
- The Knowledge System Underpinning Healthcare is Not Fit for Purpose and Must ChangeCargado pordaniel
- Lógica DIfusa Salud Enfermedad y MalestarCargado pordaniel
- Automated Medical Diagnosis With Fuzzy Stochastc Models- Monitoring Chronic DisesesCargado pordaniel
- Lógica Difusa y ProbabilidadCargado pordaniel
- An Address on PrognosisCargado pordaniel
- Application of Bayesian Classifier for the Diagnosis of Dental PainCargado pordaniel
- 9241547073_engCargado porDeepak James
- Unexpected Predictor–Outcome Associations in Clinical Prediction Research-- Auses and SolutionsCargado pordaniel
- Comparing Risk Prediction ModelsCargado pordaniel
- Prognostic Modeling With Logistic Regression Analysis-- In Search of a Sensible Strategy in Small Data SetsCargado pordaniel
- Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) the TRIPOD StatementCargado pordaniel
- Patients' Reports or Clinicians' Assessments-- Wich Are Better for Prognosticating-Cargado pordaniel

- entropy-18-00324Cargado porGaro Ohanoglu
- Digital Image Processing Question BankCargado porSantosh Sanjeev
- M.PHIL RESEARCH METHODOLOGY MODEL QUESTION PAPERSCargado porDR K DHAMODHARAN
- Lecture5 Variance and Standard DeviationCargado porLoo Sha Ta
- The Information Role of ConservatismCargado porlupav
- sta2023 chapter3slidesCargado porapi-258903855
- Statistical Analysis of a sample of exam gradesCargado porIuliana Relea
- Michael S. Lewis-Beck-Data Analysis_ an Introduction, Issue 103-SAGE (1995)Cargado porArletPR
- The Effect of Leadership, Communication, And Discipline of the Population Administration Services in PalembangCargado porrobert0rojer
- Real Estate Statistics for O'Fallon, MO 63366 Including Real Estate & Housing StatisticsCargado porkevincottrell
- Testing Pikettys HypothesisCargado porWalacy Maciel'
- Formula MathCargado porKimmy Tan Siewkim
- Chapter 5 - Data Analysis and Research B. Interpreting, Analyzing,Cargado porAnonymous sewU7e6
- Aiha Journal = Industrial HygieneCargado porEderson Guimaraes
- PresentationCargado porAnkita Khaneja
- QuartilesCargado porJohn Martin Renomeron
- Problems on Averages and DispersionCargado porAditya Vighne
- STATPPTCargado pormgoldiieeee
- Data Template InstructionsCargado porrafaelsg
- StatisticsCargado porjupiteruk
- dixinaCargado porMuselli Sabino
- 2011 Borkowski Kowalak Combustion PressureCargado porMehmet Şimşek
- unit2testafmCargado porapi-375234253
- gemh103Cargado porrekshana
- Hospital Cost IndexCargado porHadi Rahmatsyah Sumardi
- Presentation Chapter 5 MTH1022 Rev#01Cargado porNorlianah Mohd Shah
- Stats 8 Midterm Study GuideCargado pornhandangg
- WirelessAcrossRoad RF Based Road Traffic Congestion Detection---IEEE2011Cargado porSVSEMBEDDED
- Solutions ErrataCargado porjosh
- 98911_13Cargado porkaran