
LOGISTIC REGRESSION Introduction

Introduction

In some regression situations, the response variable is binary, indicating whether a
particular attribute or condition is present or absent rather than providing a measure
of the magnitude of the response variable at different values of the predictor variables.
In this case, we might wish to predict the proportion (πi ) of individuals possessing
a certain attribute at a given level of the predictor variable (xi ), but a simple linear
regression
πi = β0 + β1 xi
is not appropriate for several reasons.

First, the value πi is bounded between zero and one whilst β0 + β1 xi is unbounded.
Secondly, a graph of πi against xi is usually S–shaped rather than linear. Thirdly,
if the number of individuals possessing the attribute is modelled as binomial, the
variances of the proportions are not constant across the values of the predictor.

The model can often be linearized by using the logistic or logit transformation. Instead
of fitting the straight line πi = β0 + β1 xi, the proportions are transformed to values πi′
where
$$\pi_i' = \ln\left(\frac{\pi_i}{1 - \pi_i}\right)$$
and we fit the straight line
πi′ = β0 + β1 xi .

Once the linear model has been fitted, predictions for the original proportions can be
obtained using the equation

$$\hat{\pi}_i = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1 x_i)}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1 x_i)}.$$
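
The transformation and its inverse can be sketched in a few lines of Python. This is only an illustration of the two formulas above; the function names are assumptions made for this example and do not come from any statistical package.

```python
import math


def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)) of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("the logit is infinite at p = 0 and p = 1")
    return math.log(p / (1.0 - p))


def inverse_logit(eta: float) -> float:
    """Back-transform a linear predictor eta = b0 + b1 * x to a proportion.

    Algebraically equal to exp(eta) / (1 + exp(eta)); the two branches only
    avoid overflow when eta is large in absolute value.
    """
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def predict_proportion(b0: float, b1: float, x: float) -> float:
    """Predicted proportion at predictor value x for fitted b0 and b1."""
    return inverse_logit(b0 + b1 * x)
```

Applying `inverse_logit` to `logit(p)` returns `p`, and a linear predictor of zero always maps to a proportion of 0.5.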

Although the logit transformation linearizes the model, the variances are still not equal.
Moreover, for any individual whose response is coded as 0 or 1, the logit of the response
is either minus infinity or plus infinity, so the line cannot be fitted by ordinary least
squares. The model is therefore fitted using an iterative maximum likelihood method.
Due to the computational complexity of this method, it is only available in larger
statistical packages.

LOGISTIC REGRESSION Example

Interpretation of Regression Coefficient

In logistic regression, the coefficient β1 has an interpretation different from that
in simple linear regression. The ratio πi /(1 − πi ) is called the odds in favour of the
attribute or condition being present. The difference between the logarithm of the odds
at a value x of the predictor and a value x + 1 of the predictor is

$$\ln[\text{odds at } (x + 1)] - \ln[\text{odds at } x] = \beta_0 + \beta_1 (x + 1) - (\beta_0 + \beta_1 x) = \beta_1$$

or taking exponentials on both sides

$$\frac{\text{odds at } x + 1}{\text{odds at } x} = e^{\beta_1}.$$

The estimate of exp(β1) is called the odds ratio and is the factor by which the odds are
multiplied for a unit increase in the predictor variable.
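
As a quick numerical check of this result, the sketch below evaluates the odds at several values of x and confirms that the ratio for a unit increase does not depend on x. The coefficients are hypothetical values chosen only for illustration.

```python
import math


def odds(b0: float, b1: float, x: float) -> float:
    """Odds pi / (1 - pi) under the logistic model at predictor value x."""
    pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return pi / (1.0 - pi)


# Hypothetical coefficients, not taken from any fitted model.
b0, b1 = -1.0, 0.5

# Ratio of the odds at x + 1 to the odds at x, for several starting points.
ratios = [odds(b0, b1, x + 1.0) / odds(b0, b1, x) for x in (0.0, 2.0, 5.0)]
odds_ratio = math.exp(b1)

for r in ratios:
    print(f"{r:.6f} vs exp(b1) = {odds_ratio:.6f}")
```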

Example

Samples of fifty insects are exposed to different concentrations of an insecticide and the
number of insects that have died after 24 hours is recorded, with the results shown below.

Concentration   Sample Size   Number Dead

     0.0            50             2
     2.2            50             4
     5.7            50             5
     8.4            50            10
    16.0            50            36
    20.5            50            44

LOGISTIC REGRESSION Minitab Analysis

Fitting the Logistic Model

With the above observations stored in columns named Concentration, Sample and Killed
in a Minitab worksheet, the logistic model is fitted from the Regression menu.

Stat > Regression > Binary Logistic Regression


• Success [Killed] Trial [Sample]
Model [Concentration]

Logistic Regression Table


                                               Odds     95% CI
Predictor      Coef     StDev      Z      P   Ratio   Lower   Upper
Constant    -3.4443    0.3546  -9.71  0.000
Concentr    0.26652   0.02747   9.70  0.000    1.31    1.24    1.38

Log-Likelihood = -112.346
Test that all slopes are zero: G = 158.588, DF = 1, P-Value = 0.000

Goodness-of-Fit Tests

Method            Chi-Square   DF       P
Pearson                1.539    4   0.820
Deviance               1.484    4   0.829
Hosmer-Lemeshow        1.539    4   0.820

Table of Observed and Expected Frequencies:
(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

                          Group
Value        1      2      3      4      5      6   Total
Success
  Obs        2      4      5     10     36     44     101
  Exp      1.5    2.7    6.4   11.5   34.7   44.1
Failure
  Obs       48     46     45     40     14      6     199
  Exp     48.5   47.3   43.6   38.5   15.3    5.9

Total       50     50     50     50     50     50     300
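
For readers without Minitab, the iterative maximum likelihood fit can be sketched with a plain Newton-Raphson iteration on the grouped binomial data. This is a minimal illustration and not necessarily the algorithm Minitab uses: the zero starting values, the tolerance and the function name `fit_logistic` are assumptions made for this example.

```python
import math

# Data from the insecticide example: concentration, sample size, number dead.
conc = [0.0, 2.2, 5.7, 8.4, 16.0, 20.5]
size = [50, 50, 50, 50, 50, 50]
dead = [2, 4, 5, 10, 36, 44]


def fit_logistic(x, n, y, tol=1e-10, max_iter=50):
    """Newton-Raphson fit of a binomial logit model with one predictor."""
    b0, b1 = 0.0, 0.0  # assumed starting values
    for _ in range(max_iter):
        # Score vector (u0, u1) and information matrix [[i00, i01], [i01, i11]].
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, ni, yi in zip(x, n, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)  # binomial variance of the count
            r = yi - ni * p         # observed minus expected deaths
            u0 += r
            u1 += r * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: solve (information) * delta = score for the 2 x 2 case.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    # Standard errors from the inverse information matrix, evaluated at the
    # last iterate before the final (negligible) update.
    se0 = math.sqrt(i11 / det)
    se1 = math.sqrt(i00 / det)
    return b0, b1, se0, se1


b0, b1, se0, se1 = fit_logistic(conc, size, dead)
print(f"Constant {b0:.4f} (SE {se0:.4f}), Concentr {b1:.5f} (SE {se1:.5f})")
```

The printed coefficients and standard errors should be close to the Coef and StDev columns of the Minitab table.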

LOGISTIC REGRESSION Conclusions

Conclusions

The fitted regression line is

πi′ = −3.444 + 0.267xi

with both coefficients significantly different from zero (P = 0.000 for each in the
Minitab output).

The odds ratio is 1.31. That is, for every unit increase in concentration, the odds of the
insect being killed are multiplied by 1.31; equivalently, the odds of killing the insect
increase by 31%.
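
The odds ratio and its 95% interval in the Minitab table can be recovered from the Concentr coefficient and its standard deviation. The sketch below assumes the interval is the usual normal-theory one, computed on the log-odds scale with the quantile 1.96 and then exponentiated.

```python
import math

coef, se = 0.26652, 0.02747  # Concentr row of the Minitab table
z = 1.96                     # assumed normal quantile for a 95% interval

odds_ratio = math.exp(coef)
lower = math.exp(coef - z * se)
upper = math.exp(coef + z * se)
percent_increase = 100.0 * (odds_ratio - 1.0)

print(f"odds ratio {odds_ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
print(f"odds increase per unit of concentration: {percent_increase:.0f}%")
```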

A measure of the fit of the model is provided by the deviance, which has approximately a
chi–squared distribution with degrees of freedom equal to the number of sample points
minus the number of parameters in the model. In this case, the deviance is small (1.484
on 6 − 2 = 4 degrees of freedom, P = 0.829) and so there is no evidence that the model
fails to fit the observations.
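
The deviance and Pearson statistics can be recomputed from the observed counts and the fitted proportions. The sketch below uses the coefficients as printed by Minitab, so small rounding differences are to be expected. The closed-form tail probability is specific to a chi-squared distribution with 4 degrees of freedom.

```python
import math

conc = [0.0, 2.2, 5.7, 8.4, 16.0, 20.5]
size = [50, 50, 50, 50, 50, 50]
dead = [2, 4, 5, 10, 36, 44]
b0, b1 = -3.4443, 0.26652  # coefficients as printed in the Minitab table


def xlogy(obs: float, exp: float) -> float:
    """obs * ln(obs / exp), taken as 0 when obs is 0."""
    return 0.0 if obs == 0 else obs * math.log(obs / exp)


deviance = 0.0
pearson = 0.0
for x, n, y in zip(conc, size, dead):
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    mu = n * p  # expected number dead
    deviance += 2.0 * (xlogy(y, mu) + xlogy(n - y, n - mu))
    pearson += (y - mu) ** 2 / (n * p * (1.0 - p))

df = len(conc) - 2  # sample points minus parameters


def chi2_upper_tail_df4(stat: float) -> float:
    """P(X > stat) for a chi-squared variable with exactly 4 degrees of freedom."""
    return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)


print(f"deviance {deviance:.3f}, P = {chi2_upper_tail_df4(deviance):.3f}")
print(f"Pearson  {pearson:.3f}, P = {chi2_upper_tail_df4(pearson):.3f}")
```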

If we want to estimate the proportion killed at a concentration of 12.0, from the
regression line, using the unrounded coefficients, we obtain
π′ = −3.4443 + 0.26652 × 12 = −0.246 and so

$$\hat{\pi} = \frac{e^{-0.246}}{1 + e^{-0.246}} = 0.44$$

An approximate estimate of the LD50 concentration, the concentration which will kill
50% of the population, can be obtained by solving

0 = −3.444 + 0.267x

as the logit of π = 0.5 is 0. This gives an estimate of the LD50 concentration as 12.9.
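
Both calculations follow directly from the fitted line and can be sketched as below. The function names are illustrative only. `lethal_concentration` generalises the LD50 calculation to any target proportion p by solving logit(p) = b0 + b1 x for x.

```python
import math

b0, b1 = -3.4443, 0.26652  # coefficients as printed in the Minitab table


def predicted_proportion(x: float) -> float:
    """Estimated proportion killed at concentration x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def lethal_concentration(p: float) -> float:
    """Concentration at which the fitted proportion killed equals p."""
    return (math.log(p / (1.0 - p)) - b0) / b1


print(f"proportion killed at 12.0: {predicted_proportion(12.0):.2f}")
print(f"LD50: {lethal_concentration(0.5):.1f}")
```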

LOGISTIC REGRESSION Discussion and Other Link Functions

The logistic regression which uses the transformation

$$\pi_i' = \ln\left(\frac{\pi_i}{1 - \pi_i}\right)$$

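assumes that no πi is either zero or one. If some πi are zero or one, or take values very
close to zero or one, the usual modification in the transformation is to take

$$\pi_i = \frac{1}{2n_i} \quad \text{if } \pi_i = 0$$

$$\pi_i = 1 - \frac{1}{2n_i} \quad \text{if } \pi_i = 1,$$

where ni denotes the number of individuals at xi.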
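
A minimal sketch of this adjustment, assuming the proportion is observed as r successes among n individuals. The function name `empirical_logit` is illustrative, and only proportions of exactly zero or one are adjusted here.

```python
import math


def empirical_logit(r: int, n: int) -> float:
    """Logit of the observed proportion r / n, adjusted at 0 and 1."""
    if n <= 0 or not 0 <= r <= n:
        raise ValueError("need n > 0 and 0 <= r <= n")
    p = r / n
    if r == 0:
        p = 1.0 / (2.0 * n)        # replaces an observed proportion of 0
    elif r == n:
        p = 1.0 - 1.0 / (2.0 * n)  # replaces an observed proportion of 1
    return math.log(p / (1.0 - p))


# With 50 individuals, 0 is replaced by 0.01 and 1 by 0.99.
print(empirical_logit(0, 50), empirical_logit(50, 50))
```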

In the case of logistic regression the response function is

$$\hat{\pi} = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}.$$

Other transformations which produce response functions with similar shapes are the
probit transformation, which transforms the πi using the cumulative normal distribution
function Φ, where
$$\pi_i' = \Phi^{-1}(\pi_i)$$

and the complementary log–log transformation which transforms πi using

πi′ = ln[− ln(1 − πi )].

The probit transformation was introduced by Bliss and is widely used in estimating
LD50 dosages. Because of the computational complexity, tables to aid computation were
prepared before computer packages became available, and other methods of calculating
LD50 dosages were proposed. These alternative methods to probit analysis still provide
quick and easy estimates of LD50 dosages.

The complementary log–log transformation transforms the probabilities which lie in
the range 0 to 1 to values in the range −∞ to ∞ and, unlike the logit and probit
transformations, does not produce a response function which is symmetric about π = 0.5.
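
The symmetry property can be checked numerically. A link g is symmetric about π = 0.5 when g(p) + g(1 − p) = 0 for every p. The sketch below uses `statistics.NormalDist` from the Python standard library for the inverse normal distribution function; the three function names are illustrative.

```python
import math
from statistics import NormalDist


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def probit(p: float) -> float:
    return NormalDist().inv_cdf(p)  # inverse of the standard normal CDF


def cloglog(p: float) -> float:
    return math.log(-math.log(1.0 - p))


for p in (0.1, 0.25, 0.4):
    print(
        f"p = {p}: "
        f"logit sum {logit(p) + logit(1 - p):+.4f}, "
        f"probit sum {probit(p) + probit(1 - p):+.4f}, "
        f"cloglog sum {cloglog(p) + cloglog(1 - p):+.4f}"
    )
```

The logit and probit sums vanish up to rounding error, whereas the complementary log–log sums do not.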

ESTIMATION OF LD50 DOSES Method of Extreme Lethal Dosages

Assumptions

• Single subject at each dosage.

• Interval between successive dosages constant.

• Dosages cover range from certain death to certain survival.

Estimate

The estimate of log(LD50) is the mean of the highest log dosage which fails to kill and
the lowest log dosage which does kill.

Example

Concentration   log(concentration)   Result
     λ             x = log(λ)

    200               2.3            kill
    125               2.1            kill
     80               1.9            survive
     50               1.7            survive
     30               1.5            kill
     20               1.3            survive
     12               1.1            survive

$$\widehat{\log(\mathrm{LD50})} = \frac{1.9 + 1.5}{2} = 1.7$$

$$\widehat{\mathrm{LD50}} = 50$$
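
The calculation can be sketched as follows. Base-10 logarithms are assumed, since log(200) = 2.3 in the table is consistent with that; the function name is illustrative.

```python
# (log10 concentration, killed?) for each subject in the example table.
trials = [
    (2.3, True),
    (2.1, True),
    (1.9, False),
    (1.7, False),
    (1.5, True),
    (1.3, False),
    (1.1, False),
]


def extreme_lethal_dosage(rows):
    """Mean of the highest log dose that failed to kill and the lowest that killed."""
    highest_survived = max(x for x, killed in rows if not killed)
    lowest_killed = min(x for x, killed in rows if killed)
    return (highest_survived + lowest_killed) / 2.0


log_ld50 = extreme_lethal_dosage(trials)
ld50 = 10.0 ** log_ld50  # back-transform, assuming base-10 logarithms

print(f"log(LD50) = {log_ld50:.1f}, LD50 = {ld50:.0f}")
```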

ESTIMATION OF LD50 DOSES Behran’s Method

Assumptions

• Equal number of subjects at each dosage.

• Interval between successive dosages constant.

• Dosages cover range from 0% to 100% killed.

Estimate

The estimate of log(LD50) is the value of x for which $S_x^-(r) = S_x^+(n - r)$, where

$$S_x^-(r) = \text{total number killed at dosages} \le x$$

$$S_x^+(n - r) = \text{total number surviving at dosages} \ge x$$

Example

Concentration   log(concentration)   Number killed in    Sx−    Sx+
     λ             x = log(λ)         groups of 100

    13.18             1.12                100            444      0
    10.01             1.00                 88            344     12
     7.59             0.88                 75            256     37
     5.75             0.76                 61            181     76
     4.37             0.64                 51            120    125
     3.31             0.52                 35             69    190
     2.51             0.40                 22             34    268
     1.91             0.28                 12             12    356
     1.44             0.16                  0              0    456

$$\widehat{\log(\mathrm{LD50})} = 0.64 + \frac{6}{11}(0.01) = 0.645$$

$$\widehat{\mathrm{LD50}} = 4.42$$

[Note: The above estimate assumes a linear model between dosages.]
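
The cumulative totals and the interpolation can be sketched as below. The interpolation rule is an assumption inferred from the worked figures: the difference Sx− − Sx+ is taken to vary linearly between the two adjacent log dosages at which it changes sign, which reproduces 0.64 + (5/110)(0.12) = 0.64 + (6/11)(0.01). The function name is illustrative.

```python
# Example data in increasing order of log10 concentration.
log_dose = [0.16, 0.28, 0.40, 0.52, 0.64, 0.76, 0.88, 1.00, 1.12]
killed = [0, 12, 22, 35, 51, 61, 75, 88, 100]
group_size = 100


def cumulative_log_ld50(x, r, n):
    """Log dose at which cumulative kills equal cumulative survivors."""
    # S-: total killed at dosages <= x, accumulated from the lowest dose up.
    s_minus, total = [], 0
    for ri in r:
        total += ri
        s_minus.append(total)
    # S+: total surviving at dosages >= x, accumulated from the highest dose down.
    s_plus, total = [0] * len(x), 0
    for i in range(len(x) - 1, -1, -1):
        total += n - r[i]
        s_plus[i] = total
    diff = [a - b for a, b in zip(s_minus, s_plus)]
    for i in range(len(x) - 1):
        if diff[i] == 0:
            return x[i]
        if diff[i] < 0 < diff[i + 1]:
            # Assumed linear interpolation between adjacent dosages.
            frac = -diff[i] / (diff[i + 1] - diff[i])
            return x[i] + frac * (x[i + 1] - x[i])
    if diff[-1] == 0:
        return x[-1]
    raise ValueError("S- and S+ do not cross within the dosage range")


log_ld50 = cumulative_log_ld50(log_dose, killed, group_size)
print(f"log(LD50) = {log_ld50:.3f}, LD50 = {10.0 ** log_ld50:.2f}")
```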

ESTIMATION OF LD50 DOSES Kärber’s Method

Assumptions

• Approximately equal number of subjects at each dosage.

• Dosages cover range from 0% to 100% killed.

Estimate

The estimate of log(LD50) is given by

$$\widehat{\log(\mathrm{LD50})} = \frac{1}{2} \sum_i (x_i + x_{i+1})(p_{i+1} - p_i)$$

where pi is the proportion killed at log dosage xi.

Example

Concentration   log(concentration)   Number killed in   (xi + xi+1)/2   pi+1 − pi
     λ             x = log(λ)         groups of 100

    13.18             1.12                100
                                                             1.06          0.12
    10.01             1.00                 88
                                                             0.94          0.13
     7.59             0.88                 75
                                                             0.82          0.14
     5.75             0.76                 61
                                                             0.70          0.10
     4.37             0.64                 51
                                                             0.58          0.16
     3.31             0.52                 35
                                                             0.46          0.13
     2.51             0.40                 22
                                                             0.34          0.10
     1.91             0.28                 12
                                                             0.22          0.12
     1.44             0.16                  0

$$\widehat{\log(\mathrm{LD50})} = 0.647$$

$$\widehat{\mathrm{LD50}} = 4.44$$
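
The sum can be evaluated directly from the table. In this sketch the dosages are listed in increasing order so that each difference pi+1 − pi is positive, and base-10 logarithms are assumed for the back-transformation.

```python
# Example data in increasing order of log10 concentration.
log_dose = [0.16, 0.28, 0.40, 0.52, 0.64, 0.76, 0.88, 1.00, 1.12]
killed = [0, 12, 22, 35, 51, 61, 75, 88, 100]
group_size = 100

p = [k / group_size for k in killed]  # proportion killed at each dosage

# Half the sum of (x_i + x_{i+1}) * (p_{i+1} - p_i) over adjacent dosages.
log_ld50 = 0.5 * sum(
    (log_dose[i] + log_dose[i + 1]) * (p[i + 1] - p[i])
    for i in range(len(log_dose) - 1)
)
ld50 = 10.0 ** log_ld50

print(f"log(LD50) = {log_ld50:.3f}, LD50 = {ld50:.2f}")
```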

PROBIT ANALYSIS Minitab Analysis

Using Minitab with the probit link function on the numerical example for Behran’s
method gives the following output:

Link Function: Normit

Logistic Regression Table

Predictor       Coef    SE Coef       Z      P
Constant    -2.35434   0.139123  -16.92  0.000
x            3.63719   0.200682   18.12  0.000

Log-Likelihood = -394.154
Test that all slopes are zero: G = 459.197, DF = 1, P-Value = 0.000

Goodness-of-Fit Tests

Method            Chi-Square   DF       P
Pearson              14.0430    7   0.050
Deviance             21.8610    7   0.003
Hosmer-Lemeshow      14.0430    7   0.050

Table of Observed and Expected Frequencies:
(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

                                   Group
Value          1      2      3      4      5      6      7      8      9   Total
Event
  Obs          0     12     22     35     51     61     75     88    100     444
  Exp        3.8    9.1   18.4   32.2   48.9   65.9   80.1   90.0   95.7
Non-event
  Obs        100     88     78     65     49     39     25     12      0     456
  Exp       96.2   90.9   81.6   67.8   51.1   34.1   19.9   10.0    4.3
Total        100    100    100    100    100    100    100    100    100     900

and the estimate of log(LD50) as 0.64279, corresponding to an LD50 dosage of about 4.39.
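
The expected frequencies in this table can be reproduced from the fitted probit line, since the expected number killed in a group of 100 is 100 Φ(b0 + b1 x). The sketch below uses the coefficients as printed and the rounded log dosages from the example, so it agrees with the Minitab Exp row only to the precision shown.

```python
from statistics import NormalDist

b0, b1 = -2.35434, 3.63719  # Constant and x from the Normit output
log_dose = [0.16, 0.28, 0.40, 0.52, 0.64, 0.76, 0.88, 1.00, 1.12]
group_size = 100

phi = NormalDist().cdf  # standard normal distribution function

expected_killed = [group_size * phi(b0 + b1 * x) for x in log_dose]
expected_surviving = [group_size - e for e in expected_killed]

print("Event Exp:    ", " ".join(f"{e:5.1f}" for e in expected_killed))
print("Non-event Exp:", " ".join(f"{e:5.1f}" for e in expected_surviving))
```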

PROBIT ANALYSIS References

Fechner, G.T. (1860)
Elemente der Psychophysik
Leipzig, Breitkopf and Härtel

Bliss, C.I. (1934)
The method of probits
Science 79 38-39

Bliss, C.I. (1934)
The method of probits continued
Science 79 409-410

Bliss, C.I. (1935)
The calculation of the dosage–mortality curve
Annals of Applied Biology 22 134-167

Bliss, C.I. (1935)
The comparison of dosage–mortality data
Annals of Applied Biology 22 307-333

Finney, D.J. (1971)
Probit Analysis
Cambridge, Cambridge University Press
