
COMPLETE
BUSINESS
STATISTICS
by
AMIR D. ACZEL
&
JAYAVEL SOUNDERPANDIAN
6th edition.

Chapter 11

Multiple Regression

11 Multiple Regression (1)

• Using Statistics
• The k-Variable Multiple Regression Model
• The F Test of a Multiple Regression Model
• How Good is the Regression
• Tests of the Significance of Individual Regression Parameters
• Testing the Validity of the Regression Model
• Using the Multiple Regression Model for Prediction

11 Multiple Regression (2)

• Qualitative Independent Variables
• Polynomial Regression
• Nonlinear Models and Transformations
• Multicollinearity
• Residual Autocorrelation and the Durbin-Watson Test
• Partial F Tests and Variable Selection Methods
• Multiple Regression Using the Solver
• The Matrix Approach to Multiple Regression Analysis

11 LEARNING OBJECTIVES (1)

After studying this chapter, you should be able to:
• Determine whether multiple regression would be applicable to a given instance
• Formulate a multiple regression model
• Carry out a multiple regression using a spreadsheet template
• Test the validity of a multiple regression by analyzing residuals
• Carry out hypothesis tests about the regression coefficients
• Compute a prediction interval for the dependent variable

11 LEARNING OBJECTIVES (2)

After studying this chapter, you should be able to:
• Use indicator variables in a multiple regression
• Carry out a polynomial regression
• Conduct a Durbin-Watson test for autocorrelation in residuals
• Conduct a partial F test
• Determine which independent variables are to be included in a multiple regression model
• Solve multiple regression problems using the Solver macro

11-1 Using Statistics

[Figure: a line in the (x, y) plane with intercept β0 and slope β1, and a plane over the (x1, x2) plane.]

Any two points (A and B), or an intercept and slope (β0 and β1), define a line on a two-dimensional surface. Any three points (A, B, and C), or an intercept and the coefficients of x1 and x2 (β0, β1, and β2), define a plane in three-dimensional space.

11-2 The k-Variable Multiple Regression Model

The population regression model of a dependent variable, Y, on a set of k independent variables, X1, X2, . . . , Xk, is given by:

Y = β0 + β1X1 + β2X2 + . . . + βkXk + ε

where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, ..., k, is the slope of the regression surface (sometimes called the response surface) with respect to Xi.

[Figure: the regression plane y = β0 + β1x1 + β2x2 + ε over the (x1, x2) plane.]

Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xi are uncorrelated with the error term.

Simple and Multiple Least-Squares Regression

[Figure: a fitted line ŷ = b0 + b1x in the (x, y) plane, and a fitted plane ŷ = b0 + b1x1 + b2x2 over the (x1, x2) plane.]

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line. In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane.

The Estimated Regression Relationship

The estimated regression relationship:

Ŷ = b0 + b1X1 + b2X2 + . . . + bkXk

where Ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms bi, for i = 0, 1, ..., k, are the least-squares estimates of the population regression parameters βi.

The actual, observed value of Y is the predicted value plus an error:

yj = b0 + b1x1j + b2x2j + . . . + bkxkj + ej,   j = 1, ..., n.

Least-Squares Estimation: The 2-Variable Normal Equations

Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1, and b2 yields the following normal equations, which can be solved for b0, b1, and b2:

Σy = nb0 + b1Σx1 + b2Σx2
Σx1y = b0Σx1 + b1Σx1² + b2Σx1x2
Σx2y = b0Σx2 + b1Σx1x2 + b2Σx2²
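In code, these normal equations are just a 3×3 linear system. The sketch below is a minimal illustration assuming Python with numpy (the text itself uses a spreadsheet template); the function name and layout are mine:

```python
import numpy as np

def solve_normal_equations(y, x1, x2):
    """Solve the 2-variable least-squares normal equations for b0, b1, b2.

    y, x1, x2 are numpy arrays of equal length."""
    n = len(y)
    # Coefficient matrix and right-hand side, term by term as in the
    # three normal equations above.
    A = np.array([
        [n,        x1.sum(),       x2.sum()],
        [x1.sum(), (x1**2).sum(),  (x1*x2).sum()],
        [x2.sum(), (x1*x2).sum(),  (x2**2).sum()],
    ])
    rhs = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])
    return np.linalg.solve(A, rhs)  # returns [b0, b1, b2]
```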

Example 11-1

  Y    X1   X2   X1X2   X1²   X2²   X1Y    X2Y
  72   12    5    60    144    25    864    360
  76   11    8    88    121    64    836    608
  78   15    6    90    225    36   1170    468
  70   10    5    50    100    25    700    350
  68   11    3    33    121     9    748    204
  80   16    9   144    256    81   1280    720
  82   14   12   168    196   144   1148    984
  65    8    4    32     64    16    520    260
  62    8    3    24     64     9    496    186
  90   18   10   180    324   100   1620    900
 ---  ---  ---   ---   ----   ---   ----   ----
 743  123   65   869   1615   509   9382   5040

Normal equations:
743 = 10b0 + 123b1 + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0 + 869b1 + 509b2

Solution:
b0 = 47.164942
b1 = 1.5990404
b2 = 1.1487479

Estimated regression equation:
Ŷ = 47.164942 + 1.5990404 X1 + 1.1487479 X2
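These hand computations can be checked by solving the least-squares problem directly from the raw data; a quick sketch assuming numpy (the check itself is mine, not part of the example):

```python
import numpy as np

y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
x2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)

# Design matrix [1, X1, X2]; lstsq minimizes the sum of squared errors.
X = np.column_stack([np.ones_like(y), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # approximately [47.1649, 1.5990, 1.1487], as on the slide
```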

Example 11-1: Using the Template

[Template output: regression results for Alka-Seltzer sales.]

Decomposition of the Total Deviation in a Multiple Regression Model

[Figure: at a data point, the total deviation Y − Ȳ splits into the regression deviation Ŷ − Ȳ and the error deviation Y − Ŷ.]

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE

11-3 The F Test of a Multiple Regression Model

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:

H0: β1 = β2 = ... = βk = 0
H1: Not all the βi (i = 1, 2, ..., k) are equal to 0

Source of    Sum of     Degrees of
Variation    Squares    Freedom      Mean Square              F Ratio
Regression   SSR        k            MSR = SSR/k              F = MSR/MSE
Error        SSE        n - (k+1)    MSE = SSE/(n - (k+1))
Total        SST        n - 1        MST = SST/(n - 1)
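The ANOVA arithmetic is easy to mirror in code; a sketch assuming numpy and scipy (the text itself carries this out with a spreadsheet template):

```python
import numpy as np
from scipy import stats

def f_test(y, y_hat, k):
    """Overall F test of H0: beta_1 = ... = beta_k = 0."""
    n = len(y)
    sse = float(((y - y_hat) ** 2).sum())         # error sum of squares
    ssr = float(((y_hat - y.mean()) ** 2).sum())  # regression sum of squares
    msr = ssr / k
    mse = sse / (n - (k + 1))
    f = msr / mse
    p = stats.f.sf(f, k, n - (k + 1))  # area in the upper tail
    return f, p
```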

Using the Template: Analysis of Variance Table (Example 11-1)

[Figure: F distribution with 2 and 7 degrees of freedom; critical point F(0.01) = 9.55, test statistic F = 86.34 far in the right tail, α = 0.01.]

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we might conclude that the dependent variable is related to one or more of the independent variables.

11-4 How Good is the Regression

[Figure: the regression plane with the errors y − ŷ measured vertically from the observations to the plane.]

The mean square error is an unbiased estimator of the variance of the population errors ε, denoted by σ²:

MSE = SSE / (n - (k+1)) = Σ(y - ŷ)² / (n - (k+1))

Standard error of estimate:

s = √MSE

The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

R² = SSR/SST = 1 - SSE/SST

Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination

SST = SSR + SSE

R² = SSR/SST = 1 - SSE/SST

The adjusted multiple coefficient of determination, R²(adj), is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

R²(adj) = 1 - [SSE/(n - (k+1))] / [SST/(n - 1)]

Example 11-1:   s = 1.911   R-sq = 96.1%   R-sq(adj) = 95.0%
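A sketch of these three measures, assuming numpy (the function and variable names are mine):

```python
import numpy as np

def fit_summary(y, y_hat, k):
    """Return (s, R^2, adjusted R^2) for a k-variable regression."""
    n = len(y)
    sse = ((y - y_hat) ** 2).sum()
    sst = ((y - y.mean()) ** 2).sum()
    s = np.sqrt(sse / (n - (k + 1)))                      # standard error of estimate
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))  # df-adjusted
    return s, r2, r2_adj
```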

Measures of Performance in Multiple Regression and the ANOVA Table

Source of    Sum of     Degrees of
Variation    Squares    Freedom                Mean Square              F Ratio
Regression   SSR        k                      MSR = SSR/k              F = MSR/MSE
Error        SSE        n - (k+1) = n - k - 1  MSE = SSE/(n - (k+1))
Total        SST        n - 1                  MST = SST/(n - 1)

R² = SSR/SST = 1 - SSE/SST

F = [R² / (1 - R²)] × [(n - (k+1)) / k]

R²(adj) = 1 - [SSE/(n - (k+1))] / [SST/(n - 1)] = 1 - MSE/MST

11-5 Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1) H0: β1 = 0    H1: β1 ≠ 0
(2) H0: β2 = 0    H1: β2 ≠ 0
 .
 .
(k) H0: βk = 0    H1: βk ≠ 0

Test statistic for test i:

t(n-(k+1)) = bi / s(bi)
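A two-sided t test for a single slope can be sketched as follows (assuming scipy; b_i and its standard error s_b_i would come from the regression output):

```python
from scipy import stats

def slope_t_test(b_i, s_b_i, n, k):
    """Test H0: beta_i = 0 against H1: beta_i != 0."""
    t = b_i / s_b_i
    df = n - (k + 1)
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, p
```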

Regression Results for Individual Parameters (Interpret the Table)

Variable    Coefficient Estimate    Standard Error    t-Statistic
Constant    53.12                    5.43               9.783 *
X1           2.03                    0.22               9.227 *
X2           5.60                    1.30               4.308 *
X3          10.35                    6.88               1.504
X4           3.45                    2.70               1.259
X5          -4.25                    0.38             -11.184 *

n = 150    t(0.025) = 1.96
(* denotes a coefficient significantly different from 0 at the 0.05 level, since |t| > 1.96.)

Example 11-1: Using the Template

[Template output: regression results for Alka-Seltzer sales.]

Using the Template: Example 11-2

[Template output: regression results for Exports to Singapore.]

11-6 Testing the Validity of the Regression Model: Residual Plots

[Figure: residuals vs. M1 (Example 11-2).]

It appears that the residuals are randomly distributed, with no pattern and with equal variance, as M1 increases.

11-6 Testing the Validity of the Regression Model: Residual Plots

[Figure: residuals vs. Price (Example 11-2).]

It appears that the residuals increase as Price increases; the variance of the residuals is not constant.

Normal Probability Plot for the Residuals: Example 11-2

[Figure: normal probability plot of the residuals.]

A linear trend indicates that the residuals are normally distributed.

Investigating the Validity of the Regression: Outliers and Influential Observations

[Figure, left (Outliers): the regression line fitted without the outlier versus the line fitted with the outlier included; the outlier pulls the fitted line toward itself.]

[Figure, right (Influential Observations): a point with a large value of xi far from a cluster that shows no relationship; when all data are included, the regression line is determined almost entirely by that single point.]

Possible Relation in the Region between the Available Cluster of Data and the Far Point

[Figure: the original cluster of data, the far point with a large value of xi, and some of the possible data in between; a more appropriate curvilinear relationship is seen when the in-between data are known.]

Outliers and Influential Observations: Example 11-2

Unusual Observations
Obs.    M1     EXPORTS   Fit      Stdev.Fit   Residual   St.Resid
  1     5.10   2.6000    2.6420   0.1288      -0.0420    -0.14 X
  2     4.90   2.6000    2.6438   0.1234      -0.0438    -0.14 X
 25     6.20   5.5000    4.5949   0.0676       0.9051     2.80 R
 26     6.30   3.7000    4.6311   0.0651      -0.9311    -2.87 R
 50     8.30   4.3000    5.1317   0.0648      -0.8317    -2.57 R
 67     8.20   5.6000    4.9474   0.0668       0.6526     2.02 R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

11-7 Using the Multiple Regression Model for Prediction

[Figure: estimated regression plane for Example 11-1; fitted Sales rises from 63.42 to 89.76 as Advertising goes from 8.00 to 18.00 and Promotions from 3 to 12.]

Prediction in Multiple Regression

A (1 - α)100% prediction interval for a value of Y given values of the Xi:

ŷ ± t(α/2, n-(k+1)) √(s²(ŷ) + MSE)

A (1 - α)100% prediction interval for the conditional mean of Y given values of the Xi:

ŷ ± t(α/2, n-(k+1)) s[Ê(Y)]
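A sketch of the first interval, assuming scipy; se_y_hat (the standard error of ŷ) and mse come from the fitted regression, and the names are mine:

```python
import numpy as np
from scipy import stats

def prediction_interval(y_hat, se_y_hat, mse, n, k, alpha=0.05):
    """(1 - alpha)100% prediction interval for a new value of Y."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - (k + 1))
    half = t_crit * np.sqrt(se_y_hat ** 2 + mse)
    return y_hat - half, y_hat + half
```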

11-8 Qualitative (or Categorical) Independent Variables (in Regression)

An indicator (dummy, binary) variable of qualitative level A:

Xh = 1 if level A is obtained
Xh = 0 if level A is not obtained

Example 11-3:

MOVIE   EARN   COST   PROM   BOOK
  1      28     4.2    1.0    0
  2      35     6.0    3.0    1
  3      50     5.5    6.0    1
  4      20     3.3    1.0    0
  5      75    12.5   11.0    1
  6      60     9.6    8.0    1
  7      15     2.5    0.5    0
  8      45    10.8    5.0    0
  9      50     8.4    3.0    1
 10      34     6.6    2.0    0
 11      48    10.7    1.0    1
 12      82    11.0   15.0    1
 13      24     3.5    4.0    0
 14      50     6.9   10.0    0
 15      58     7.8    9.0    1
 16      63    10.1   10.0    0
 17      30     5.0    1.0    1
 18      37     7.5    5.0    0
 19      45     6.4    8.0    1
 20      72    10.0   12.0    1
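To show how such a 0/1 column enters the model, here is a sketch (assuming numpy) that regresses EARN on COST, PROM, and the BOOK indicator, using the first six rows above:

```python
import numpy as np

earn = np.array([28, 35, 50, 20, 75, 60], dtype=float)
cost = np.array([4.2, 6.0, 5.5, 3.3, 12.5, 9.6])
prom = np.array([1.0, 3.0, 6.0, 1.0, 11.0, 8.0])
book = np.array([0, 1, 1, 0, 1, 1], dtype=float)  # indicator variable

# The dummy enters the design matrix like any other regressor.
X = np.column_stack([np.ones_like(earn), cost, prom, book])
b, *_ = np.linalg.lstsq(X, earn, rcond=None)
# b[3] estimates the earnings difference for book-based movies,
# holding cost and promotion constant.
```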

Picturing Qualitative Variables in Regression

[Figure, left: two parallel lines, one for X2 = 1 with intercept b0 + b2 and one for X2 = 0 with intercept b0.]
[Figure, right: two parallel planes over (x1, x2), separated vertically by b3.]

A regression with one quantitative variable (X1) and one qualitative variable (X2):
ŷ = b0 + b1x1 + b2x2

A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):
ŷ = b0 + b1x1 + b2x2 + b3x3

Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables

[Figure: three parallel lines: X2 = 0 and X3 = 0 (intercept b0), X2 = 1 and X3 = 0 (intercept b0 + b2), and X2 = 0 and X3 = 1 (intercept b0 + b3).]

A qualitative variable with r levels or categories is represented with (r - 1) 0/1 (dummy) variables.

A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3):
ŷ = b0 + b1x1 + b2x2 + b3x3

Category     X2   X3
Adventure     0    0
Drama         0    1
Romance       1    0

Using Qualitative Variables in Regression: Example 11-4

Salary = 8547 + 949 Education + 1258 Experience - 3256 Gender
(SE)     (32.6)  (45.1)          (78.5)           (212.4)
(t)      (262.2) (21.0)          (16.0)           (-15.3)

Gender = 1 if Female, 0 if Male

On average, female salaries are $3256 below male salaries, holding education and experience constant.

Interactions between Quantitative and Qualitative Variables: Shifting Slopes

[Figure: two non-parallel lines: for X2 = 0, intercept b0 and slope b1; for X2 = 1, intercept b0 + b2 and slope b1 + b3.]

A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):
ŷ = b0 + b1x1 + b2x2 + b3x1x2

11-9 Polynomial Regression

One-variable polynomial regression model:

Y = β0 + β1X + β2X² + β3X³ + . . . + βmX^m + ε

where m is the degree of the polynomial, the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.

[Figure: the same data fitted by a straight line ŷ = b0 + b1X, by quadratics ŷ = b0 + b1X + b2X² (with b2 > 0 or b2 < 0), and by a cubic ŷ = b0 + b1X + b2X² + b3X³.]
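Because the model is linear in the coefficients, a polynomial fit is just a multiple regression on powers of X; a sketch assuming numpy:

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least-squares fit of y = b0 + b1*x + ... + bm*x^m."""
    X = np.column_stack([x ** p for p in range(m + 1)])  # columns 1, x, ..., x^m
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b  # b[p] is the coefficient of x^p
```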

Polynomial Regression: Example 11-5

[Template output for Example 11-5.]

Polynomial Regression: Other Variables and Cross-Product Terms

Variable   Estimate   Standard Error   t-Statistic
X1          2.34       0.92             2.54
X2          3.11       1.05             2.96
X1²         4.22       1.00             4.22
X2²         3.57       2.12             1.68
X1X2        2.77       2.30             1.20

11-10 Nonlinear Models and Transformations

The multiplicative model:

Y = β0 X1^β1 X2^β2 X3^β3 ε

The logarithmic transformation:

log Y = log β0 + β1 log X1 + β2 log X2 + β3 log X3 + log ε

Transformations: Exponential Model

The exponential model:

Y = β0 e^(β1X) ε

The logarithmic transformation:

log Y = log β0 + β1X + log ε
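Both transformed models are then fit by ordinary least squares on the logged variables; a sketch for the exponential case, assuming numpy (names are mine):

```python
import numpy as np

def fit_exponential(x, y):
    """Fit Y = b0 * exp(b1 * X) by regressing log(Y) on X."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return np.exp(coef[0]), coef[1]  # undo the log to recover b0
```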

Plots of Transformed Variables

[Figure, four panels:
 Simple regression of SALES on ADVERT: Y = 6.59271 + 1.19176X, R-squared = 0.895.
 Regression of SALES on LOGADV: Y = 3.66825 + 6.784X, R-squared = 0.978.
 Regression of LOGSALE on LOGADV: Y = 1.70082 + 0.553136X, R-squared = 0.947.
 Residual plot (RESIDS vs. Y-HAT) for the regression of Sales on Log(Advertising).]

Variance Stabilizing Transformations

• Square root transformation: Y′ = √Y
  Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.
• Logarithmic transformation: Y′ = log(Y)
  Useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y.
• Reciprocal transformation: Y′ = 1/Y
  Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.

Regression with Dependent Indicator Variables

The logistic function:

E(Y|X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

Transformation to linearize the logistic function:

p′ = log[p / (1 - p)]

[Figure: the S-shaped logistic function, rising from 0 to 1 as x increases.]
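The logit transform and its inverse in code (a sketch assuming numpy; p must lie strictly between 0 and 1):

```python
import numpy as np

def logit(p):
    """Linearizing transform: the log-odds of p."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))

def logistic(z):
    """Inverse transform: maps any real z back into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))
```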

11-11 Multicollinearity

[Figure: four diagrams of the information content of x1 and x2.]

Orthogonal X variables provide information from independent sources: no multicollinearity. Perfectly collinear X variables provide identical information content: no regression is possible. With some degree of collinearity, problems with regression depend on the degree of collinearity. A high degree of negative collinearity also causes problems with regression.

Effects of Multicollinearity

• Variances of regression coefficients are inflated.
• Magnitudes of regression coefficients may be different from what is expected.
• Signs of regression coefficients may not be as expected.
• Adding or removing variables produces large changes in coefficients.
• Removing a data point may cause large changes in coefficient estimates or signs.
• In some cases, the F ratio may be significant while the t ratios are not.
Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors

Variance Inflation Factor

The variance inflation factor associated with Xh:

VIF(Xh) = 1 / (1 - Rh²)

where Rh² is the R² value obtained for the regression of Xh on the other independent variables.

[Figure: relationship between VIF and Rh²; VIF stays near 1 for small Rh² and rises sharply as Rh² approaches 1.0.]
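VIFs can be computed straight from this definition, one auxiliary regression per variable; a sketch assuming numpy, where X is the n × k matrix of independent variables:

```python
import numpy as np

def vif(X):
    """VIF(X_h) = 1 / (1 - R_h^2), with R_h^2 from regressing column h
    on the remaining columns (plus an intercept)."""
    n, k = X.shape
    out = np.empty(k)
    for h in range(k):
        xh = X[:, h]
        others = np.column_stack([np.ones(n), np.delete(X, h, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xh, rcond=None)
        resid = xh - others @ coef
        r2 = 1 - resid @ resid / ((xh - xh.mean()) @ (xh - xh.mean()))
        out[h] = 1.0 / (1.0 - r2)
    return out
```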

Variance Inflation Factor (VIF)

Observation: The VIF values for the variables Lend and Price are both greater than 5. This indicates that some degree of multicollinearity exists with respect to these two variables.

Solutions to the Multicollinearity Problem

• Drop a collinear variable from the regression
• Change the sampling plan to include elements outside the multicollinearity range
• Transformations of variables
• Ridge regression

11-12 Residual Autocorrelation and the Durbin-Watson Test

An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

The Durbin-Watson test (first-order autocorrelation):
H0: ρ1 = 0
H1: ρ1 ≠ 0

The Durbin-Watson test statistic:

d = Σ(i=2 to n) (ei - ei-1)² / Σ(i=1 to n) ei²

Lagged residuals:

 i     ei     ei-1   ei-2   ei-3   ei-4
 1     1.0    *      *      *      *
 2     0.0    1.0    *      *      *
 3    -1.0    0.0    1.0    *      *
 4     2.0   -1.0    0.0    1.0    *
 5     3.0    2.0   -1.0    0.0    1.0
 6    -2.0    3.0    2.0   -1.0    0.0
 7     1.0   -2.0    3.0    2.0   -1.0
 8     1.5    1.0   -2.0    3.0    2.0
 9     1.0    1.5    1.0   -2.0    3.0
10    -2.5    1.0    1.5    1.0   -2.0
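The statistic itself is a one-liner; the sketch below (assuming numpy) applies it to the ten residuals in the table:

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson d: near 2 suggests no first-order autocorrelation,
    near 0 positive, near 4 negative autocorrelation."""
    e = np.asarray(e, dtype=float)
    return (np.diff(e) ** 2).sum() / (e ** 2).sum()

e = [1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5]
print(durbin_watson(e))  # about 1.99 for these residuals
```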

Critical Points of the Durbin-Watson Statistic: α = 0.05, n = Sample Size, k = Number of Independent Variables

        k = 1        k = 2        k = 3        k = 4        k = 5
 n     dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
 15   1.08  1.36   0.95  1.54   0.82  1.75   0.69  1.97   0.56  2.21
 16   1.10  1.37   0.98  1.54   0.86  1.73   0.74  1.93   0.62  2.15
 17   1.13  1.38   1.02  1.54   0.90  1.71   0.78  1.90   0.67  2.10
 18   1.16  1.39   1.05  1.53   0.93  1.69   0.82  1.87   0.71  2.06
  .     .     .      .     .      .     .      .     .      .     .
 65   1.57  1.63   1.54  1.66   1.50  1.70   1.47  1.73   1.44  1.77
 70   1.58  1.64   1.55  1.67   1.52  1.70   1.49  1.74   1.46  1.77
 75   1.60  1.65   1.57  1.68   1.54  1.71   1.51  1.74   1.49  1.77
 80   1.61  1.66   1.59  1.69   1.56  1.72   1.53  1.74   1.51  1.77
 85   1.62  1.67   1.60  1.70   1.57  1.72   1.55  1.75   1.52  1.77
 90   1.63  1.68   1.61  1.70   1.59  1.73   1.57  1.75   1.54  1.78
 95   1.64  1.69   1.62  1.71   1.60  1.73   1.58  1.75   1.56  1.78
100   1.65  1.69   1.63  1.72   1.61  1.74   1.59  1.76   1.57  1.78

Using the Durbin-Watson Statistic

[Diagram: the d scale from 0 to 4: positive autocorrelation below dL, inconclusive from dL to dU, no autocorrelation from dU to 4 - dU, inconclusive from 4 - dU to 4 - dL, negative autocorrelation above 4 - dL.]

For n = 67, k = 4:  dU ≈ 1.73, 4 - dU ≈ 2.27;  dL ≈ 1.47, 4 - dL ≈ 2.53 < 2.58.

H0 is rejected, and we conclude there is negative first-order autocorrelation.

11-13 Partial F Tests and Variable Selection Methods

Full model:
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

Reduced model:
Y = β0 + β1X1 + β2X2 + ε

Partial F test:
H0: β3 = β4 = 0
H1: β3 and β4 are not both 0

Partial F statistic:

F(r, n-(k+1)) = [(SSER - SSEF) / r] / MSEF

where SSER is the sum of squared errors of the reduced model, SSEF is the sum of squared errors of the full model, MSEF is the mean square error of the full model [MSEF = SSEF/(n - (k+1))], and r is the number of variables dropped from the full model.
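In code, the partial F test needs only the two error sums of squares; a sketch assuming scipy (names are mine):

```python
from scipy import stats

def partial_f_test(sse_reduced, sse_full, n, k, r):
    """Partial F test of H0: the r dropped slopes are all zero.
    k is the number of variables in the full model."""
    mse_full = sse_full / (n - (k + 1))
    f = ((sse_reduced - sse_full) / r) / mse_full
    p = stats.f.sf(f, r, n - (k + 1))
    return f, p
```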

Variable Selection Methods

• All possible regressions
  Run regressions with all possible combinations of independent variables and select the best model.

A p-value of 0.001 indicates that we should reject the null hypothesis H0: the slopes for Lend and Exch. are zero.

Variable Selection Methods

• Stepwise procedures
  Forward selection
  • Add one variable at a time to the model, on the basis of its F statistic
  Backward elimination
  • Remove one variable at a time, on the basis of its F statistic
  Stepwise regression
  • Adds variables to the model and subtracts variables from the model, on the basis of the F statistic

Stepwise Regression

[Flowchart:]
1. Compute the F statistic for each variable not in the model.
2. Is there at least one variable with p-value < Pin? If no, stop.
3. Enter the most significant (smallest p-value) variable into the model.
4. Calculate the partial F for all variables in the model.
5. Is there a variable with p-value > Pout? If yes, remove that variable and return to step 4; if no, return to step 1.

Stepwise Regression: Using the Computer (MINITAB)

MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Stepwise Regression

F-to-Enter: 4.00   F-to-Remove: 4.00
Response is EXPORTS on 4 predictors, with N = 67

Step        1        2
Constant    0.9348  -3.4230
M1          0.520    0.361
T-Ratio     9.89     9.21
PRICE                0.0370
T-Ratio              9.05
S           0.495    0.331
R-Sq       60.08    82.48

Using the Computer: MINITAB

MTB > REGRESS 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.

Regression Analysis

The regression equation is
EXPORTS = - 4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor    Coef       Stdev      t-ratio    p       VIF
Constant    -4.015      2.766      -1.45      0.152
M1           0.36846    0.06385     5.77      0.000   3.2
LEND         0.00470    0.04922     0.10      0.924   5.4
PRICE        0.036511   0.009326    3.91      0.000   6.3
EXCHANGE     0.268      1.175       0.23      0.820   1.4

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

Analysis of Variance

SOURCE       DF    SS        MS       F       p
Regression    4    32.9463   8.2366   73.06   0.000
Error        62     6.9898   0.1127
Total        66    39.9361

Durbin-Watson statistic = 2.58

Using the Computer: SAS (continued)

Parameter Estimates

                Parameter    Standard      T for H0:
Variable   DF   Estimate     Error         Parameter=0   Prob > |T|
INTERCEP    1   -4.015461    2.76640057    -1.452        0.1517
M1          1    0.368456    0.06384841     5.771        0.0001
LEND        1    0.004702    0.04922186     0.096        0.9242
PRICE       1    0.036511    0.00932601     3.915        0.0002
EXCHANGE    1    0.267896    1.17544016     0.228        0.8205

                Variance
Variable   DF   Inflation
INTERCEP    1   0.00000000
M1          1   3.20719533
LEND        1   5.35391367
PRICE       1   6.28873181
EXCHANGE    1   1.38570639

Durbin-Watson D               2.583
(For Number of Obs.)          67
1st Order Autocorrelation    -0.321
