COMPLETE BUSINESS STATISTICS
by
AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN
6th edition
11-2
Chapter 11
Multiple Regression
11-3
Lines and Planes

[Figure: left panel — a line through points A and B in the (x, y) plane, with intercept β0 and slope β1; right panel — a plane through points A, B, and C over the (x1, x2) plane.]

Any two points (A and B), or an intercept and slope (β0 and β1), define a line on a two-dimensional surface. Any three points (A, B, and C), or an intercept and the coefficients of x1 and x2 (β0, β1, and β2), define a plane in a three-dimensional surface.
11-8
In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line:

  ŷ = b0 + b1x

In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane:

  ŷ = b0 + b1x1 + b2x2
11-10
Least-Squares Estimation:
The 2-Variable Normal Equations
Minimizing the sum of squared errors with respect to the
estimated coefficients b0, b1, and b2 yields the following
normal equations which can be solved for b0, b1, and b2.
  Σy   = nb0   + b1Σx1   + b2Σx2
  Σx1y = b0Σx1 + b1Σx1²  + b2Σx1x2
  Σx2y = b0Σx2 + b1Σx1x2 + b2Σx2²
11-12
Example 11-1

 Y    X1   X2   X1X2   X1²   X2²   X1Y    X2Y
 72   12    5    60    144    25    864    360
 76   11    8    88    121    64    836    608
 78   15    6    90    225    36   1170    468
 70   10    5    50    100    25    700    350
 68   11    3    33    121     9    748    204
 80   16    9   144    256    81   1280    720
 82   14   12   168    196   144   1148    984
 65    8    4    32     64    16    520    260
 62    8    3    24     64     9    496    186
 90   18   10   180    324   100   1620    900
---
743  123   65   869   1615   509   9382   5040

Normal equations:
  743  = 10b0  + 123b1  +  65b2
  9382 = 123b0 + 1615b1 + 869b2
  5040 =  65b0 + 869b1  + 509b2

Solution:
  b0 = 47.164942
  b1 = 1.5990404
  b2 = 1.1487479

Estimated regression equation:
  Ŷ = 47.164942 + 1.5990404X1 + 1.1487479X2
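These normal equations can be solved directly; a minimal sketch in NumPy, using only the sums from the Example 11-1 table (variable names are mine):

```python
import numpy as np

# Coefficient matrix and right-hand side of the three normal equations
A = np.array([
    [10.0,  123.0,  65.0],    # 743  =  10 b0 +  123 b1 +  65 b2
    [123.0, 1615.0, 869.0],   # 9382 = 123 b0 + 1615 b1 + 869 b2
    [65.0,  869.0,  509.0],   # 5040 =  65 b0 +  869 b1 + 509 b2
])
rhs = np.array([743.0, 9382.0, 5040.0])

b0, b1, b2 = np.linalg.solve(A, rhs)
print(b0, b1, b2)  # 47.164942..., 1.5990404..., 1.1487479...
```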
11-13
[Figure: decomposition of the deviation of a point from the mean of Y, shown over the (x1, x2) regression plane.]

  Y - Ȳ : Total deviation
  Ŷ - Ȳ : Regression deviation
  Y - Ŷ : Error deviation

Total Deviation = Regression Deviation + Error Deviation

  SST = SSR + SSE
11-15
Source of    Sum of    Degrees of
Variation    Squares   Freedom      Mean Square            F Ratio

Regression   SSR       k            MSR = SSR/k            F = MSR/MSE
Error        SSE       n - (k+1)    MSE = SSE/(n-(k+1))
Total        SST       n - 1        MST = SST/(n-1)
11-16
The multiple coefficient of determination, R², measures the proportion of
the variation in the dependent variable that is explained by the combination
of the independent variables in the multiple regression model:

  R² = SSR/SST = 1 - SSE/SST
11-18
  SST = SSR + SSE

  R² = SSR/SST = 1 - SSE/SST

Example 11-1:  s = 1.911   R-sq = 96.1%   R-sq(adj) = 95.0%
11-19
Source       SS     df           MS                     F
Regression   SSR    k            MSR = SSR/k            F = MSR/MSE
Error        SSE    n - (k+1)    MSE = SSE/(n-(k+1))
             = n - k - 1
Total        SST    n - 1        MST = SST/(n-1)

The F ratio can be written in terms of R²:

  F = [R² / (1 - R²)] · [(n - (k+1)) / k]

and the adjusted coefficient of determination is

  R̄² = 1 - [SSE/(n-(k+1))] / [SST/(n-1)] = 1 - MSE/MST
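The Example 11-1 figures quoted above can be recomputed from the raw data; a minimal sketch in NumPy (all variable names are mine; the data are the ten rows of the Example 11-1 table):

```python
import numpy as np

# The ten observations from the Example 11-1 table
y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
x2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
sse = resid @ resid
sst = ((y - y.mean()) ** 2).sum()
ssr = sst - sse

r2 = ssr / sst                                        # R-sq     (slide: 96.1%)
r2_adj = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))  # R-sq(adj) (slide: 95.0%)
s = np.sqrt(sse / (n - (k + 1)))                      # s        (slide: 1.911)
f = (ssr / k) / (sse / (n - (k + 1)))                 # F = MSR/MSE
```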
11-20
Hypothesis tests about individual regression slope parameters:

  (1) H0: β1 = 0    H1: β1 ≠ 0
  (2) H0: β2 = 0    H1: β2 ≠ 0
   .
   .
  (k) H0: βk = 0    H1: βk ≠ 0

Test statistic for test i:

  t(n-(k+1)) = (bi - 0) / s(bi)
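The standard errors s(bi) come from the diagonal of MSE·(X'X)⁻¹; a sketch for the Example 11-1 data (variable names are mine):

```python
import numpy as np

# Example 11-1 data again
y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
x2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)        # (b0, b1, b2)

resid = y - X @ b
mse = resid @ resid / (n - (k + 1))

# s(bi): square roots of the diagonal of MSE * (X'X)^-1
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
t = b / se   # compare each to t with n-(k+1) = 7 degrees of freedom
```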
11-21
It appears that the residuals are randomly distributed, with no pattern and with equal variance, as M1 increases.
11-25
It appears that the residuals are increasing as the Price increases. The
variance of the residuals is not constant.
11-26
[Figure: two scatterplots. Left panel (Outliers) — a point marked * lies far from the main cluster. Right panel (Influential Observations) — the regression line when all data are included versus the line with the outlier; there is no relationship within the main cluster itself.]
11-29
Unusual Observations

Obs.    M1    EXPORTS     Fit    Stdev.Fit   Residual   St.Resid
  1   5.10     2.6000   2.6420     0.1288     -0.0420    -0.14 X
  2   4.90     2.6000   2.6438     0.1234     -0.0438    -0.14 X
 25   6.20     5.5000   4.5949     0.0676      0.9051     2.80R
 26   6.30     3.7000   4.6311     0.0651     -0.9311    -2.87R
 50   8.30     4.3000   5.1317     0.0648     -0.8317    -2.57R
 67   8.20     5.6000   4.9474     0.0668      0.6526     2.02R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.
11-30
[Figure: Estimated Regression Plane for Example 11-1 — Sales (63.42 to 89.76) plotted against Advertising (8.00 to 18.00) and Promotions (3 to 12), with the fitted plane rising across both predictors.]
11-31
A (1-α)100% prediction interval for a value of Y given values of Xi:

  ŷ ± t(α/2, n-(k+1)) · √(s²(ŷ) + MSE)

A (1-α)100% prediction interval for the conditional mean of Y given values of Xi:

  ŷ ± t(α/2, n-(k+1)) · s[Ê(Y)]
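A sketch of both intervals for the Example 11-1 data; the prediction point x0 and all variable names are my own choices, and the critical value t(0.025, 7) ≈ 2.365 is taken from a t table rather than computed:

```python
import numpy as np

y  = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14,  8,  8, 18], dtype=float)
x2 = np.array([ 5,  8,  6,  5,  3,  9, 12,  4,  3, 10], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
mse = resid @ resid / (n - (k + 1))

x0 = np.array([1.0, 12.0, 6.0])       # hypothetical point (intercept, X1, X2)
y_hat = x0 @ b
s2_yhat = mse * (x0 @ XtX_inv @ x0)   # s^2(y-hat) at x0

t_crit = 2.365                        # t(0.025, 7), from a t table
pred_lo = y_hat - t_crit * np.sqrt(s2_yhat + mse)   # interval for a value of Y
pred_hi = y_hat + t_crit * np.sqrt(s2_yhat + mse)
mean_lo = y_hat - t_crit * np.sqrt(s2_yhat)         # interval for E(Y)
mean_hi = y_hat + t_crit * np.sqrt(s2_yhat)
```

As expected, the interval for an individual Y value is wider than the interval for the conditional mean, because it carries the extra MSE term.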
11-32
[Figure: two parallel lines plotted against X1, with intercepts b0 (line for X2 = 0) and b0 + b2 (line for X2 = 1).]

A regression with one quantitative variable (X1) and one qualitative variable (X2):

  y = b0 + b1x1 + b2x2

A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):

  y = b0 + b1x1 + b2x2 + b3x3
11-34
[Figure: three parallel lines plotted against X1 — "Line for X2 = 0 and X3 = 0" with intercept b0, "Line for X2 = 1 and X3 = 0" with intercept b0 + b2, and "Line for X2 = 0 and X3 = 1" with intercept b0 + b3.]

A qualitative variable with r levels or categories is represented with (r-1) 0/1 (dummy) variables.

A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3):

  y = b0 + b1x1 + b2x2 + b3x3

Category     X2   X3
Adventure     0    0
Drama         0    1
Romance       1    0
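The (r-1) dummy coding in the table can be written out directly; a small sketch (the function name is mine):

```python
def genre_dummies(category):
    """Encode the 3-level qualitative variable with r-1 = 2 dummy
    variables (X2, X3), matching the slide's table: Adventure is the
    baseline (0, 0), Romance sets X2 = 1, Drama sets X3 = 1."""
    return {
        "Adventure": (0, 0),
        "Drama":     (0, 1),
        "Romance":   (1, 0),
    }[category]

print(genre_dummies("Drama"))  # (0, 1)
```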
Romance
11-35
Salary = 8547 + 949 Education + 1258 Experience - 3256 Gender
(SE)     (32.6)  (45.1)          (78.5)           (212.4)
(t)      (262.2) (21.0)          (16.0)           (-15.3)

Gender = 1 if Female, 0 if Male

On average, female salaries are $3256 below male salaries.
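Reading the dummy coefficient off the fitted equation: for a male and a female with the same education and experience, the predicted salaries differ by exactly the Gender coefficient. A sketch (the input values are my own):

```python
def predicted_salary(education, experience, gender):
    """Fitted equation from the slide; gender is 1 if Female, 0 if Male."""
    return 8547 + 949 * education + 1258 * experience - 3256 * gender

male = predicted_salary(12, 5, 0)     # education/experience values are mine
female = predicted_salary(12, 5, 1)
print(male - female)  # 3256
```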
11-36
[Figure: two lines against X1 with different slopes — intercept b0 and slope b1 for X2 = 0; intercept b0 + b2 and slope b1 + b3 for X2 = 1.]

A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):

  y = b0 + b1x1 + b2x2 + b3x1x2
11-37
One-variable polynomial regression models:

  Quadratic:  ŷ = b0 + b1X1 + b2X1²
  Cubic:      ŷ = b0 + b1X1 + b2X1² + b3X1³

[Figure: example quadratic and cubic curves plotted against X1.]
11-38
Variable   Estimate   Standard Error   T-statistic
X1           2.34          0.92            2.54
X2           3.11          1.05            2.96
X1²          4.22          1.00            4.22
X2²          3.57          2.12            1.68
X1X2         2.77          2.30            1.20
11-40
The logarithmic transformation:

  log Y = log β0 + β1 log X1 + β2 log X2 + β3 log X3
11-41
The logarithmic transformation:

  log Y = log β0 + β1X
11-42
[Figure: SALES vs. ADVERT and LOGSALE vs. LOGADV scatterplots, with a residual plot (RESIDS vs. Y-HAT). Fitted line on the log scale: Ŷ = 1.70082 + 0.553136X, R-squared = 0.947.]
11-43
• Square root transformation: Y' = √Y
  Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.

• Logarithmic transformation: Y' = log(Y)
  Useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y.

• Reciprocal transformation: Y' = 1/Y
  Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.
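A hedged illustration of the logarithmic case, with simulated data of my own: when the error standard deviation is proportional to the conditional mean (so the variance is proportional to its square), taking logs makes the spread roughly constant.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two groups whose conditional means differ by a factor of 10, with
# multiplicative errors, so sd(Y) grows in proportion to the mean.
x = np.repeat([10.0, 100.0], 500)
y = x * np.exp(rng.normal(0.0, 0.2, size=x.size))

lo, hi = y[x == 10.0], y[x == 100.0]
raw_ratio = hi.std() / lo.std()                   # ~10: variance not constant
log_ratio = np.log(hi).std() / np.log(lo).std()   # ~1: stabilized by the log
print(raw_ratio, log_ratio)
```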
11-44
The logistic function:

  E(Y|X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

[Figure: the S-shaped logistic curve plotted against x, bounded between 0 and 1.]
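The logistic function is easy to evaluate directly; a minimal sketch (the function name is mine):

```python
import math

def logistic(x, b0, b1):
    """E(Y|X) = e^(b0 + b1 x) / (1 + e^(b0 + b1 x)); always in (0, 1)."""
    u = b0 + b1 * x
    return math.exp(u) / (1.0 + math.exp(u))

print(logistic(0.0, 0.0, 1.0))  # 0.5
```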
11-45
11-11: Multicollinearity
[Figure: four Venn-style diagrams of x1 and x2.]

Orthogonal X variables provide information from independent sources: no multicollinearity. Perfectly collinear X variables provide identical information content: no regression is possible. With some degree of collinearity, problems with regression depend on the degree of collinearity. A high degree of negative collinearity also causes problems with regression.
11-46
Effects of Multicollinearity
• Variances of regression coefficients are inflated.
• Magnitudes of regression coefficients may be different from what are expected.
• Signs of regression coefficients may not be as expected.
• Adding or removing variables produces large changes in coefficients.
• Removing a data point may cause large changes in coefficient estimates or signs.
• In some cases, the F ratio may be significant while the t ratios are not.
11-47
Detecting the Existence of Multicollinearity:
Correlation Matrix of Independent Variables and
Variance Inflation Factors
11-48
The variance inflation factor associated with Xh:

  VIF(Xh) = 1 / (1 - R²h)

where R²h is the R² value obtained for the regression of Xh on the other independent variables.

[Figure: VIF(Xh) plotted against R²h from 0.0 to 1.0 — it rises slowly at first, then sharply past 50 as R²h approaches 1.0.]
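VIF(Xh) can be computed exactly as defined, by regressing each column on the others; a sketch in NumPy (the function name is mine):

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (predictors only,
    no intercept column), computed exactly as on the slide: VIF(Xh) =
    1/(1 - Rh^2), where Rh^2 comes from regressing Xh on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for h in range(k):
        others = np.delete(X, h, axis=1)
        A = np.column_stack([np.ones(n), others])     # intercept + other X's
        coef, *_ = np.linalg.lstsq(A, X[:, h], rcond=None)
        resid = X[:, h] - A @ coef
        r2_h = 1.0 - (resid @ resid) / ((X[:, h] - X[:, h].mean()) ** 2).sum()
        factors.append(1.0 / (1.0 - r2_h))
    return factors
```

Independent columns give VIFs near 1; a near-duplicate column drives the VIFs of both copies sharply upward.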
11-49
• Drop a collinear variable from the regression.
• Change the sampling plan to include elements outside the multicollinearity range.
• Transformations of variables.
• Ridge regression.
11-51
Critical points for the Durbin-Watson statistic:  0   dL   dU   4-dU   4-dL   4

For n = 67, k = 4:
  dU ≈ 1.73  →  4-dU ≈ 2.27
  dL ≈ 1.47  →  4-dL ≈ 2.53 < 2.58

H0 is rejected, and we conclude there is negative first-order autocorrelation.
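The Durbin-Watson statistic itself is Σ(et - et-1)² / Σet²; a minimal sketch (the function name is mine):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of
    the residuals over their sum of squares. Values near 2 suggest no
    first-order autocorrelation; near 0, positive; near 4, negative."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Alternating residuals (negative autocorrelation) push d toward 4
print(durbin_watson([1.0, -1.0] * 5))  # 3.6
```

The slide's value d = 2.58 exceeds 4-dL ≈ 2.53, which is what triggers the rejection above.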
11-54
Partial F test:
  H0: β3 = β4 = 0
  H1: β3 and β4 not both 0

Partial F statistic:

  F(r, n-(k+1)) = [(SSE_R - SSE_F) / r] / MSE_F

where SSE_R is the sum of squared errors of the reduced model, SSE_F is the sum of squared errors of the full model; MSE_F is the mean square error of the full model [MSE_F = SSE_F/(n-(k+1))]; r is the number of variables dropped from the full model.
11-55
• All possible regressions
  Run regressions with all possible combinations of independent variables and select the best model.

• Stepwise procedures
  Forward selection
    • Add one variable at a time to the model, on the basis of its F statistic.
  Backward elimination
    • Remove one variable at a time, on the basis of its F statistic.
  Stepwise regression
    • Adds variables to the model and subtracts variables from the model, on the basis of the F statistic.
11-57
Stepwise Regression
[Flowchart fragment: "Is there a variable with p-value > Pout?" — if yes, remove that variable; if no, continue.]
11-58
MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Stepwise Regression

F-to-Enter:  4.00   F-to-Remove:  4.00

Response is EXPORTS on 4 predictors, with N = 67

Step           1        2
Constant  0.9348  -3.4230

M1         0.520    0.361
T-Ratio     9.89     9.21

PRICE             0.0370
T-Ratio             9.05

S          0.495    0.331
R-Sq       60.08    82.48
11-59
MTB > REGRESS 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.

Regression Analysis

The regression equation is
EXPORTS = - 4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor       Coef      Stdev   t-ratio       p    VIF
Constant      -4.015      2.766     -1.45   0.152
M1           0.36846    0.06385      5.77   0.000    3.2
LEND         0.00470    0.04922      0.10   0.924    5.4
PRICE       0.036511   0.009326      3.91   0.000    6.3
EXCHANGE       0.268      1.175      0.23   0.820    1.4

s = 0.3358    R-sq = 82.5%    R-sq(adj) = 81.4%

Analysis of Variance

SOURCE       DF        SS       MS       F      p
Regression    4   32.9463   8.2366   73.06  0.000
Error        62    6.9898   0.1127
Total        66   39.9361

Durbin-Watson statistic = 2.58
11-60
Parameter Estimates

                Parameter    Standard    T for H0:
Variable   DF    Estimate       Error  Parameter=0   Prob > |T|
INTERCEP    1   -4.015461  2.76640057       -1.452       0.1517
M1          1    0.368456  0.06384841        5.771       0.0001
LEND        1    0.004702  0.04922186        0.096       0.9242
PRICE       1    0.036511  0.00932601        3.915       0.0002
EXCHANGE    1    0.267896  1.17544016        0.228       0.8205

                 Variance
Variable   DF   Inflation
INTERCEP    1  0.00000000
M1          1  3.20719533
LEND        1  5.35391367
PRICE       1  6.28873181
EXCHANGE    1  1.38570639

Durbin-Watson D                 2.583
(For Number of Obs.)               67
1st Order Autocorrelation      -0.321