MULTIPLE REGRESSION
Mr. Pranav Ranjan & Ms. Razia Sehdev, ICTC, LPU
The Multiple Regression Model

Idea: examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi).

Multiple regression model with k independent variables:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

where β0 is the Y-intercept, β1, …, βk are the population slopes, and εi is the random error.
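To make the model concrete, here is a minimal sketch (not from the slides) that estimates β0, β1, β2 by ordinary least squares with NumPy; the six-observation data set is made up purely for illustration.

import numpy as np

# Made-up illustrative data: 6 observations, 2 predictors
x1 = np.array([5.0, 6.0, 5.5, 7.0, 6.5, 8.0])
x2 = np.array([3.0, 2.5, 4.0, 3.5, 5.0, 4.5])
y  = np.array([350., 320., 380., 310., 400., 330.])

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates (b0, b1, b2) of the betas
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0 = %.3f, b1 = %.3f, b2 = %.3f" % tuple(b))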

Assumptions

• The error term is normally distributed. For each fixed value of X, the distribution of Y is normal.
• The mean of the error term is 0.
• The variance of the error term is constant. This variance does not depend on the values assumed by X.
• The error terms are uncorrelated. In other words, the observations have been drawn independently.
• The regressors are independent amongst themselves (no perfect multicollinearity).

Assumptions

• Independent variables should be uncorrelated with the residual, i.e. the model should be properly specified.
• The number of observations should be greater than the number of parameters.
• The model is linear in parameters.
• Independent variables are fixed in repeated samples.
Statistics Associated with Multiple Regression

• Coefficient of multiple determination. The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination.

• Adjusted R². R² is adjusted for the number of independent variables and the sample size to account for diminishing returns: after the first few variables, additional independent variables do not make much contribution.
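The standard adjustment (the formula itself is not on the slide) is R²adj = 1 − (1 − R²)(n − 1)/(n − k − 1). A quick check of this formula against the pie-sales output shown later:

# Adjusted R-squared from R-squared, sample size n, and k predictors
def adjusted_r2(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the regression output below: R^2 = 0.52148, n = 15, k = 2
print(round(adjusted_r2(0.52148, 15, 2), 5))   # -> 0.44173 (Adjusted R Square, up to rounding of R^2)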


Statistics Associated with Multiple Regression

• F test. Used to test the null hypothesis that the coefficient of multiple determination in the population, R²pop, is zero. The test statistic has an F distribution with k and (n - k - 1) degrees of freedom.
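The F statistic can be computed directly from R² via the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)); a quick check against the output shown later:

# F statistic from R^2, k predictors, and n observations
r2, n, k = 0.52148, 15, 2
F = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(F, 4))   # -> 6.5387, matching the ANOVA F = 6.53861 up to rounding of R^2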

Statistics Associated with Multiple Regression

• Partial regression coefficient. The partial regression coefficient, b1, denotes the change in the predicted value, Ŷi, per unit change in X1 when the other independent variables, X2 to Xk, are held constant.

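The "held constant" interpretation can be seen numerically. A small check (illustrative; predict_sales is a hypothetical helper wrapping the fitted pie-sales equation shown next):

def predict_sales(price: float, advertising: float) -> float:
    # Fitted pie-sales equation from the output below
    return 306.526 - 24.975 * price + 74.131 * advertising

# Raise Price by 1 while holding Advertising fixed: the prediction moves by exactly b1
print("%.3f" % (predict_sales(6.0, 3.5) - predict_sales(5.0, 3.5)))   # -> -24.975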

Multiple Regression Output

Regression Statistics
Multiple R            0.72213
R Square              0.52148
Adjusted R Square     0.44172
Standard Error       47.46341
Observations         15

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

ANOVA        df         SS         MS        F    Significance F
Regression    2  29460.027  14730.013  6.53861           0.01201
Residual     12  27033.306   2252.776
Total        14  56493.333

             Coefficients  Standard Error   t Stat  P-value  Lower 95%  Upper 95%
Intercept       306.52619       114.25389  2.68285  0.01993   57.58835  555.46404
Price           -24.97509        10.83213 -2.30565  0.03979  -48.57626   -1.37392
Advertising      74.13096        25.96732  2.85478  0.01449   17.55303  130.70888
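The slides do not include the 15 raw observations, so this table cannot be reproduced exactly; for reference, this is roughly how such output is generated with statsmodels (the file name and column names are hypothetical):

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data file; the slides do not provide the raw observations
df = pd.read_csv("pie_sales.csv")    # columns: sales, price, advertising
model = smf.ols("sales ~ price + advertising", data=df).fit()
print(model.summary())               # R^2, ANOVA F test, coefficient table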

The Multiple Regression Equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

where
  Sales is in number of pies per week
  Price is in $
  Advertising is in $100s.

b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.

Using The Equation to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Predicted sales is 428.62 pies.

Note that Advertising is in $100s, so $350 means that X2 = 3.5.
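The same prediction in code (a sketch; predict_sales just wraps the fitted equation above):

def predict_sales(price: float, advertising: float) -> float:
    # Fitted equation; advertising is in units of $100
    return 306.526 - 24.975 * price + 74.131 * advertising

print(round(predict_sales(5.50, 3.5), 2))   # -> 428.62 pies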


Multiple Coefficient of Determination (continued)

r² = SSR / SST = 29460.0 / 56493.3 = .52148

52.1% of the variation in pie sales is explained by the variation in price and advertising.
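A one-line check of the ratio, using the regression and total sums of squares from the ANOVA table in the output above:

# r^2 = SSR/SST with the ANOVA sums of squares from the output above
SSR, SST = 29460.027, 56493.333
print(round(SSR / SST, 5))   # -> 0.52148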

Adjusted r² (continued)

r²adj = .44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

F Test for Overall Significance (continued)

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386

With 2 and 12 degrees of freedom. The p-value for the F test is 0.01201 (Significance F in the ANOVA table above).
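The p-value can be reproduced from the F distribution with SciPy (a sketch using the MS values from the ANOVA table above):

from scipy import stats

# Overall F test: F = MSR/MSE with (2, 12) degrees of freedom
MSR, MSE = 14730.013, 2252.776
F = MSR / MSE
print("F = %.4f, p = %.5f" % (F, stats.f.sf(F, 2, 12)))   # -> F = 6.5386, p = 0.01201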
Are Individual Variables Significant? (continued)

• t-value for Price is t = -2.306, with p-value .0398
• t-value for Advertising is t = 2.855, with p-value .0145

Both p-values are below .05, so each variable makes a significant contribution at the 5% level.
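These two-sided p-values follow from the t distribution with n − k − 1 = 12 degrees of freedom; a quick SciPy check:

from scipy import stats

# Two-sided p-values for the coefficient t statistics, df = 12
for name, t in [("Price", -2.30565), ("Advertising", 2.85478)]:
    p = 2 * stats.t.sf(abs(t), 12)
    print("%-11s t = %+.3f, p = %.4f" % (name, t, p))
# -> Price p = 0.0398, Advertising p = 0.0145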
Multicollinearity

Multicollinearity arises when intercorrelations among the predictors are very high. It results in several problems, including:

• The partial regression coefficients may not be estimated precisely; the standard errors are likely to be high.
• The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
• It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
• Predictor variables may be incorrectly included or removed in stepwise regression.

Multicollinearity

• A simple procedure for adjusting for multicollinearity consists of using only one of the variables in a highly correlated set of variables.
• Alternatively, the set of independent variables can be transformed into a new set of mutually independent predictors by using techniques such as principal components analysis.
• More specialized techniques, such as ridge regression and latent root regression, can also be used.

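As one illustration of these remedies, a ridge-regression sketch with scikit-learn (the data and the alpha value are made up, not from the slides):

import numpy as np
from sklearn.linear_model import Ridge

# Two nearly collinear predictors (made up for illustration)
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)   # almost a copy of x1
y = 3 * x1 + 2 * x2 + rng.normal(size=100)

# Ridge shrinks the coefficients and stabilizes them under collinearity
X = np.column_stack([x1, x2])
print(Ridge(alpha=1.0).fit(X, y).coef_)      # alpha is a tuning choice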

Multicollinearity Diagnostics:

• Variance Inflation Factor (VIF) – measures how much the variance of a regression coefficient is inflated by multicollinearity problems. A VIF of 1 indicates no correlation between that predictor and the other independent measures; values somewhat above 1 indicate some association between predictor variables, but generally not enough to cause problems. A maximum acceptable VIF value is typically 10; anything higher indicates a problem with multicollinearity.

• Tolerance – the amount of variance in an independent variable that is not explained by the other independent variables (tolerance = 1/VIF). If the other variables explain a lot of the variance of a particular independent variable, there is a problem with multicollinearity; thus, small values of tolerance indicate multicollinearity problems. The cutoff value for tolerance is typically .10: a tolerance value smaller than .10 indicates a problem of multicollinearity.
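VIFs can be computed directly with statsmodels; a minimal sketch on made-up predictors (tolerance is then just 1/VIF):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative predictors; replace with the model's design matrix
rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(50, 2)))   # constant column + 2 predictors
for i in range(1, X.shape[1]):                  # skip the constant itself
    vif = variance_inflation_factor(X, i)
    print("x%d: VIF = %.2f, tolerance = %.2f" % (i, vif, 1 / vif))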