
Multiple Regression Analysis

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u$


1. Estimation

Parallels with Simple Regression


$\beta_0$ is still the intercept
$\beta_1$ to $\beta_k$ are all called slope parameters
u is still the error term (or disturbance)
Still need to make a zero conditional mean
assumption, so now assume that
$E(u|x_1, x_2, \ldots, x_k) = 0$
Still minimizing the sum of squared
residuals, so now have $k+1$ first order conditions
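The $k+1$ first order conditions are equivalent to the normal equations $X'X\hat\beta = X'y$. A minimal NumPy sketch of this, using simulated data (the sample size, true coefficients, and seed are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept column + k regressors
beta = np.array([1.0, 2.0, -0.5])                           # illustrative "true" parameters
y = X @ beta + rng.normal(size=n)                           # error u drawn with mean zero

# Solve the k+1 normal equations X'X b = X'y, i.e. minimize the sum of squared residuals
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```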

Interpreting Multiple Regression


$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \ldots + \hat\beta_k x_k$, so

$\Delta\hat{y} = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \ldots + \hat\beta_k \Delta x_k$,

so holding $x_2, \ldots, x_k$ fixed implies that

$\Delta\hat{y} = \hat\beta_1 \Delta x_1$, that is, each $\hat\beta_j$ has
a ceteris paribus interpretation
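A tiny numerical illustration of the ceteris paribus reading; the fitted coefficients below are hypothetical, chosen only to show the arithmetic:

```python
# Hypothetical fitted equation: yhat = b0 + b1*x1 + b2*x2
b0, b1, b2 = 1.0, 2.0, -0.5

def yhat(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# Holding x2 fixed, a one-unit increase in x1 changes the prediction by exactly b1
delta_yhat = yhat(4.0, 3.0) - yhat(3.0, 3.0)
print(delta_yhat, b1)  # both equal 2.0
```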

Simple vs Multiple Reg Estimate


Compare the simple regression $\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1$
with the multiple regression $\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$

Generally, $\tilde\beta_1 \ne \hat\beta_1$ unless:

$\hat\beta_2 = 0$ (i.e. no partial effect of $x_2$) OR

$x_1$ and $x_2$ are uncorrelated in the sample
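A quick check of this point with simulated data in which $x_1$ and $x_2$ are correlated (all data-generating numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)              # x1 and x2 correlated in the sample
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Multiple regression of y on (1, x1, x2)
Xm = np.column_stack([np.ones(n), x1, x2])
b_hat = np.linalg.solve(Xm.T @ Xm, Xm.T @ y)

# Simple regression of y on (1, x1) only
Xs = np.column_stack([np.ones(n), x1])
b_tilde = np.linalg.solve(Xs.T @ Xs, Xs.T @ y)

print(b_hat[1], b_tilde[1])  # slopes on x1 differ: x2 matters and is correlated with x1
```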

Goodness-of-Fit
We can think of each observation as being made
up of an explained part, and an unexplained part,
$y_i = \hat{y}_i + \hat{u}_i$. We then define the following:

$\sum (y_i - \bar{y})^2$ is the total sum of squares (SST)

$\sum (\hat{y}_i - \bar{y})^2$ is the explained sum of squares (SSE)

$\sum \hat{u}_i^2$ is the residual sum of squares (SSR)

Then SST = SSE + SSR

Goodness-of-Fit (continued)
How do we think about how well our
sample regression line fits our sample data?
Can compute the fraction of the total sum
of squares (SST) that is explained by the
model; call this the R-squared of the regression

$R^2 = \text{SSE}/\text{SST} = 1 - \text{SSR}/\text{SST}$

Goodness-of-Fit (continued)
We can also think of $R^2$ as being equal to
the squared correlation coefficient between
the actual $y_i$ and the fitted values $\hat{y}_i$:

$R^2 = \dfrac{\left[\sum (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})\right]^2}{\left[\sum (y_i - \bar{y})^2\right]\left[\sum (\hat{y}_i - \bar{\hat{y}})^2\right]}$
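A sketch computing $R^2$ both ways, as $1 - \text{SSR}/\text{SST}$ and as the squared correlation between $y_i$ and $\hat{y}_i$, on simulated data (nothing below comes from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.7, -0.3]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)

r2_a = SSE / SST
r2_b = 1 - SSR / SST
r2_c = np.corrcoef(y, y_hat)[0, 1] ** 2   # squared correlation between y and yhat
print(r2_a, r2_b, r2_c)                   # all three agree
```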

More about R-squared


R2 can never decrease when another
independent variable is added to a
regression, and usually will increase
Because R2 will usually increase with the
number of independent variables, it is not a
good way to compare models
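A small demonstration, on made-up data, that adding even a pure-noise regressor cannot lower $R^2$:

```python
import numpy as np

def r_squared(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                       # unrelated to y by construction

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])
print(r_squared(X_small, y) <= r_squared(X_big, y))  # True: R^2 cannot fall
```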

Assumptions for Unbiasedness


Population model is linear in parameters:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u$
We can use a random sample of size n, $\{(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i): i = 1, 2, \ldots, n\}$,
from the population model, so that the sample model is
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_k x_{ik} + u_i$
$E(u|x_1, x_2, \ldots, x_k) = 0$, implying that all of the
explanatory variables are exogenous
None of the x's is constant, and there are no exact
linear relationships among them
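Under these four assumptions OLS is unbiased; a rough simulation check averages the estimates over many random samples (the data-generating numbers are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, 2.0, -0.5])                # illustrative true parameters
estimates = []
for _ in range(2000):                            # many random samples from the same population
    X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
    y = X @ beta + rng.normal(size=100)          # E(u|x) = 0 holds by construction
    estimates.append(np.linalg.solve(X.T @ X, X.T @ y))

print(np.mean(estimates, axis=0))                # close to [1.0, 2.0, -0.5]
```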

Too Many or Too Few Variables


What happens if we include variables in
our specification that don't belong?
There is no effect on our parameter
estimates, and OLS remains unbiased
What if we exclude a variable from our
specification that does belong?
OLS will usually be biased

Omitted Variable Bias


Suppose the true model is given as
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$, but we
estimate $\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1$, then

$\tilde\beta_1 = \dfrac{\sum (x_{i1} - \bar{x}_1)\, y_i}{\sum (x_{i1} - \bar{x}_1)^2}$

Omitted Variable Bias (cont)


Recall the true model, so that
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i$, so the
numerator becomes

$\sum (x_{i1} - \bar{x}_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i)$
$\quad = \beta_1 \sum (x_{i1} - \bar{x}_1)^2 + \beta_2 \sum (x_{i1} - \bar{x}_1)\, x_{i2} + \sum (x_{i1} - \bar{x}_1)\, u_i$

Omitted Variable Bias (cont)


$\tilde\beta_1 = \beta_1 + \beta_2 \dfrac{\sum (x_{i1} - \bar{x}_1)\, x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2} + \dfrac{\sum (x_{i1} - \bar{x}_1)\, u_i}{\sum (x_{i1} - \bar{x}_1)^2}$

since $E(u_i) = 0$, taking expectations we have

$E(\tilde\beta_1) = \beta_1 + \beta_2 \dfrac{\sum (x_{i1} - \bar{x}_1)\, x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2}$

Omitted Variable Bias (cont)


Consider the regression of $x_2$ on $x_1$:

$\tilde{x}_2 = \tilde\delta_0 + \tilde\delta_1 x_1$, then $\tilde\delta_1 = \dfrac{\sum (x_{i1} - \bar{x}_1)\, x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2}$,

so $E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$
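The in-sample counterpart of this relationship, $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$, holds exactly and is easy to verify numerically; a sketch on simulated data (all numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
b_tilde = ols(np.column_stack([ones, x1]), y)        # short regression: y on x1
b_hat = ols(np.column_stack([ones, x1, x2]), y)      # long regression: y on x1, x2
delta = ols(np.column_stack([ones, x1]), x2)         # auxiliary regression: x2 on x1

# In-sample identity: tilde_beta1 = hat_beta1 + hat_beta2 * tilde_delta1
print(b_tilde[1], b_hat[1] + b_hat[2] * delta[1])    # the two numbers match
```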

Summary of Direction of Bias


                       Corr(x1, x2) > 0      Corr(x1, x2) < 0
$\beta_2 > 0$          Positive bias         Negative bias
$\beta_2 < 0$          Negative bias         Positive bias

Omitted Variable Bias Summary


Two cases where the bias is equal to zero:

$\beta_2 = 0$, that is, $x_2$ doesn't really belong in the model

$x_1$ and $x_2$ are uncorrelated in the sample

If the correlations between $x_2$ and $x_1$ and between $x_2$ and $y$ go in
the same direction, the bias will be positive
If the correlations between $x_2$ and $x_1$ and between $x_2$ and $y$ go in
opposite directions, the bias will be negative

The More General Case


Technically, we can only sign the bias in the
more general case if all of the included x's
are uncorrelated
Typically, then, we work through the bias
assuming the x's are uncorrelated, as a
useful guide even if this assumption is not
strictly true

Variance of the OLS Estimators


Now we know that the sampling distribution of
our estimate is centered around the true
parameter
Want to think about how spread out this
distribution is
Much easier to think about this variance under
an additional assumption, so
Assume $\text{Var}(u|x_1, x_2, \ldots, x_k) = \sigma^2$
(Homoskedasticity)

Variance of OLS (cont)


Let x stand for $(x_1, x_2, \ldots, x_k)$
Assuming that $\text{Var}(u|x) = \sigma^2$ also implies
that $\text{Var}(y|x) = \sigma^2$
The 4 assumptions for unbiasedness, plus
this homoskedasticity assumption, are
known as the Gauss-Markov assumptions

Variance of OLS (cont)


Given the Gauss-Markov assumptions,

$\text{Var}(\hat\beta_j) = \dfrac{\sigma^2}{\text{SST}_j (1 - R_j^2)}$, where

$\text{SST}_j = \sum (x_{ij} - \bar{x}_j)^2$ and $R_j^2$ is the $R^2$
from regressing $x_j$ on all other x's
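The same variance appears on the diagonal of $\sigma^2 (X'X)^{-1}$; a sketch checking that the formula above matches that matrix expression, using simulated regressors and an assumed $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
sigma2 = 1.7                                             # assumed error variance for illustration

# Var(beta_hat_1) via the slide formula: sigma^2 / (SST_1 * (1 - R_1^2))
SST1 = np.sum((x1 - x1.mean()) ** 2)
aux = np.column_stack([np.ones(n), x2])                  # regress x1 on the other regressors
g = np.linalg.solve(aux.T @ aux, aux.T @ x1)
resid = x1 - aux @ g
R1_sq = 1 - np.sum(resid ** 2) / SST1
var_formula = sigma2 / (SST1 * (1 - R1_sq))

# Same quantity from the matrix form sigma^2 * (X'X)^{-1}
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
print(var_formula, var_matrix)                           # agree
```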

Components of OLS Variances


The error variance: a larger $\sigma^2$ implies a
larger variance for the OLS estimators
The total sample variation: a larger $\text{SST}_j$
implies a smaller variance for the estimators
Linear relationships among the
independent variables: a larger $R_j^2$ implies a
larger variance for the estimators

Misspecified Models
Consider again the misspecified model

$\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1$, so that $\text{Var}(\tilde\beta_1) = \dfrac{\sigma^2}{\text{SST}_1}$

Thus, $\text{Var}(\tilde\beta_1) < \text{Var}(\hat\beta_1)$ unless $x_1$ and
$x_2$ are uncorrelated, in which case they're the same
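A short comparison of the two variance expressions on one simulated sample (assumed $\sigma^2$ and data; just a sketch of the point made on this slide and the previous one):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)           # correlated with x1
sigma2 = 1.0                                  # assumed error variance

SST1 = np.sum((x1 - x1.mean()) ** 2)
R1_sq = np.corrcoef(x1, x2)[0, 1] ** 2        # R^2 from regressing x1 on x2 (one other regressor)

var_misspecified = sigma2 / SST1              # Var(beta_tilde_1)
var_full = sigma2 / (SST1 * (1 - R1_sq))      # Var(beta_hat_1)
print(var_misspecified < var_full)            # True whenever x1 and x2 are correlated
```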

Misspecified Models (cont)


While the variance of the estimator is
smaller for the misspecified model, unless
$\beta_2 = 0$ the misspecified model is biased
As the sample size grows, the variance of
each estimator shrinks to zero, making the
variance difference less important

Estimating the Error Variance


We don't know what the error variance, $\sigma^2$,
is, because we don't observe the errors, $u_i$
What we observe are the residuals, $\hat{u}_i$
We can use the residuals to form an
estimate of the error variance

Error Variance Estimate (cont)

$\hat\sigma^2 = \dfrac{\sum \hat{u}_i^2}{n - k - 1} = \dfrac{\text{SSR}}{df}$

thus, $\text{se}(\hat\beta_j) = \dfrac{\hat\sigma}{\left[\text{SST}_j (1 - R_j^2)\right]^{1/2}}$

$df = n - (k + 1)$, or $df = n - k - 1$
df (i.e. degrees of freedom) is the (number
of observations) minus (number of estimated
parameters)
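A sketch putting the pieces together (residuals, $\hat\sigma^2$, and the standard error of one slope coefficient) on simulated data; every number below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 250, 2
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b
sigma2_hat = np.sum(u_hat ** 2) / (n - k - 1)     # SSR / df, with df = n - (k + 1)

# Standard error of beta_hat_1: se = sigma_hat / sqrt(SST_1 * (1 - R_1^2))
SST1 = np.sum((x1 - x1.mean()) ** 2)
R1_sq = np.corrcoef(x1, x2)[0, 1] ** 2            # auxiliary R^2 (only one other slope here)
se_b1 = np.sqrt(sigma2_hat) / np.sqrt(SST1 * (1 - R1_sq))
print(sigma2_hat, se_b1)
```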

The Gauss-Markov Theorem


Given our 5 Gauss-Markov assumptions, it
can be shown that OLS is BLUE:
Best
Linear
Unbiased
Estimator
Thus, if the assumptions hold, use OLS
