
Endogeneities
IV under Exact Identification
IV: the Overidentified Case
Efficient GMM
Conditional Homoskedasticity and 2SLS
Instrumental Variables
Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2009
March 8, 2009
Motivation
Consistency of OLS crucially depends on $E(u_i x_i) = 0$.
In general, when this assumption does not hold, $x_i$ is said to be endogenous.
Endogeneities: Examples
Simultaneous Equations
The simplest supply and demand system:
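$$
\begin{cases}
q^s_i = x^{s\prime}_i \beta^s_1 + \beta^s_2 p_i + \epsilon^s_i \\
q^d_i = x^{d\prime}_i \beta^d_1 + \beta^d_2 p_i + \epsilon^d_i \\
q^s_i = q^d_i
\end{cases}
$$
In equilibrium:
$$p_i = (\beta^s_2 - \beta^d_2)^{-1} \left( x^{d\prime}_i \beta^d_1 - x^{s\prime}_i \beta^s_1 + \epsilon^d_i - \epsilon^s_i \right)$$
In both the supply and the demand equations, the explanatory variable $p_i$ depends on the error terms, so simple OLS is not consistent.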
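The inconsistency can be seen in a small simulation. This is only a sketch and not part of the original slides: the parameter values (demand slope of -1, supply slope of 1, a single cost shifter on the supply side, zero intercepts, unit-variance shocks) are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b_d, b_s, c = -1.0, 1.0, 1.0   # hypothetical: demand slope, supply slope, cost effect

cost = rng.normal(size=n)      # observed supply shifter
e_d = rng.normal(size=n)       # demand shock
e_s = rng.normal(size=n)       # supply shock

# Equilibrium price from q_s = q_d, i.e. b_s*p + c*cost + e_s = b_d*p + e_d
p = (e_d - e_s - c * cost) / (b_s - b_d)
q = b_d * p + e_d              # quantity read off the demand curve

# OLS of q on a constant and p: the slope does not recover b_d
X = np.column_stack([np.ones(n), p])
ols = np.linalg.solve(X.T @ X, X.T @ q)
print("true demand slope:", b_d, " OLS slope:", round(ols[1], 3))
```

Under these assumed values the OLS slope settles near -1/3 rather than -1, since Cov(p, e_d)/V(p) = 0.5/0.75.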
Omitted Variables
$$Y = \beta_1 X_1 + \beta_2 X_2 + u$$
If we omit $X_2$, the estimated model is
$$Y = \beta_1 X_1 + \nu, \qquad \nu \equiv \beta_2 X_2 + u.$$
Then the predeterminedness assumption will be violated unless $X_1$ and $X_2$ are orthogonal or $\beta_2 = 0$.
Explanatory variable measured with error
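$$C_i = \beta_1 + \beta_2 X^*_i + u_i$$
and assume all assumptions for consistency hold. Now suppose we observe only a noisy version, $X_i$, of $X^*_i$:
$$X_i = X^*_i + \epsilon_i$$
where $\epsilon_i$ is a measurement error. We will assume $\epsilon_i$ is iid, with $E(\epsilon_i) = 0$, $V(\epsilon_i) = \sigma^2_\epsilon$, and uncorrelated with $X^*_i$ and $u_i$.
Replacing $X^*_i = X_i - \epsilon_i$, the regression model can be written as
$$C_i = \beta_1 + \beta_2 X_i + \nu_i, \qquad \text{with } \nu_i = -\beta_2 \epsilon_i + u_i$$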
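Now
$$Cov(X_i, \nu_i) = E(X_i \nu_i) = E\left[(X^*_i + \epsilon_i)(-\beta_2 \epsilon_i + u_i)\right] = -\beta_2 \sigma^2_\epsilon \neq 0$$
whenever $\beta_2 \neq 0$. Then the OLS estimator that regresses $C_i$ on $X_i$ is inconsistent.
We will derive more details in the next homework.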
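A quick numerical check of the covariance result, as a sketch under assumed values (true slope 2, unit variances for the true regressor, the measurement error and the regression error). None of these numbers come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta1, beta2 = 1.0, 2.0          # hypothetical true coefficients
sigma2_eps = 1.0                 # assumed variance of the measurement error

x_star = rng.normal(size=n)                          # true regressor
eps = rng.normal(scale=np.sqrt(sigma2_eps), size=n)  # measurement error
u = rng.normal(size=n)
x = x_star + eps                                     # observed, noisy regressor
c_obs = beta1 + beta2 * x_star + u

nu = u - beta2 * eps                                 # composite error of the feasible model
cov_x_nu = np.cov(x, nu)[0, 1]                       # sample analog of -beta2 * sigma2_eps

# OLS of C on a constant and the noisy X
X = np.column_stack([np.ones(n), x])
ols = np.linalg.solve(X.T @ X, X.T @ c_obs)
print("Cov(X, nu):", round(cov_x_nu, 3), " OLS slope:", round(ols[1], 3))
```

With these values the sample covariance is close to -2 and the OLS slope falls well below the true value of 2.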
IV under Exact Identification
Model as before. $z_i$ is a vector of $K$ instrumental variables.
1. Linearity: $y_i = x_i'\beta_0 + u_i$, $i = 1, \dots, n$.
2. Random sample: $\{x_i, z_i, u_i\}$ is a jointly i.i.d. process.
3. IV validity 1): rank. $E(z_i x_i') = \Sigma_{zx}$ is a finite, invertible matrix.
4. IV validity 2): orthogonality. $E(z_{ik} u_i) = 0$ for all $i$ and $k = 1, \dots, K$.
5. $V(z_i u_i) = S$, finite and positive definite.
The Method-of-Moments and the IV estimator.
The orthogonality condition
$$E(z_i u_i) = E\left[z_i (y_i - x_i'\beta_0)\right] = 0$$
is a set of $K$ moment conditions for the $K$ unknown parameters.
Method-of-moments: choose as estimator the values of the parameters that force the sample analogs of the moment conditions to hold.
$\hat\beta_{IV}$ satisfies:
$$\frac{1}{n}\sum_{i=1}^n z_i (y_i - x_i'\hat\beta_{IV}) = 0$$
This is a system of $K$ linear equations with $K$ unknowns with solution
$$\hat\beta_{IV} = \left(\sum_{i=1}^n z_i x_i'\right)^{-1} \left(\sum_{i=1}^n z_i y_i\right) = (Z'X)^{-1} Z'Y$$
Existence: asymptotically guaranteed by the rank condition.
Exact identification: same number of variables and instruments ($K$).
If $x_i$ is exogenous, it is a valid vector of instruments for itself. Then $\hat\beta_{IV} = \hat\beta_{OLS}$.
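The closed form above can be sketched in a few lines of NumPy. The data-generating process below (one endogenous regressor, one instrument, true coefficients 1 and 2) is a hypothetical example, and `iv_estimator` is just an illustrative name.

```python
import numpy as np

def iv_estimator(Z, X, y):
    """Exactly identified IV: solve (Z'X) b = Z'y for b."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

rng = np.random.default_rng(2)
n = 100_000
z = rng.normal(size=n)                 # instrument
v = rng.normal(size=n)
x = z + v                              # regressor, correlated with u through v
u = v + rng.normal(size=n)
y = 1.0 + 2.0 * x + u                  # hypothetical beta_0 = (1, 2)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])   # the constant instruments itself

b_iv = iv_estimator(Z, X, y)
b_ols = iv_estimator(X, X, y)          # X used as its own instrument reproduces OLS
print("IV: ", b_iv.round(3))
print("OLS:", b_ols.round(3))
```

In this design the IV slope is close to 2 while the OLS slope drifts toward 2.5. Calling the same function with `Z = X` shows the last remark on the slide: when the regressors instrument themselves, the IV formula is the OLS formula.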
Large sample properties
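Consistency: $\hat\beta_{IV} \xrightarrow{p} \beta_0$
Proof (sketch): $\hat\beta_{IV} = (Z'X)^{-1} Z'Y$. Replacing $Y = X\beta_0 + u$:
$$\hat\beta_{IV} = \beta_0 + (Z'X)^{-1} Z'u = \beta_0 + \left(\frac{Z'X}{n}\right)^{-1} \left(\frac{Z'u}{n}\right) \xrightarrow{p} \beta_0$$
since $Z'X/n \xrightarrow{p} \Sigma_{zx}$ (finite and invertible) and $Z'u/n \xrightarrow{p} 0$, using the rank and orthogonality conditions.
Asymptotic normality:
$$\sqrt{n}\,(\hat\beta_{IV} - \beta_0) \xrightarrow{d} N\left(0,\ \Sigma_{zx}^{-1}\, S\, (\Sigma_{zx}^{-1})'\right).$$
Finishing both proofs is a useful homework exercise.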
Sources of valid instruments
Instrument Validity
IV validity 1): rank. $E(z_i x_i') = \Sigma_{zx}$, finite and invertible. Intuitively this requires the instruments to be correlated with the variables to be instrumented. This might be checked empirically. More later.
IV validity 2): orthogonality. $E(z_{ik} u_i) = 0$. Instruments must be uncorrelated with whatever is not observed that is a determinant of $y_i$. This depends on things we do not observe and on how we set up the model.
Where do valid instruments come from?
Complex question. It is an econometric and a modeling issue.
Instrument for prices in the demand function: price determinants related to the supply side (costs).
Angrist and Krueger (1991): wages as a function of education. Education is endogenous. Instrument: month of birth! Individuals born in the first months of the year may leave school earlier, so they should have less education than the rest. Validity?
Growth regression: what is truly exogenous for growth? (Durlauf, 2001).
IV: the Overidentified Case
Suppose now that there are $p > K$ instruments. Then the MM logic implies that
$$\frac{1}{n}\sum_{i=1}^n z_i (y_i - x_i'\hat\beta_{IV}) = 0$$
is a system of $p$ linear equations with $K$ unknowns.
If we only care about consistency there is something obvious we can do (use only $K$ of the $p$ instruments).
Less obvious?
We will explore a consistent and hopefully more efficient strategy.
Model as before. $z_i$ is a vector of $p > K$ instruments.
1. Linearity: $y_i = x_i'\beta_0 + u_i$, $i = 1, \dots, n$.
2. Random sample: $\{x_i, z_i, u_i\}$ is a jointly i.i.d. process.
3. IV validity 1): rank. $E(z_i x_i') = \Sigma_{zx}$ is a $p \times K$ matrix that exists, is finite, and has full column rank.
4. IV validity 2): orthogonality. $E(z_i u_i) = 0$ for all $i$.
5. $V(z_i u_i) = S$, finite and positive definite.
6. Weighting matrix: there exists a sequence of possibly random $p \times p$ symmetric pd matrices $W_n$ such that $W_n \xrightarrow{p} W$, where $W$ is also symmetric pd.
Estimation strategy: GMM
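As mentioned before, with $p > K$ we cannot in general force the sample moments
$$\frac{1}{n}\sum_{i=1}^n z_i (y_i - x_i'\hat\beta) = 0$$
to hold. Let us adopt the following notation:
$$g_i(b) \equiv z_i (y_i - x_i' b), \qquad \text{so that } g_i(\beta_0) = z_i u_i$$
$$g_n(b) \equiv \frac{1}{n}\sum_{i=1}^n z_i (y_i - x_i' b)$$
So the sample moment condition is $g_n(b) = 0$.
GMM approach: choose $b$ so that $g_n(b)$ is as close to zero as possible.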
Formally, let $J(b) \equiv n\, g_n(b)' W_n\, g_n(b)$. The GMM estimator is defined as follows:
$$\hat\beta_g \equiv \arg\min_b J(b)$$
$\hat\beta_g$ minimizes a weighted quadratic form in the elements of the sample moment condition.
I'll invite you to check that:
$$\hat\beta_g = \left(X'Z W_n Z'X\right)^{-1} X'Z W_n Z'Y$$
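A minimal numerical sketch of the closed form and of the objective $J(b)$. The function names, the simulated design (three instruments plus a constant, one endogenous regressor) and the choice $W_n = I$ are assumptions made for illustration only.

```python
import numpy as np

def gmm_estimator(Z, X, y, W):
    """Linear GMM: (X'Z W Z'X)^{-1} X'Z W Z'y for a given p x p weighting matrix W."""
    XZ = X.T @ Z                              # K x p
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def J(b, Z, X, y, W):
    """Objective n * g_n(b)' W g_n(b), with g_n(b) the sample moment vector."""
    n = len(y)
    g = Z.T @ (y - X @ b) / n
    return n * g @ W @ g

rng = np.random.default_rng(3)
n = 5_000
Zraw = rng.normal(size=(n, 3))                # three hypothetical instruments
v = rng.normal(size=n)
x = Zraw @ np.array([1.0, 0.5, 0.5]) + v      # endogenous regressor
u = v + rng.normal(size=n)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])          # K = 2
Z = np.column_stack([np.ones(n), Zraw])       # p = 4 > K

W = np.eye(Z.shape[1])                        # any symmetric pd matrix works here
b_g = gmm_estimator(Z, X, y, W)
print("GMM estimate:", b_g.round(3), " J:", round(J(b_g, Z, X, y, W), 3))
```

Since $J$ is quadratic in $b$, the closed form is the point where its gradient, proportional to $X'Z\,W_n\,g_n(b)$, is zero.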
Comments:
The existence of $\hat\beta_g$ requires $X'Z W_n Z'X$ to be invertible. Check that our assumptions guarantee this asymptotically. (Careful with what you do with $n$!)
When $p = K$ this reduces to $\hat\beta_g = \hat\beta_{IV} = (Z'X)^{-1} Z'Y$. $W_n$ plays no role.
What is the value of $J(\hat\beta_g)$ in this latter case? This will be crucial later on.
The plan
1. For any general choice of $W_n$ we will show consistency and asymptotic normality (AN), and derive a generic estimate of the asymptotic variance (AVAR).
2. How to choose $W_n$ optimally.
3. Many things simplify if we can further assume conditional homoskedasticity: discuss the particular simplifications in this case.
4. Revisit the classical approaches (2SLS, etc.) from the GMM perspective. Discuss small sample performance.
Asymptotic Properties of $\hat\beta_g$
Prelude: Identification
Before showing consistency we need to guarantee that the estimation problem is well defined.
In our case, it means that the GMM problem has a unique solution in the population.
Consider the moment equations
$$E\left[z_i (y_i - x_i' b)\right] = 0$$
which are satisfied with $b = \beta_0$, where they reduce to $E(z_i u_i) = 0$.
Identification: $\beta_0$ is identified if it is the only solution.
Write the moment conditions as:
$$E(z_i y_i) - E(z_i x_i')\,\beta_0 = 0$$
$$\Sigma_{zx}\, \beta_0 = \sigma_{zy}, \qquad \sigma_{zy} \equiv E(z_i y_i)$$
Result (Searle (1982), Simon and Blume (1994, p. 148)): if there exists a solution to $Ax = b$, a necessary and sufficient condition for it to be the only solution is that $A$ has full column rank.
Consequently, the rank condition guarantees identification.
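Consistency: $\hat\beta_g \xrightarrow{p} \beta_0$
Start with:
$$\hat\beta_g = \left(X'Z W_n Z'X\right)^{-1} X'Z W_n Z'Y$$
and replace $Y = X\beta_0 + u$ to get:
$$\hat\beta_g = \beta_0 + \left(X'Z W_n Z'X\right)^{-1} X'Z W_n Z'u$$
Divide and multiply by $n$ to get:
$$\hat\beta_g = \beta_0 + \left(\frac{X'Z}{n} W_n \frac{Z'X}{n}\right)^{-1} \frac{X'Z}{n} W_n \frac{Z'u}{n}$$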
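There are 6 terms. We know that $W_n \xrightarrow{p} W$. Note that by the LLN, element by element:
$$\frac{1}{n} Z'X = \frac{1}{n}\sum_{i=1}^n z_i x_i' \xrightarrow{p} E[z_i x_i'] = \Sigma_{zx}$$
$$\frac{1}{n} Z'u = \frac{1}{n}\sum_{i=1}^n z_i u_i \xrightarrow{p} E[z_i u_i]$$
Note that $E[x_i z_i']\, W\, E[z_i x_i']$ is an invertible $K \times K$ matrix, because $\Sigma_{zx}$ has full column rank and $W$ is pd.
Call $H \equiv \left(E[x_i z_i']\, W\, E[z_i x_i']\right)^{-1} E[x_i z_i']\, W$. Then,
$$\hat\beta_g \xrightarrow{p} \beta_0 + H\, E[z_i u_i] = \beta_0$$
by the moment condition and continuity.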
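Asymptotic normality: Subtract $\beta_0$ and multiply by $\sqrt{n}$ to get:
$$\sqrt{n}\,(\hat\beta_g - \beta_0) = \underbrace{\left(\frac{X'Z}{n} W_n \frac{Z'X}{n}\right)^{-1} \frac{X'Z}{n} W_n}_{\xrightarrow{p}\ H}\ \frac{Z'u}{\sqrt{n}}$$
Since $z_i u_i$ is an iid process with zero mean (by the population moment condition) and finite variance, by the CLT:
$$\frac{Z'u}{\sqrt{n}} = \sqrt{n}\, \frac{1}{n}\sum_{i=1}^n z_i u_i \xrightarrow{d} N(0, S)$$
Then, by Slutsky's theorem:
$$\sqrt{n}\,(\hat\beta_g - \beta_0) \xrightarrow{d} N(0, H S H').$$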
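Variance estimator: The asymptotic variance of $\hat\beta_g$ is
$$AV(\hat\beta_g) = H S H' = \left[(\Sigma_{zx}' W \Sigma_{zx})^{-1} \Sigma_{zx}' W\right] S \left[(\Sigma_{zx}' W \Sigma_{zx})^{-1} \Sigma_{zx}' W\right]'$$
with $\Sigma_{zx} \equiv E(z_i x_i')$ as before.
To establish consistency of variance estimation we need to bound the fourth moments:
Assumption: $E[(x_{ik} z_{il})^2]$ exist and are finite, for all $k = 1, \dots, K$, $l = 1, \dots, p$.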
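$$AV(\hat\beta_g) = H S H' = \left[(\Sigma_{zx}' W \Sigma_{zx})^{-1} \Sigma_{zx}' W\right] S \left[(\Sigma_{zx}' W \Sigma_{zx})^{-1} \Sigma_{zx}' W\right]'$$
A consistent estimator can be obtained if we replace
$\Sigma_{zx}$ by $\hat\Sigma_{zx} \equiv \frac{1}{n}\sum_{i=1}^n z_i x_i'$,
$W$ by $W_n$, and
$S = E(u_i^2 z_i z_i')$ by $\hat S = \frac{1}{n}\sum_{i=1}^n e_i^2 z_i z_i'$, with $e_i \equiv y_i - x_i'\hat\beta$,
where $\hat\beta$ is any consistent estimator of $\beta_0$.
Then our consistent estimator will be:
$$\widehat{AV}(\hat\beta_g) = \left[(\hat\Sigma_{zx}' W_n \hat\Sigma_{zx})^{-1} \hat\Sigma_{zx}' W_n\right] \hat S \left[(\hat\Sigma_{zx}' W_n \hat\Sigma_{zx})^{-1} \hat\Sigma_{zx}' W_n\right]'$$
See the close relationship with the White estimator we have seen before.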
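The plug-in recipe can be sketched as follows. The helper names and the simulated heteroskedastic design are hypothetical; the last step divides by $n$ because the formula estimates the variance of $\sqrt{n}(\hat\beta_g - \beta_0)$.

```python
import numpy as np

def gmm_estimator(Z, X, y, W):
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def gmm_avar(Z, X, y, W, b):
    """Plug-in estimate of H S H' for sqrt(n) * (b_hat - beta_0)."""
    n = len(y)
    Szx = Z.T @ X / n                             # p x K, estimates E(z_i x_i')
    e = y - X @ b                                 # residuals from a consistent estimate b
    S_hat = (Z * (e ** 2)[:, None]).T @ Z / n     # (1/n) sum e_i^2 z_i z_i'
    H = np.linalg.solve(Szx.T @ W @ Szx, Szx.T @ W)
    return H @ S_hat @ H.T

rng = np.random.default_rng(4)
n = 5_000
Zraw = rng.normal(size=(n, 3))                    # hypothetical instruments
v = rng.normal(size=n)
x = Zraw @ np.array([1.0, 0.5, 0.5]) + v
# error correlated with x through v, with variance depending on the first instrument
u = (v + rng.normal(size=n)) * (1.0 + 0.5 * np.abs(Zraw[:, 0]))
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), Zraw])

W = np.linalg.inv(Z.T @ Z / n)                    # one admissible weighting matrix
b = gmm_estimator(Z, X, y, W)
avar = gmm_avar(Z, X, y, W, b)
se = np.sqrt(np.diag(avar) / n)                   # standard errors for b itself
print("estimate:", b.round(3), " s.e.:", se.round(4))
```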
Efficient GMM
Our GMM estimator is consistent and AN for any choice of $W_n$ and its limit $W$.
Recall that the asymptotic variance of the GMM estimator is
$$AV(\hat\beta_g) = H S H'$$
The optimal weighting matrix, $W_0$, is the value that minimizes $AV(\hat\beta_g)$.
Result: $W_0 \equiv S^{-1}$.
In that case, replacing $W = S^{-1}$ in $AV(\hat\beta_g)$ leads to:
$$AV_o(\hat\beta_g) = \left(\Sigma_{zx}' S^{-1} \Sigma_{zx}\right)^{-1}$$
which provides a lower bound for the AVAR of the generic GMM estimator.
Our generic GMM estimator was
$$\hat\beta_g = \left(X'Z W_n Z'X\right)^{-1} X'Z W_n Z'Y$$
Then the optimal GMM estimator can be constructed by replacing $W_n$ by any consistent estimate of $S^{-1}$, up to scale, say, the inverse of $\hat S = \frac{1}{n}\sum_{i=1}^n e_i^2 z_i z_i'$. Then:
$$\hat\beta^0_g = \left(X'Z \hat S^{-1} Z'X\right)^{-1} X'Z \hat S^{-1} Z'Y$$
This suggests a two step procedure:
1. Choose any matrix $W_n$ in order to produce a consistent estimate of $\beta_0$. With this we can obtain $\hat S = \frac{1}{n}\sum_{i=1}^n e_i^2 z_i z_i'$.
2. Compute the efficient GMM estimator replacing $W_n$ by $\hat S^{-1}$ obtained before.
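The two steps can be sketched as below. The first-step choice $W_n = (Z'Z/n)^{-1}$ is one convenient option among many, and the function names and simulated data are hypothetical.

```python
import numpy as np

def gmm_estimator(Z, X, y, W):
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def two_step_gmm(Z, X, y):
    n = len(y)
    # Step 1: any pd weighting matrix gives a consistent first-round estimate
    W1 = np.linalg.inv(Z.T @ Z / n)
    b1 = gmm_estimator(Z, X, y, W1)
    e = y - X @ b1
    S_hat = (Z * (e ** 2)[:, None]).T @ Z / n     # (1/n) sum e_i^2 z_i z_i'
    # Step 2: re-estimate with W_n = S_hat^{-1}
    b2 = gmm_estimator(Z, X, y, np.linalg.inv(S_hat))
    return b1, b2, S_hat

rng = np.random.default_rng(5)
n = 5_000
Zraw = rng.normal(size=(n, 3))                    # hypothetical instruments
v = rng.normal(size=n)
x = Zraw @ np.array([1.0, 0.5, 0.5]) + v
u = (v + rng.normal(size=n)) * (1.0 + 0.5 * np.abs(Zraw[:, 0]))
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), Zraw])

b1, b2, S_hat = two_step_gmm(Z, X, y)
print("first step: ", b1.round(3))
print("second step:", b2.round(3))
```

Multiplying the second-step weighting matrix by any positive constant leaves the estimate unchanged, which is the sense of "up to scale" above.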
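Proof of efficient GMM: We need to show that $AV(\hat\beta_g) - AV_o(\hat\beta_g)$ is a positive semidefinite matrix. Remember that it is equivalent to show that
$$AV_o^{-1}(\hat\beta_g) - AV^{-1}(\hat\beta_g)$$
is positive semidefinite. Now
$$AV(\hat\beta_g) = \left[(\Sigma_{zx}' W \Sigma_{zx})^{-1} \Sigma_{zx}' W\right] S \left[(\Sigma_{zx}' W \Sigma_{zx})^{-1} \Sigma_{zx}' W\right]'$$
$$AV_o(\hat\beta_g) = \left(\Sigma_{zx}' S^{-1} \Sigma_{zx}\right)^{-1}$$
So we need to show that
$$\Sigma_{zx}' S^{-1} \Sigma_{zx} - \Sigma_{zx}' W \Sigma_{zx} \left(\Sigma_{zx}' W S W \Sigma_{zx}\right)^{-1} \Sigma_{zx}' W \Sigma_{zx}$$
is psd.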

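$$\Sigma_{zx}' S^{-1} \Sigma_{zx} - \Sigma_{zx}' W \Sigma_{zx} \left(\Sigma_{zx}' W S W \Sigma_{zx}\right)^{-1} \Sigma_{zx}' W \Sigma_{zx}$$
$$= \Sigma_{zx}' \left[ S^{-1} - W \Sigma_{zx} \left(\Sigma_{zx}' W S W \Sigma_{zx}\right)^{-1} \Sigma_{zx}' W \right] \Sigma_{zx}$$
$$= \Sigma_{zx}' \left[ P'P - W \Sigma_{zx} \left(\Sigma_{zx}' W (P'P)^{-1} W \Sigma_{zx}\right)^{-1} \Sigma_{zx}' W \right] \Sigma_{zx}$$
where $P'P = S^{-1}$, $P$ invertible.
$$= \Sigma_{zx}' P' \left[ I - P'^{-1} W \Sigma_{zx} \left(\Sigma_{zx}' W P^{-1} P'^{-1} W \Sigma_{zx}\right)^{-1} \Sigma_{zx}' W P^{-1} \right] P \Sigma_{zx}$$
$$= \Sigma_{zx}' P' \left[ I - B (B'B)^{-1} B' \right] P \Sigma_{zx}$$
for $B \equiv P'^{-1} W \Sigma_{zx}$.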
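We can write the previous result as:
$$\Sigma_{zx}' P' M_B P \Sigma_{zx}$$
with $M_B \equiv I - B (B'B)^{-1} B'$, which is symmetric and idempotent. Now, for any vector $c$,
$$c' \Sigma_{zx}' P' M_B P \Sigma_{zx} c = c' \Sigma_{zx}' P' M_B M_B' P \Sigma_{zx} c = q'q = \sum_j q_j^2 \geq 0, \qquad q \equiv M_B' P \Sigma_{zx} c,$$
since $q$ is a vector.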
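The result can also be checked numerically for arbitrary inputs. In this sketch the matrices standing in for $\Sigma_{zx}$, $S$ and $W$ are random draws, not estimates from data, and the helper names are hypothetical.

```python
import numpy as np

def avar(Sigma, S, W):
    """H S H' with H = (Sigma' W Sigma)^{-1} Sigma' W."""
    H = np.linalg.solve(Sigma.T @ W @ Sigma, Sigma.T @ W)
    return H @ S @ H.T

def random_pd(rng, p):
    A = rng.normal(size=(p, p))
    return A @ A.T + np.eye(p)          # symmetric positive definite

rng = np.random.default_rng(6)
p, K = 4, 2
Sigma = rng.normal(size=(p, K))         # stands in for Sigma_zx (p x K, full column rank)
S = random_pd(rng, p)
W = random_pd(rng, p)                   # an arbitrary, generally suboptimal, weighting matrix

av_generic = avar(Sigma, S, W)
av_optimal = avar(Sigma, S, np.linalg.inv(S))
diff = av_generic - av_optimal
min_eig = np.linalg.eigvalsh((diff + diff.T) / 2).min()
print("smallest eigenvalue of AV - AV_o:", min_eig)
```

A nonnegative smallest eigenvalue, up to rounding error, is what positive semidefiniteness of the difference requires.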

Simplifications under Conditional Homoskedasticity
Suppose we can further assume conditional homoskedasticity:
Assumption: $E(u_i^2 | z_i) = \sigma^2$
First, it is easy to verify the following
Result: Let $e_i = y_i - x_i'\hat\beta$ where $\hat\beta$ is any consistent estimator of $\beta_0$. Then $\frac{1}{n}\sum_{i=1}^n e_i^2 \xrightarrow{p} \sigma^2$.
Proof: Start with $e_i = y_i - x_i'\hat\beta = u_i - x_i'(\hat\beta - \beta_0)$. Then take squares, sum, distribute and use the assumptions.
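Now note that $S = V(z_i u_i) = E(u_i^2 z_i z_i')$ simplifies (using the LIE) to:
$$S = E(u_i^2 z_i z_i') = \sigma^2 E(z_i z_i') = \sigma^2 \Sigma_{zz}, \qquad \Sigma_{zz} \equiv E(z_i z_i')$$
which can be consistently estimated by
$$\hat S = \hat\sigma^2\, \frac{1}{n}\sum_{i=1}^n z_i z_i' = \hat\sigma^2\, \frac{1}{n} Z'Z, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n e_i^2$$
Now replace in the formula for our optimal GMM estimator
$$\hat\beta^0_g = \left(X'Z \hat S^{-1} Z'X\right)^{-1} X'Z \hat S^{-1} Z'Y = \left(X'Z (Z'Z)^{-1} Z'X\right)^{-1} X'Z (Z'Z)^{-1} Z'Y \equiv \hat\beta_{TSLS}$$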

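$\hat\beta_{TSLS} = \left(X'Z (Z'Z)^{-1} Z'X\right)^{-1} X'Z (Z'Z)^{-1} Z'Y$ is the two-stage least squares estimator.
Its asymptotic variance is
$$AV(\hat\beta_{TSLS}) = \sigma^2 \left(\Sigma_{zx}' \Sigma_{zz}^{-1} \Sigma_{zx}\right)^{-1}$$
that can be estimated consistently by:
$$\widehat{AV}(\hat\beta_{TSLS}) = \hat\sigma^2 \left(\hat\Sigma_{zx}' \hat\Sigma_{zz}^{-1} \hat\Sigma_{zx}\right)^{-1}$$
where we can use $\hat\beta_{TSLS}$ as the consistent estimate for $\beta_0$ needed to produce $\hat\sigma^2$.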
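A sketch of the estimator and of its variance estimate under conditional homoskedasticity. The function names and the simulated homoskedastic design are hypothetical; the code avoids forming the $n \times n$ projection matrix explicitly.

```python
import numpy as np

def tsls(Z, X, y):
    """(X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'y."""
    ZX = Z.T @ X
    A = np.linalg.solve(Z.T @ Z, ZX)              # (Z'Z)^{-1} Z'X, p x K
    return np.linalg.solve(ZX.T @ A, A.T @ (Z.T @ y))

def tsls_avar(Z, X, y, b):
    """sigma2_hat * (Szx' Szz^{-1} Szx)^{-1}, relying on conditional homoskedasticity."""
    n = len(y)
    e = y - X @ b                                 # residuals use the original X
    sigma2 = e @ e / n
    Szx = Z.T @ X / n
    Szz = Z.T @ Z / n
    return sigma2 * np.linalg.inv(Szx.T @ np.linalg.solve(Szz, Szx))

rng = np.random.default_rng(7)
n = 5_000
Zraw = rng.normal(size=(n, 3))                    # hypothetical instruments
v = rng.normal(size=n)
x = Zraw @ np.array([1.0, 0.5, 0.5]) + v
u = v + rng.normal(size=n)                        # homoskedastic given z
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), Zraw])

b = tsls(Z, X, y)
avar = tsls_avar(Z, X, y, b)
se = np.sqrt(np.diag(avar) / n)                   # standard errors for b itself
print("TSLS:", b.round(3), " s.e.:", se.round(4))
```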
Remarks on TSLS
The TSLS is a very particular case of a GMM estimator for a particular choice of $W_0$ that arises under conditional homoskedasticity.
You have to be very careful with the efficiency assessments that you want to make about TSLS. From the GMM perspective it is not more efficient than other choices of $W_0$ that satisfy the optimality criterion. Why? Can you quickly produce another one?
Careful with the steps. From the GMM perspective there is only one step since we do not need to produce an estimate of $\sigma^2$ to implement the estimator (it cancelled out!).
TSLS as an IV estimator
In the standard literature it is called a two-stage estimator because $P_z \equiv Z (Z'Z)^{-1} Z'$ is idempotent ($P_z' P_z = P_z$) and symmetric, so:
$$\hat\beta_{TSLS} = \left(X'Z (Z'Z)^{-1} Z'X\right)^{-1} X'Z (Z'Z)^{-1} Z'Y$$
$$= (X' P_z X)^{-1} X' P_z Y$$
$$= (X' P_z' P_z X)^{-1} X' P_z' P_z Y$$
$$= (X^{*\prime} X^*)^{-1} X^{*\prime} Y^*$$
with $X^* \equiv P_z X$ and $Y^* \equiv P_z Y$.
TSLS is actually an OLS estimator where in a first step the variables $X^*$ and $Y^*$ are constructed, and in a second step the OLS estimator is produced with these transformed variables.
The first stage removes the endogeneity problem by replacing $X$ by its linear projection on the space spanned by the instruments $Z$, which are by assumption orthogonal to the error term.
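The two-stage reading can be verified numerically. This sketch uses a hypothetical simulated design and computes the estimator three ways: OLS on both projected variables, OLS of the original $Y$ on the projected $X$, and the direct matrix formula.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2_000
Zraw = rng.normal(size=(n, 3))                    # hypothetical instruments
v = rng.normal(size=n)
x = Zraw @ np.array([1.0, 0.5, 0.5]) + v
u = v + rng.normal(size=n)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), Zraw])

# First stage: project X and y on the column space of Z (fitted values of OLS on Z)
X_star = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # P_z X
y_star = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]     # P_z Y

# Second stage: OLS with the transformed variables
b_two_stage = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

# Projecting only X gives the same coefficients, since P_z is idempotent and symmetric
b_x_only = np.linalg.lstsq(X_star, y, rcond=None)[0]

# Direct formula (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'y
ZX = Z.T @ X
A = np.linalg.solve(Z.T @ Z, ZX)
b_formula = np.linalg.solve(ZX.T @ A, A.T @ (Z.T @ y))

print(b_two_stage.round(4), b_x_only.round(4), b_formula.round(4))
```

One practical caveat when running the second stage by hand: the residuals for $\hat\sigma^2$ should be computed with the original $X$, not with $X^*$, so the standard errors printed by a plain second-stage OLS routine are not the TSLS ones.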