Some Basics For Panel Data Analysis

The last chapter gave only a brief introduction and overview.
For more detail, refer to the lecture Panel Data Econometrics
by Prof. Inmaculada Martinez-Zarzoso.
Econometrics 2 (Summer 2008) 1 / 21
Introduction and motivation
Panel data (or longitudinal data) refers to a cross-section repeatedly
sampled over time, but where the same economic agent has been
followed throughout the period of the sample.
Examples:
firm or company data,
longitudinal data on patterns of individual behavior over the lifecycle,
comparative country-specific macroeconomic data over time.
Common feature:
the sample of individuals N is typically relatively large,
the number of time periods T is generally short.
Why use panel estimation methods? They can answer questions that
neither cross-section methods nor pure time series can address.
Greene (1991): we observe 50 per cent of a cohort of women to work.
Two possible interpretations:
50 per cent of women work on average each period, or
the same 50 per cent of women may work each period.
Different interpretations, different implications for policy.
There are nevertheless difficulties inherent in data sources with a
longitudinal element:
attrition,
nonrandomness of the sample.
Why use panel data methods?
increased precision of regression estimates,
the ability to control for individual fixed effects,
the ability to model temporal effects without aggregation bias.
Fixed effects panel data models
Let us consider
$$y_{it} = \alpha_i + \beta x_{it} + u_{it} \qquad (1)$$
for $i = 1, \dots, N$ individuals over $t = 1, \dots, T$ time periods.
Model includes:
an individual effect $\alpha_i$ (constant over time),
marginal effects $\beta$ for $x_{it}$ (common across $i$ and $t$).
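The pooled Ordinary Least Squares (OLS) estimator
The simplest approach to the estimation:
individual effects $\alpha_i$ are fixed and common across economic agents,
such that $\alpha_i = \alpha$ for all $i = 1, \dots, N$.
OLS produces consistent and efficient estimates of $\alpha$ and $\beta$:
$$\hat\alpha = \bar y - \hat\beta \bar x, \qquad \hat\beta = \left( \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde x_{it} \tilde y_{it} \right) \Big/ \left( \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde x_{it}^2 \right) \qquad (2)$$
where $\bar x = (1/NT) \sum_{i=1}^{N} \sum_{t=1}^{T} x_{it}$ and $\tilde x_{it} = x_{it} - \bar x$, similarly for $y$.
Notice $\operatorname{var}(\hat\beta) = \operatorname{var}(u_{it}) \left( \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde x_{it}^2 \right)^{-1}$.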
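The pooled OLS formulas in (2) can be sketched numerically as follows. This is a minimal illustration on simulated data, assuming a single regressor; the sample sizes, the true parameter values and all variable names are hypothetical and not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 5                       # many individuals, few periods
alpha, beta = 1.0, 2.0              # hypothetical true parameters
x = rng.normal(size=(N, T))
u = rng.normal(scale=0.5, size=(N, T))
y = alpha + beta * x + u            # common intercept: alpha_i = alpha for all i

# Deviations from the grand means, as in equation (2)
x_dev = x - x.mean()
y_dev = y - y.mean()

beta_hat = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
alpha_hat = y.mean() - beta_hat * x.mean()

# Estimated var(beta_hat): residual variance over the sum of squared deviations
resid = y - alpha_hat - beta_hat * x
sigma2_hat = (resid ** 2).sum() / (N * T - 2)
var_beta_hat = sigma2_hat / (x_dev ** 2).sum()
```

The same slope and intercept would be returned by any standard least-squares routine applied to the stacked observations.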
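The Within-Groups (WG) estimator
Can be used if individual effects $\alpha_i$ are fixed but not common across
$i = 1, \dots, N$.
Eliminates the fixed effect $\alpha_i$ by differencing.
Let $\bar y_i = T^{-1} \sum_{t=1}^{T} y_{it}$ and $\bar x_i = T^{-1} \sum_{t=1}^{T} x_{it}$.
Define $\tilde x_{it} = x_{it} - \bar x_i$ and $\tilde y_{it} = y_{it} - \bar y_i$.
Then $\bar y_i = \alpha_i + \beta \bar x_i + \bar u_i$.
Subtracting from (1) gives
$$y_{it} - \bar y_i = (\alpha_i - \alpha_i) + \beta (x_{it} - \bar x_i) + (u_{it} - \bar u_i)$$
or $\tilde y_{it} = \beta \tilde x_{it} + \tilde u_{it}$.
Hence,
$$\hat\beta_{WG} = \left( \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde x_{it} \tilde y_{it} \right) \Big/ \left( \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde x_{it}^2 \right) \qquad (3)$$
Define
$$S_{xx} = \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x)^2, \quad S^{w}_{xx} = \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x_i)^2, \quad S^{b}_{xx} = \sum_{i=1}^{N} T (\bar x_i - \bar x)^2.$$
Can show that $S_{xx} = S^{w}_{xx} + S^{b}_{xx}$.
Given that $\operatorname{var}(\tilde u_{it}) = \frac{T-1}{T} \operatorname{var}(u_{it})$, we have
$$\operatorname{var}(\hat\beta_{WG}) = \frac{\operatorname{var}(\tilde u_{it})}{\sum_{i=1}^{N} \sum_{t=1}^{T} \tilde x_{it}^2} = \frac{\operatorname{var}(\tilde u_{it})}{S_{xx} - S^{b}_{xx}} = \frac{T-1}{T} \, \frac{\operatorname{var}(u_{it})}{S_{xx} - S^{b}_{xx}}.$$
Drawback with the Within-Groups estimator: it eliminates
time-invariant characteristics from a model of the form
$$y_{it} = \alpha_i + \beta x_{it} + \delta z_i + u_{it}.$$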
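A short numerical sketch of the within transformation and of the decomposition $S_{xx} = S^{w}_{xx} + S^{b}_{xx}$ follows. The simulated design (individual effects correlated with the regressor, so that pooled OLS is biased) and every name in the code are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 300, 4
beta = 1.5                                           # hypothetical true slope
alpha_i = rng.normal(size=(N, 1))                    # individual effects
x = 0.8 * alpha_i + rng.normal(size=(N, T))          # x is correlated with alpha_i
y = alpha_i + beta * x + rng.normal(scale=0.5, size=(N, T))

# Pooled OLS slope: deviations from the grand mean (biased in this design)
xd, yd = x - x.mean(), y - y.mean()
beta_pooled = (xd * yd).sum() / (xd ** 2).sum()

# Within transformation: deviations from each individual's own mean
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_wg = (x_w * y_w).sum() / (x_w ** 2).sum()

# Total, within and between sums of squares of x
S_xx = ((x - x.mean()) ** 2).sum()
S_w = (x_w ** 2).sum()
S_b = T * ((x.mean(axis=1) - x.mean()) ** 2).sum()
```

In this design the pooled slope absorbs part of the individual effect, while the within slope stays close to the value used in the simulation.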
The Least Squares Dummy Variable (LSDV) estimator
Define a series of group-specific dummy variables $d_{git} = \mathbf{1}(g = i)$.
This gives
$$y_{it} = \alpha_i + \beta x_{it} + u_{it} = \alpha_1 d_{1it} + \alpha_2 d_{2it} + \dots + \alpha_N d_{Nit} + \beta x_{it} + u_{it}$$
Estimate by standard OLS, yielding $\hat\beta_{LSDV}$.
A test for individual effects: under the null, $\alpha_1 = \alpha_2 = \dots = \alpha_N$.
Test using the subset F statistic
$$F = \frac{R^2_{DV} - R^2_{p}}{1 - R^2_{DV}} \cdot \frac{NT - N - k}{N - 1}.$$
Distributed $F_{N-1,\, NT-N-k}$ under the null of equality of the $\alpha_i$.
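The LSDV regression and the subset F statistic can be sketched as below. The data are simulated, $R^2_{p}$ is taken to be the $R^2$ of the pooled regression with a common constant, and the helper `r_squared` is a hypothetical name introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, k = 50, 6, 1
alpha_i = rng.normal(size=N)                         # individual effects
x = rng.normal(size=(N, T)) + alpha_i[:, None]
y = alpha_i[:, None] + 1.5 * x + rng.normal(scale=0.5, size=(N, T))

# Stack observations individual by individual
ids = np.repeat(np.arange(N), T)
xs, ys = x.ravel(), y.ravel()

# NT x N matrix of individual dummies d_git = 1(g = i)
D = (ids[:, None] == np.arange(N)).astype(float)
X_dv = np.column_stack([D, xs])
coef_dv, *_ = np.linalg.lstsq(X_dv, ys, rcond=None)
beta_lsdv = coef_dv[-1]                              # slope on x

def r_squared(X, y):
    """R^2 of an OLS regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

R2_dv = r_squared(X_dv, ys)
R2_p = r_squared(np.column_stack([np.ones(N * T), xs]), ys)
F = ((R2_dv - R2_p) / (N - 1)) / ((1.0 - R2_dv) / (N * T - N - k))
```

The LSDV slope coincides numerically with the within-groups slope, which is why the dummy matrix is rarely built explicitly when N is large.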
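The Two-Way Fixed Effects Model
Let
$$y_{it} = \mu + \alpha_i + \gamma_t + \beta x_{it} + u_{it},$$
where $\gamma_t$ represents the (fixed) time effects, and
$$\sum_{i=1}^{N} \alpha_i = \sum_{t=1}^{T} \gamma_t = 0.$$
Include time dummies $z_{sit} = \mathbf{1}(s = t)$ to give
$$y_{it} = \alpha_1 d_{1it} + \alpha_2 d_{2it} + \dots + \alpha_N d_{Nit} + \gamma_2 z_{2it} + \dots + \gamma_T z_{Tit} + \beta x_{it} + u_{it}.$$
Simple OLS of $\tilde y_{it} = y_{it} - \bar y_t - \bar y_i + \bar{\bar y}$ on $\tilde x_{it} = x_{it} - \bar x_t - \bar x_i + \bar{\bar x}$
(with $\bar{\bar y}$, $\bar{\bar x}$ the overall means) gives consistent estimates; further,
$$\hat\mu = \bar{\bar y} - \bar{\bar x}^T \hat\beta$$
$$\hat\alpha_i = (\bar y_i - \bar{\bar y}) - (\bar x_i - \bar{\bar x})^T \hat\beta$$
$$\hat\gamma_t = (\bar y_t - \bar{\bar y}) - (\bar x_t - \bar{\bar x})^T \hat\beta$$
The test for no time effects works as above.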
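The double-demeaning step and the recovery of the intercept, individual and time effects can be sketched as follows, assuming a balanced panel with one regressor. The simulated values and the helper name `double_demean` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 100, 8
beta = 0.7                                   # hypothetical true slope
a = rng.normal(size=(N, 1))                  # individual effects
g = rng.normal(size=(1, T))                  # time effects
x = rng.normal(size=(N, T)) + a + g
y = 2.0 + a + g + beta * x + rng.normal(scale=0.3, size=(N, T))

def double_demean(m):
    """Subtract individual means and time means, add back the overall mean."""
    return m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()

x_t, y_t = double_demean(x), double_demean(y)
beta_hat = (x_t * y_t).sum() / (x_t ** 2).sum()

# Recover the remaining parameters from the various means
mu_hat = y.mean() - beta_hat * x.mean()
alpha_hat = (y.mean(axis=1) - y.mean()) - beta_hat * (x.mean(axis=1) - x.mean())
gamma_hat = (y.mean(axis=0) - y.mean()) - beta_hat * (x.mean(axis=0) - x.mean())
```

By construction the estimated individual effects and the estimated time effects each sum to zero, matching the normalisation of the model.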

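Random Effects Model
When time or cross-section information is short, fixed effects are not
reasonable (too small a sample in that direction).
Instead, one assumes the individual or time effect to be random.
E.g. without time effect but $u_i$ random:
$$y_{it} = x_{it}^T \beta + \alpha + u_i + \varepsilon_{it},$$
where
$E[\varepsilon_{it}|X] = 0$, $E[\varepsilon_{it}^2|X] = \sigma_e^2$, $E[u_i|X] = 0$, $E[u_i^2|X] = \sigma_u^2$,
$E[\varepsilon_{it} u_j|X] = 0$ for all $i, j, t$, $E[\varepsilon_{it} \varepsilon_{js}|X] = 0$ for all $t \neq s$ or $i \neq j$,
$E[u_i u_j|X] = 0$ for all $i \neq j$.
Then, the variance of $Y_i$ conditioned on $X_i$ is
$$\Sigma = \begin{pmatrix} \sigma_e^2 + \sigma_u^2 & \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_e^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & & \ddots & & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 & \sigma_e^2 + \sigma_u^2 \end{pmatrix}$$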
The variance over all $Y|X$ is $\Omega = I_N \otimes \Sigma$.
Then, the GLS estimator is
$$\hat\beta = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} Y,$$
which corresponds to an OLS after a linear data transformation.
Feasible GLS:
first estimate $\sigma_e$, $\sigma_u$ consistently (different possibilities in linear
models as above), afterwards calculate $\hat\beta$ with $\hat\Omega$.
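The equivalence between GLS with $\Omega = I_N \otimes \Sigma$ and OLS on linearly transformed data can be checked numerically. The sketch below treats $\sigma_e$ and $\sigma_u$ as known, which is an assumption made to keep the example short; a feasible version would plug in consistent estimates instead. The quasi-demeaning weight `theta` is the standard closed form for this error structure and does not appear on the slide.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 60, 4
sigma_e, sigma_u = 0.5, 1.0            # treated as known here (assumption)
x = rng.normal(size=(N, T))
u = rng.normal(scale=sigma_u, size=(N, 1))
y = 1.0 + 2.0 * x + u + rng.normal(scale=sigma_e, size=(N, T))

# Stacked design with a constant; rows ordered individual by individual
X = np.column_stack([np.ones(N * T), x.ravel()])
Y = y.ravel()

# Direct GLS with Omega = I_N (Kronecker) Sigma
Sigma = sigma_e**2 * np.eye(T) + sigma_u**2 * np.ones((T, T))
Omega_inv = np.kron(np.eye(N), np.linalg.inv(Sigma))
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ Y)

# Equivalent route: OLS after subtracting a fraction theta of individual means
theta = 1.0 - sigma_e / np.sqrt(sigma_e**2 + T * sigma_u**2)

def quasi_demean(m):
    """Subtract theta times the individual mean from every observation."""
    return (m - theta * m.mean(axis=1, keepdims=True)).ravel()

X_star = np.column_stack([np.full(N * T, 1.0 - theta), quasi_demean(x)])
b_ols_star, *_ = np.linalg.lstsq(X_star, quasi_demean(y), rcond=None)
```

Setting `theta` to 0 gives pooled OLS and letting it approach 1 gives the within-groups estimator, so the random effects estimator sits between the two.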
Inference, Prediction, etc.
Note that the feasible GLS estimate has an increased variance.
There exist linear approximations for the exact variance
and, more recently, bootstrap approximations.
For prediction, predicted random effects are added. Then, for the
construction of prediction intervals, similar problems occur.
Breusch-Pagan Lagrange Multiplier Test
Test the random effects model against the pooled OLS model.
The hypothesis under investigation is $H_0: \sigma_u^2 = 0$ vs $H_1: \sigma_u^2 \neq 0$.
Note that under $H_0$ the model reduces to the pooled OLS regression.
$$LM = \frac{NT}{2(T-1)} \left[ \frac{\sum_{i=1}^{N} \left( \sum_{t=1}^{T} e_{it} \right)^2}{\sum_{i=1}^{N} \sum_{t=1}^{T} e_{it}^2} - 1 \right]^2$$
Discuss why this is a reasonable statistic!
Under the null, the statistic is distributed as $\chi^2_1$.
Note that here T is not supposed to go to infinity
(not necessary for asymptotic statistics, need $N \gg 0$ instead).
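One way to see why the LM statistic is reasonable: without an individual effect, the pooled OLS residuals are uncorrelated within an individual, so the sum over $i$ of the squared individual residual sums is close to the total sum of squared residuals and the bracket is close to zero. The sketch below computes the statistic on simulated data with and without a random effect; the function names and simulation settings are assumptions for illustration.

```python
import numpy as np

def breusch_pagan_lm(e):
    """LM statistic from an N x T array of pooled OLS residuals."""
    N, T = e.shape
    ratio = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()
    return N * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2

def pooled_residuals(x, y):
    """Residuals of pooled OLS of y on a constant and x, reshaped to N x T."""
    X = np.column_stack([np.ones(x.size), x.ravel()])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return (y.ravel() - X @ coef).reshape(y.shape)

rng = np.random.default_rng(5)
N, T = 200, 5
x = rng.normal(size=(N, T))
eps = rng.normal(size=(N, T))
u = rng.normal(size=(N, 1))                  # random individual effect

# Data generated without (null) and with (alternative) the random effect
lm_null = breusch_pagan_lm(pooled_residuals(x, 1.0 + 2.0 * x + eps))
lm_alt = breusch_pagan_lm(pooled_residuals(x, 1.0 + 2.0 * x + u + eps))
```

The statistic would be compared with a $\chi^2_1$ critical value, for example 3.84 at the 5 per cent level.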
Summary of Models
Let
$$y_{it} = x_{it}^T \beta + z_i^T \alpha + \varepsilon_{it},$$
where $x_{it}$ must not contain a constant. $z_i^T \alpha$ is the individual effect,
i.e. heterogeneity.
Pooled Regression: If $z_i$ contains only a constant term, then OLS
provides consistent and efficient estimates (under homoscedasticity)
for the common scalar $\alpha$ and the vector $\beta$.
Fixed Effects: (1) If $z_i$ is unobserved but correlated with $x_{it}$, then
OLS is inconsistent. Consider $z_i^T \alpha = \alpha_i$.
(2) One can include a time fixed effect $\gamma_t$ (two-way fixed effects).
Random Effects: If the unobserved heterogeneity is uncorrelated
with $x_{it}$ and independent of $\varepsilon_{it}$.
Random Parameters: $y_{it} = x_{it}^T (\beta + h_i) + \alpha + u_i + \varepsilon_{it}$,
where $h_i$, $u_i$ are random effects (not treated here).
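The Hausman Test
First, discuss: which model has to be used?
In the presence of correlation between the regressors $x_{it}$ and the individual
(random) effects $u_i$, the GLS estimator is inconsistent, whilst the OLS
estimates (least squares dummy variable estimates) are consistent.
If $x_{it}$, $u_i$ are uncorrelated, the GLS estimator is consistent and efficient
whilst the OLS estimator is consistent but inefficient.
Construct a test based on the difference between GLS and LSDV-OLS:
$$\left( \hat\beta_{GLS} - \hat\beta_{OLS} \right)^T \operatorname{CoVar}^{-1} \left( \hat\beta_{GLS} - \hat\beta_{OLS} \right) \overset{H_0}{\sim} \chi^2_{\dim(\beta)},$$
Hausman found that the covariance of an efficient estimator with its
difference from an inefficient estimator is zero. It follows that
$\operatorname{Cov}[\hat\beta_{GLS}, \hat\beta_{OLS}] = \operatorname{Var}[\hat\beta_{GLS}]$ and so
$$\operatorname{CoVar} = \operatorname{Var}[\hat\beta_{GLS}] + \operatorname{Var}[\hat\beta_{OLS}] - 2 \operatorname{Cov}[\hat\beta_{GLS}, \hat\beta_{OLS}] = \operatorname{Var}[\hat\beta_{OLS}] - \operatorname{Var}[\hat\beta_{GLS}]$$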
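Given the two coefficient vectors and their estimated covariance matrices, the Hausman statistic is a single quadratic form. The function below is a minimal sketch; its name, its arguments and the numbers passed to it are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def hausman_statistic(b_gls, b_lsdv, V_gls, V_lsdv):
    """Return the Hausman statistic and its degrees of freedom dim(beta)."""
    d = np.atleast_1d(b_gls) - np.atleast_1d(b_lsdv)
    # CoVar of the difference = Var[OLS] - Var[GLS] under the null
    V_diff = np.atleast_2d(V_lsdv) - np.atleast_2d(V_gls)
    return float(d @ np.linalg.solve(V_diff, d)), d.size

# Made-up estimates and variances, for illustration only
H, dof = hausman_statistic(
    b_gls=np.array([1.10, 0.50]),
    b_lsdv=np.array([1.00, 0.60]),
    V_gls=np.diag([0.01, 0.02]),
    V_lsdv=np.diag([0.02, 0.04]),
)
```

With these made-up inputs the statistic is 1.5 on 2 degrees of freedom, well below the 5 per cent $\chi^2_2$ critical value of about 5.99, so the random effects specification would not be rejected. In finite samples the estimated variance difference need not be positive definite, which a practical implementation would have to handle.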
Note on Unbalanced Panels
Missing data are very common in panel data sets.
Therefore, panel data sets in which the group sizes differ across
groups are not unusual.
These panels are called unbalanced panels.
The methods shown here can be applied to those data with minor
modifications (mainly notation).
Let us consider the case when $T$ possibly varies over $i$.
$NT$ now becomes $\sum_{i=1}^{N} T_i$ with possibly different $T_i$.
E.g.
$$\bar{\bar x} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} x_{it}}{\sum_{i=1}^{N} T_i}$$
etc.
Adaptation of statistics is usually straightforward.
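As a small illustration of the notational change, the sketch below stores an unbalanced panel as one array per individual and computes the overall mean and the within and between sums of squares with $T_i$ in place of $T$. The three short series are made-up numbers.

```python
import numpy as np

# Hypothetical unbalanced panel: one array of observations per individual
x_by_individual = [
    np.array([1.0, 2.0, 3.0]),         # T_1 = 3
    np.array([4.0, 6.0]),              # T_2 = 2
    np.array([5.0, 5.0, 5.0, 5.0]),    # T_3 = 4
]

T_i = np.array([len(xi) for xi in x_by_individual])
n_obs = T_i.sum()                                            # replaces NT
x_bar = sum(xi.sum() for xi in x_by_individual) / n_obs      # overall mean
x_bar_i = np.array([xi.mean() for xi in x_by_individual])    # individual means

# Sums of squares with T_i in place of T
S_w = sum(((xi - xi.mean()) ** 2).sum() for xi in x_by_individual)
S_b = (T_i * (x_bar_i - x_bar) ** 2).sum()
S_xx = sum(((xi - x_bar) ** 2).sum() for xi in x_by_individual)
```

The decomposition of the total sum of squares into within and between parts continues to hold once each individual is weighted by its own $T_i$.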
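Note on Autocorrelation
For ease of notation let us consider $y_{it} = x_{it}^T \beta + \varepsilon_{it}$.
Assume $\varepsilon_{it} = \rho_i \varepsilon_{i,t-1} + u_{it}$ and
$$\operatorname{Var}[\varepsilon_{i,t-1}] = \sigma_i^2 = \sigma_{ui}^2 (1 - \rho_i^2)^{-1}.$$
Then,
$$y_{it} - \rho_i y_{i,t-1} = (x_{it} - \rho_i x_{i,t-1})^T \beta + u_{it}$$
and the covariance structure for each individual is
$$\sigma_i^2 \Omega_i = \frac{\sigma_{ui}^2}{1 - \rho_i^2} \begin{pmatrix} 1 & \rho_i & \rho_i^2 & \cdots & \rho_i^{T-1} \\ \rho_i & 1 & \rho_i & \cdots & \rho_i^{T-2} \\ \rho_i^2 & \rho_i & 1 & \cdots & \rho_i^{T-3} \\ \vdots & & & \ddots & \vdots \\ \rho_i^{T-1} & \rho_i^{T-2} & \rho_i^{T-3} & \cdots & 1 \end{pmatrix}$$
and, as individuals are independent, $\operatorname{CoVar}[Y|X] = \Omega = I_N \otimes \{\sigma_i^2 \Omega_i\}_{i=1,\dots,N}$,
i.e. block-diagonal with blocks $\sigma_i^2 \Omega_i$.
With feasible GLS we get an asymptotically efficient estimator for $\beta$
with the use of $\hat\Omega$. Note that under homoskedasticity, $\sigma_i^2$ is not
necessary.
Note that this corresponds to the data transformation
$$\tilde y_i = \begin{pmatrix} \sqrt{1 - \rho_i^2}\, y_{i1} \\ y_{i2} - \rho_i y_{i1} \\ y_{i3} - \rho_i y_{i2} \\ \vdots \\ y_{iT} - \rho_i y_{i,T-1} \end{pmatrix}, \qquad \tilde x_i = \begin{pmatrix} \sqrt{1 - \rho_i^2}\, x_{i1} \\ x_{i2} - \rho_i x_{i1} \\ x_{i3} - \rho_i x_{i2} \\ \vdots \\ x_{iT} - \rho_i x_{i,T-1} \end{pmatrix}$$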
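One can get
$$\hat\sigma_{ui}^2 = T^{-1} (\tilde y_i - \tilde x_i^T \hat\beta)^T (\tilde y_i - \tilde x_i^T \hat\beta)$$
With any consistent estimate $\hat\beta$ calculate residuals $\hat\varepsilon_{it}$, then define
$$\hat\rho_i = \left( \sum_{t=2}^{T} \hat\varepsilon_{it} \hat\varepsilon_{i,t-1} \right) \Big/ \left( \sum_{t=1}^{T} \hat\varepsilon_{it}^2 \right)$$
If now $\rho_i = \rho$ for all $i$, then estimates are less variable.
Discuss differences...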
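The estimate of $\rho_i$ and the data transformation for AR(1) errors can be sketched for a single individual as follows. The series is simulated and deliberately much longer than a typical panel so that the estimate is stable; the function names are assumptions. With short $T$ the individual estimate is noisy, which is one reason a common $\rho$ gives less variable estimates.

```python
import numpy as np

def estimate_rho(resid):
    """rho_hat_i for one individual from a length-T residual series."""
    return (resid[1:] * resid[:-1]).sum() / (resid ** 2).sum()

def ar1_transform(v, rho):
    """First entry scaled by sqrt(1 - rho^2), the rest quasi-differenced."""
    v = np.asarray(v, dtype=float)
    out = np.empty_like(v)
    out[0] = np.sqrt(1.0 - rho ** 2) * v[0]
    out[1:] = v[1:] - rho * v[:-1]
    return out

rng = np.random.default_rng(6)
T, rho_true = 2000, 0.6                       # long series, for illustration only
u = rng.normal(size=T)                        # white-noise innovations
eps = np.empty(T)
eps[0] = u[0] / np.sqrt(1.0 - rho_true ** 2)  # start from the stationary variance
for t in range(1, T):
    eps[t] = rho_true * eps[t - 1] + u[t]

rho_hat = estimate_rho(eps)
eps_star = ar1_transform(eps, rho_true)       # recovers the innovations u
```

Applying the same transformation to $y_i$ and to each column of $x_i$ and then running OLS on the transformed data is the route the transformation above describes.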
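Further, one could allow for cross-section correlation across units.
Then $\operatorname{Cov}[u_{it}, u_{jt}] = \sigma_{uij}$ and we have for the off-diagonal blocks of
the conditional $\operatorname{CoVar}[Y|X]$
$$\sigma_{ij} \Omega_{ij} = \frac{\sigma_{uij}}{1 - \rho_i \rho_j} \begin{pmatrix} 1 & \rho_j & \rho_j^2 & \cdots & \rho_j^{T-1} \\ \rho_i & 1 & \rho_j & \cdots & \rho_j^{T-2} \\ \rho_i^2 & \rho_i & 1 & \cdots & \rho_j^{T-3} \\ \vdots & & & \ddots & \vdots \\ \rho_i^{T-1} & \rho_i^{T-2} & \rho_i^{T-3} & \cdots & 1 \end{pmatrix}$$
But note that this requires large samples to retain reasonable
degrees of freedom and to reach robust estimates for the slopes!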
