
Generalized Least Squares

(Handout Version)∗

Walter Belluzzo

Econ 507 Econometric Analysis


Spring 2013

1 Introduction
Efficiency of the OLS Estimator

• Remember that the OLS estimator is efficient (the best linear unbiased estimator) if the DGP
belongs to the regression model

y = Xβ + u,   u|X ∼ iid(0, σ²I),

a result stated in the Gauss-Markov theorem.

• For efficiency of least squares, the error terms must be uncorrelated and have equal
variance, Var(u) = σ²I.

• The usual estimators of the covariance matrices of the OLS and NLS estimators are not
valid when these assumptions do not hold.

• Alternative “sandwich” covariance matrix estimators that are asymptotically valid can be
obtained, but the inefficiency of the estimator β̂ remains.

Regression Model with Non-spherical Disturbances

• Non-spherical disturbances affect both linear and nonlinear regression models in the same
way, so we can focus our attention on the simpler, linear case.

• Let us consider the model

y = Xβ + u,   E(uu′) = Ω.

• The idea for obtaining an efficient estimator of the vector β in this model is to find a
transformation that makes the Gauss-Markov conditions hold.

• The resulting efficient estimator (why?) is called the generalized least squares, or
GLS, estimator.
∗ This lecture is based on D&M Chapter 6.

2 Generalized Least Squares


• The transformation we want to find must be such that the new, transformed, error terms
have a scalar covariance matrix, σ²I.

• Consider transforming the regression by premultiplying by Ψ′. Then the covariance matrix
of the transformed error vector Ψ′u is

E(Ψ′uu′Ψ) = Ψ′E(uu′)Ψ = Ψ′ΩΨ.

• To make this expression reduce to the identity matrix (that is, σ²I with σ² = 1), we must
define Ψ such that
Ω⁻¹ = ΨΨ′.

Transforming Back to the Classical Regression

• In this case, the variance of the transformed error reduces to

E(Ψ′uu′Ψ) = Ψ′(ΨΨ′)⁻¹Ψ = Ψ′(Ψ′)⁻¹Ψ⁻¹Ψ = I.

• Premultiplying the regression by Ψ′ gives

Ψ′y = Ψ′Xβ + Ψ′u.

• Because the covariance matrix Ω is nonsingular, the matrix Ψ must be as well, and so
the transformed regression model is perfectly equivalent to the original model.

GLS as OLS on the Transformed Model

• The OLS estimator of β from the transformed regression is

β̂gls = (X′ΨΨ′X)⁻¹X′ΨΨ′y = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.

• This is the expression for the generalized least squares estimator of β.

• Since β̂gls is just the OLS estimator for the transformed model, its covariance matrix can
be found directly from the OLS covariance matrix, σ²(X′X)⁻¹.

• Replacing X by Ψ′X and σ₀² by 1, we get

Var(β̂gls) = (X′ΨΨ′X)⁻¹ = (X′Ω⁻¹X)⁻¹.
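
• As a minimal numerical sketch of the two equivalent computations above (Python/NumPy; the simulated data and dimensions are illustrative assumptions), the direct formula and OLS on the transformed data give identical estimates:

```python
# Sketch: GLS via the direct formula vs. OLS on the transformed data.
# All data are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta0 = np.array([1.0, 2.0, -0.5])

# Heteroskedastic but uncorrelated errors: Omega is diagonal here,
# although the formulas below work for any positive definite Omega.
omega2 = np.exp(rng.normal(size=n))            # diagonal of Omega
Omega = np.diag(omega2)
y = X @ beta0 + rng.normal(size=n) * np.sqrt(omega2)

# (a) Direct formula: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.
Oinv = np.linalg.inv(Omega)
beta_direct = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

# (b) OLS on Psi'X and Psi'y, where Omega^{-1} = Psi Psi'.
# A Cholesky factor of Omega^{-1} is one valid choice of Psi.
Psi = np.linalg.cholesky(Oinv)
beta_transformed, *_ = np.linalg.lstsq(Psi.T @ X, Psi.T @ y, rcond=None)

assert np.allclose(beta_direct, beta_transformed)
var_gls = np.linalg.inv(X.T @ Oinv @ X)        # Var(beta_gls)
```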


The GLS Criterion Function

• The generalized least squares estimator β̂gls can also be obtained by minimizing the GLS
criterion function
(y − Xβ)′Ω⁻¹(y − Xβ),
which is just the sum of squared residuals from the transformed regression.

• This can be viewed as the SSR function from the original model, weighted by the inverse
of the matrix Ω.

• The effect of such a weighting scheme is clearest when Ω is a diagonal matrix. In that
case, the weight given to the tth observation is proportional to the inverse of Var(uₜ).

3 Efficiency of the GLS Estimator


Method of Moments Representation of GLS

• The GLS estimator β̂gls defined above is also the solution of the set of moment conditions

X′Ω⁻¹(y − Xβ̂gls) = 0,

which are the same moment conditions we have seen before, with W = Ω⁻¹X.

• It is easy to verify that these moment conditions are equivalent to the first-order conditions
for the minimization of the GLS criterion function (do it as an exercise).

• Since the GLS estimator is a method of moments estimator, it is interesting to compare
it with estimators obtained with a general matrix W, denoted β̂w.

• We will establish the efficiency of the GLS estimator from this comparison.

Method of Moments Representation of GLS

• Suppose that the DGP is a special case of that model, with parameter vector β0 and
known covariance matrix Ω.

• Assume further that E(u|X, W) = 0. As we have seen before, to obtain consistency,
pre-determinedness would suffice.

• Substituting Xβ₀ + u for y in W′(y − Xβ) = 0, we see that

β̂w = β₀ + (W′X)⁻¹W′u.

• Therefore, the covariance matrix of β̂w is

Var(β̂w) = E[(β̂w − β₀)(β̂w − β₀)′]
         = E[(W′X)⁻¹W′uu′W(X′W)⁻¹]
         = (W′X)⁻¹W′ΩW(X′W)⁻¹.


Efficiency of the GLS Estimator

• To show efficiency of β̂gls , we proceed as in previous cases and show that the difference
of the precision matrices,

X′Ω⁻¹X − X′W(W′ΩW)⁻¹W′X,   (1)

is positive semidefinite (Do it as an exercise).

• This difference being positive semidefinite means that any other choice of W yields a
variance at least as large as that obtained with W = Ω⁻¹X.

• In fact, β̂gls is typically more efficient for all elements of β, because it is only in very
special cases that the matrix (1) will have any zero diagonal elements.

• Note that β̂w reduces to the OLS estimator when W = X. Thus these conclusions apply
to the OLS estimator, β̂, as well.
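
• The following quick numerical check (not a proof; the matrices are drawn randomly, purely for illustration) verifies that the smallest eigenvalue of the difference (1) is non-negative for an arbitrary full-rank W:

```python
# Sketch: numerically verify that X'Omega^{-1}X - X'W(W'Omega W)^{-1}W'X
# is positive semidefinite for an arbitrary (random) full-rank W.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
W = rng.normal(size=(n, k))                    # arbitrary choice of W
A = rng.normal(size=(n, n))
Omega = A @ A.T + n * np.eye(n)                # a positive definite covariance

Oinv = np.linalg.inv(Omega)
prec_gls = X.T @ Oinv @ X                      # inverse of Var(beta_gls)
prec_w = X.T @ W @ np.linalg.inv(W.T @ Omega @ W) @ W.T @ X

eigvals = np.linalg.eigvalsh(prec_gls - prec_w)   # symmetric matrix
print(eigvals.min() >= -1e-8)                  # True up to rounding error
```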

4 Computing GLS Estimates


• The main issue in computing the GLS estimator is that, in general, the matrix Ω is
unknown. But it is important to note that there is a computational difficulty even if Ω
is known.

• The reason is that when n is large, computation based on Ω, which is an n × n matrix,
can be very demanding in terms of computer memory.

• In general, computation of the GLS estimator will be easy only if the matrix Ψ has a
form that allows us to calculate Ψ′x without having to store Ψ itself in memory.
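
• As an illustration of such a structured case (the AR(1) specification here is an assumption made for this example), suppose uₜ = ρuₜ₋₁ + εₜ. Then, up to a scale factor that does not affect the estimates (see the next slide), Ψ′x is just the Prais-Winsten quasi-difference of x, which requires only the scalar ρ, never the n × n matrix Ψ:

```python
# Sketch: applying Psi'x in O(n) time and memory for AR(1) errors
# u_t = rho*u_{t-1} + e_t, up to an irrelevant scale factor.
import numpy as np

def ar1_transform(x, rho):
    """Prais-Winsten quasi-differencing: returns Psi'x without storing Psi."""
    x = np.asarray(x, dtype=float)
    out = x - rho * np.concatenate(([0.0], x[:-1]))   # x_t - rho*x_{t-1}
    out[0] = np.sqrt(1.0 - rho**2) * x[0]             # special first observation
    return out

# Usage: apply the same transformation to y and to every column of X.
rho = 0.6
y = np.random.default_rng(1).normal(size=1000)
y_star = ar1_transform(y, rho)
```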

GLS with Ω Known Up to a Constant

• Suppose that Ω = σ²∆, where the n × n matrix ∆ is known to the investigator, but the
positive scalar σ² is unknown.

• Then if we define Ψ in terms of ∆ instead of Ω, the transformed regression is still valid,
but the error terms will now have variance σ² instead of variance 1.

• The OLS estimates from the transformed regression with the modified Ψ are numerically
identical to β̂gls:

(X′∆⁻¹X)⁻¹X′∆⁻¹y = (X′(σ⁻²Ω)⁻¹X)⁻¹X′(σ⁻²Ω)⁻¹y
                 = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y
                 = β̂gls.

• Thus the GLS estimates will be the same whether we use Ω or ∆, that is, whether or not
we know σ 2 .


• The covariance matrix of β̂gls in this case can be written as

Var(β̂gls) = σ²(X′∆⁻¹X)⁻¹,

which can be estimated by replacing σ² with the usual OLS estimator of the error variance,
s², from the transformed regression.

Weighted Least Squares


• Suppose that Ω is diagonal, and let ωₜ² denote its tth diagonal element. That is, the error
terms are heteroskedastic but uncorrelated.

• Then Ω⁻¹ is a diagonal matrix with tth diagonal element ωₜ⁻², and thus Ψ will be a
diagonal matrix with diagonal elements ωₜ⁻¹.

• In this case, the transformed regression can be written as

(1/ωₜ)yₜ = (1/ωₜ)Xₜβ + (1/ωₜ)uₜ,

and estimated by OLS.

• This special case of GLS estimation is often called weighted least squares, or WLS.

• The weight given to observation t is ωₜ⁻¹, and thus observations for which the variance of
the error term is large (small) are given low (high) weight.

• Note that all the variables in the regression, including the constant term, must be multi-
plied by the same weights.
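
• A minimal WLS sketch (simulated data; the skedastic specification ωₜ = 0.5xₜ is an illustrative assumption) in which every variable, including the constant, is divided by the same ωₜ:

```python
# Sketch: weighted least squares with known standard deviations omega_t.
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])           # constant included
omega = 0.5 * x                                # sd of u_t, assumed known
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * omega

w = 1.0 / omega                                # weight omega_t^{-1}
Xw, yw = X * w[:, None], y * w                 # weight EVERY column of X and y
beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
```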

• Note that the R² only makes sense in terms of the transformed regressand, since undoing
the weighting does not preserve the orthogonality of residuals and fitted values. That is,

û ⊥ ŷ  ⇏  (Ψ′)⁻¹û ⊥ (Ψ′)⁻¹ŷ.

Generalized Nonlinear Least Squares


• Replacing the vector of regression functions Xβ by x(β), we obtain generalized non-
linear least squares, or GNLS, estimates by minimizing the criterion function

(y − x(β))′Ω⁻¹(y − x(β)).

• Differentiating with respect to β and dividing by −2 yields the moment conditions

X′(β)Ω⁻¹(y − x(β)) = 0,

where X(β) is the matrix of derivatives of x(β) with respect to β.
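
• A sketch of GNLS for a diagonal Ω (the exponential regression function x(β) = β₁exp(β₂t) and the data are illustrative assumptions): nonlinear least squares is run on the transformed residuals Ψ′(y − x(β)):

```python
# Sketch: GNLS as NLS on the transformed residuals Psi'(y - x(beta)).
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(3)
n = 300
t = rng.uniform(0.0, 2.0, size=n)
omega2 = np.exp(t)                             # heteroskedastic variances
y = 1.0 * np.exp(-0.8 * t) + rng.normal(size=n) * np.sqrt(omega2)

psi_diag = 1.0 / np.sqrt(omega2)               # diagonal Psi for diagonal Omega

def transformed_residuals(beta):
    """Psi'(y - x(beta)) for x(beta) = beta[0]*exp(beta[1]*t)."""
    return psi_diag * (y - beta[0] * np.exp(beta[1] * t))

fit = least_squares(transformed_residuals, x0=np.array([0.5, -0.5]))
beta_gnls = fit.x                              # GNLS estimates
```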


5 Feasible Generalized Least Squares


GLS is Infeasible in Practice

• As we discussed before, even if the matrix Ψ is known, computation of GLS estimates is
expensive because there is an n × n matrix to be handled.

• Life is much easier if there is heteroskedasticity but no serial correlation. In this case, we
can simply use weighted least squares.

• But even in this case some information on ωₜ is still necessary, such as the sampling design
or a direct relationship between E(uₜ²) and some variable zₜ that can be used as a weight.

• In practice, the covariance matrix Ω is often not known even up to a scalar factor. This
makes it impossible to compute GLS estimates.

Estimating the Variance Matrix Ω

• In many cases it is reasonable to suppose that Ω, or ∆, depends in a known way on a
vector of unknown parameters γ; that is, assume that Ω = Ω(γ).

• In this case, if it is possible to obtain a consistent estimate of γ, then Ω̂ = Ω(γ̂) is
consistent for Ω.

• Then we can define Ψ(γ̂) such that

Ω̂⁻¹ = Ψ(γ̂)Ψ′(γ̂),

and obtain GLS estimates conditional on Ψ(γ̂).

• The resulting estimator is called feasible generalized least squares, or feasible GLS.

Estimating Ω Using Skedastic Functions

• In the same way that a regression function determines the conditional mean of a random
variable, a skedastic function determines its conditional variance:

E(uₜ² | xₜ, zₜ) = h(zₜ; γ),

where γ is an l-vector of unknown parameters, and zₜ is a vector of observations on
exogenous or predetermined variables that belong to the information set on which we are
conditioning.

• An example of a skedastic function is exp(Zₜγ), which conveniently produces positive
estimated variances for all γ.


Example of Feasible GLS Procedure

• Consider the linear regression model

yₜ = xₜβ + uₜ,   E(uₜ²) = exp(zₜγ).

• In order to obtain consistent estimates of γ, we can start by obtaining consistent estimates
of the error terms from the vector of OLS residuals, with typical element ûₜ.

• We can then obtain OLS estimates γ̂ by running the auxiliary linear regression

log ûₜ² = Zₜγ + vₜ.

• These estimates are then used to compute

ω̂ₜ = (exp(Zₜγ̂))^{1/2}

for all t.

• Finally, feasible GLS estimates of β are obtained by using ordinary least squares to
estimate the weighted regression, with the estimates ω̂ₜ replacing the unknown ωₜ:

(1/ω̂ₜ)yₜ = (1/ω̂ₜ)Xₜβ + (1/ω̂ₜ)uₜ.

• This is an example of feasible weighted least squares.
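
• A sketch of the whole procedure (Python/NumPy; the data-generating choices and Z = [1, zₜ] are illustrative assumptions):

```python
# Sketch: feasible weighted least squares for y_t = X_t beta + u_t with
# E(u_t^2) = exp(Z_t gamma). Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(11)
n = 1000
z = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.column_stack([np.ones(n), z])
u = rng.normal(size=n) * np.exp(Z @ np.array([-1.0, 1.2]) / 2.0)
y = X @ np.array([1.0, 2.0]) + u

# Step 1: OLS residuals.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
uhat = y - X @ beta_ols

# Step 2: auxiliary regression of log(uhat^2) on Z to estimate gamma.
gamma_hat, *_ = np.linalg.lstsq(Z, np.log(uhat**2), rcond=None)

# Step 3: estimated weights omega_hat_t = exp(Z_t gamma_hat)^{1/2}.
omega_hat = np.exp(Z @ gamma_hat / 2.0)

# Step 4: weighted least squares with the estimated weights.
w = 1.0 / omega_hat
beta_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
```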

• Under suitable regularity conditions, it can be shown that this type of procedure yields
a feasible GLS estimator β̂f that is consistent and asymptotically equivalent to the GLS
estimator β̂gls .

Why Feasible GLS Works


Consistency of the GLS Estimator

• If we substitute Xβ₀ + u for y in the formula for the GLS estimator, we find that

β̂gls = β₀ + (X′Ω⁻¹X)⁻¹X′Ω⁻¹u.

• Rescaling by n^{1/2}, taking probability limits, and multiplying each factor by an
appropriate power of n, we get

n^{1/2}(β̂gls − β₀) ≈ (plim n⁻¹X′Ω⁻¹X)⁻¹ n^{−1/2}X′Ω⁻¹u,

where ≈ denotes asymptotic equality.

• As usual, we assume sufficient conditions for the first factor in the right-hand side to tend
to a non-stochastic k × k matrix.

• Then, we apply a CLT to the second factor to conclude that it is an asymptotically normal
random vector, and thus obtain root-n consistency and asymptotic normality.


• Following the same argument for the feasible GLS estimator, we find that

n^{1/2}(β̂f − β₀) ≈ (plim n⁻¹X′Ω⁻¹(γ̂)X)⁻¹ n^{−1/2}X′Ω⁻¹(γ̂)u.

• Clearly, β̂gls will be asymptotically equivalent to β̂f if

plim n⁻¹X′Ω⁻¹(γ̂)X = plim n⁻¹X′Ω⁻¹X

and

plim n^{−1/2}X′Ω⁻¹(γ̂)u = plim n^{−1/2}X′Ω⁻¹u.

• For these equalities to hold, it is necessary that plim γ̂ = γ.

Small Sample Properties of the Feasible GLS


• Whether or not feasible GLS is a desirable estimation method in practice depends on how
good an estimate of Ω can be obtained.

• If Ω(γ̂) is a very good estimate, then feasible GLS will have essentially the same properties
as GLS itself.

• As a result, inferences should be reasonably reliable, even though they will not be exact
in finite samples.

• On the other hand, if Ω(γ̂) is a poor estimate, feasible GLS estimates may have quite
different properties from real (infeasible) GLS estimates, and inferences may be quite
misleading.

Alternative Estimation Approaches


• It is possible to iterate a feasible GLS procedure, using β̂f to compute a new set of
residuals.

• These second-round residuals can then be used to obtain a second-round estimate of γ,
which in turn can be used to calculate second-round feasible GLS estimates of β, and so
on.

• This procedure can either be stopped after a predetermined number of rounds or continued
until convergence is achieved (although convergence is not guaranteed).

• Iteration does not change the asymptotic distribution of the feasible GLS estimator, but
it does change its finite-sample distribution.
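
• A sketch of the iterated procedure, reusing the auxiliary-regression step from the earlier example (the tolerance and iteration cap are arbitrary choices):

```python
# Sketch: iterated feasible WLS, alternating between gamma and beta
# until the beta estimates stop changing (convergence not guaranteed).
import numpy as np

def iterated_fgls(y, X, Z, max_iter=50, tol=1e-8):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # start from OLS
    for _ in range(max_iter):
        uhat = y - X @ beta                            # current residuals
        gamma, *_ = np.linalg.lstsq(Z, np.log(uhat**2), rcond=None)
        w = np.exp(-Z @ gamma / 2.0)                   # 1 / omega_hat_t
        beta_new, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:      # converged
            return beta_new
        beta = beta_new
    return beta                                        # iteration cap reached
```
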
• Another way to estimate models in which the covariance matrix of the error terms depends
on one or more unknown parameters is to use the method of maximum likelihood.

• As we will see later on, in this case, β and γ are estimated jointly and consistency will
follow if the maximum likelihood regularity conditions are satisfied.

• In many cases, an iterated feasible GLS estimator will be the same as a maximum likeli-
hood estimator based on the assumption of normally distributed errors.


6 Testing for Heteroskedasticity


Model Specification and Heteroskedasticity

• It is important to note that, in our usual setup, homoskedasticity is imposed as an
assumption in the model specification.

• If the true DGP is heteroskedastic, it will not be contained in the estimated model, and
therefore there is a specification error.

• The specification error does not bias the OLS estimator, but renders it inefficient, as the
sandwich form of its covariance matrix suggests.

• As we have seen, we can compute asymptotically valid covariance matrix estimates for
the (inefficient) OLS and NLS parameter estimates.

• So, what if we choose to assume heteroskedasticity and settle for an inefficient estimator,
but the true DGP is homoskedastic?

• Simulation experiments suggest that this specification error frequently has little cost.

• This evidence can be taken as an indication that it may be prudent to employ an HCCME
anyway, especially if the sample size is large.

• However, in finite samples, tests and confidence intervals based on HCCMEs will always
be somewhat less reliable than ones based on the usual OLS covariance matrix under
homoskedasticity.

• If we have information on the form of the skedastic function, we might well wish to use
feasible generalized least squares, which is asymptotically equivalent to the efficient GLS
estimator.

• However, the small-sample properties of feasible generalized least squares depend critically
on the quality of the estimate Ω̂.

• So, if the true DGP is homoskedastic and we assume heteroskedasticity, we can expect
that the specification error may be costly in small samples.

• So, before deciding to use an HCCME or a feasible GLS procedure, it is advisable to
perform a specification test of the null hypothesis that the error terms are homoskedastic.

Skedastic Function and Heteroskedasticity Testing

• Let us consider a reasonably general model of conditional heteroskedasticity, such as

E(uₜ² | Ωₜ) = h(δ + zₜγ),

where the skedastic function h(·) is a nonlinear function that can take on only positive
values, zₜ is a 1 × r vector of observations on exogenous or predetermined variables
that belong to the information set Ωₜ, δ is a scalar parameter, and γ is an r-vector of
parameters.


• Under the null hypothesis that γ = 0, the function h(δ + Zₜγ) collapses to h(δ), a constant.

• If we think of the skedastic function as a regression equation in conditional expectation
form, then its error form can be written as

uₜ² = h(δ + zₜγ) + vₜ.

• Alternatively, you can define vₜ as the difference between uₜ² and its conditional expecta-
tion, and rewrite the skedastic function as in the last expression.

• Suppose that we actually observe uₜ. Then, we can test γ = 0 using a Gauss-Newton
regression:

uₜ² − h(δ + Zₜγ) = h′(δ + Zₜγ)bδ + h′(δ + Zₜγ)Zₜbγ + residual,

where h′(·) is the first derivative of h(·), bδ is the coefficient associated with δ, and bγ is
the r-vector of coefficients associated with γ.

GNR Testing for Heteroskedasticity

• Remember that we need to evaluate the GNR at “initial” parameter values.

• So, let us evaluate it at γ = 0 and δ = δ̃ ≡ h⁻¹(σ̃²), where σ̃² is the sample variance of
the uₜ:

uₜ² − σ̃² = h′(δ̃)bδ + h′(δ̃)Zₜbγ + residual.

• For the purpose of testing the null hypothesis that γ = 0, this regression is equivalent to

uₜ² = bδ + Zₜbγ + residual,

with a suitable redefinition of the artificial parameters bδ and bγ , which does not depend
on the functional form of h( · ).

Residuals and Heteroskedasticity Testing

• It can be shown that replacing uₜ² by ûₜ² does not change the asymptotic distribution of
the F and nR² statistics for testing the hypothesis bγ = 0.

• The last issue is to choose the variables to be included in Z. White suggests including
all squares and cross-products of the variables in X (why?), which results in the White
Test.

• The general form of the test is basically the Breusch-Pagan Test. We will derive the
limiting distribution for this test later, in a more convenient framework.

• Since the asymptotic approximations for these test statistics may be inaccurate in finite
samples, bootstrapping them when the sample size is small or moderate may be a good
idea.
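
• A sketch of the resulting nR² test: regress the squared OLS residuals on a constant and the chosen Z, and compare nR² with a χ²(r) critical value (with Z the squares and cross-products of the columns of X, this is a White-type test; the helper below is an illustrative sketch, not library code):

```python
# Sketch: nR^2 test for heteroskedasticity from the auxiliary regression
# of squared residuals on a constant and the n x r test regressors Z.
import numpy as np
from scipy.stats import chi2

def het_test_nR2(uhat, Z):
    """Return the nR^2 statistic and its asymptotic chi2(r) p-value."""
    n, r = Z.shape
    u2 = uhat**2
    ZZ = np.column_stack([np.ones(n), Z])      # include the constant
    coef, *_ = np.linalg.lstsq(ZZ, u2, rcond=None)
    resid = u2 - ZZ @ coef
    R2 = 1.0 - resid @ resid / ((u2 - u2.mean()) @ (u2 - u2.mean()))
    stat = n * R2
    return stat, chi2.sf(stat, df=r)
```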

