
Generalized Least Squares

(Handout Version)∗

Walter Belluzzo

Econ 507 Econometric Analysis


Spring 2013

1 Introduction
Efficiency of the OLS Estimator

• Remember that the OLS estimator is efficient (the best linear unbiased estimator) if the DGP
belongs to the regression model

y = Xβ + u,   u|X ∼ iid(0, σ²I),

a result stated in the Gauss-Markov theorem.

• For efficiency of least squares, the error terms must be uncorrelated and have equal
variance, Var(u) = σ²I.

• The usual estimators of the covariance matrices of the OLS and NLS estimators are not
valid when these assumptions do not hold.

• Alternative “sandwich” covariance matrix estimators that are asymptotically valid can be
obtained, but the inefficiency of the estimator β̂ remains.

Regression Model with Non-spherical Disturbances

• Non-spherical disturbances affect both linear and nonlinear regression models in the same
way, so we can focus our attention on the simpler, linear case.

• Let us consider the model

y = Xβ + u,   E(uu′) = Ω.

• The idea for obtaining an efficient estimator of the vector β in this model is to find a
transformation that makes the Gauss-Markov conditions hold.

• The resulting efficient estimator (why?) is called the generalized least squares, or
GLS, estimator.
∗ This lecture is based on D&M Chapter 6.

2 Generalized Least Squares


• The transformation we want to find must be such that the new, transformed, error terms
have a scalar covariance matrix, σ²I.

• Consider transforming the regression by premultiplying by Ψ′. Then the covariance matrix
of the transformed error vector Ψ′u is

E(Ψ′uu′Ψ) = Ψ′E(uu′)Ψ = Ψ′ΩΨ.

• To make this expression reduce to the identity matrix (that is, σ²I with σ² = 1), we must
define Ψ such that
Ω⁻¹ = ΨΨ′.

Transforming Back to the Classical Regression

• In this case, the variance of the transformed error reduces to

E(Ψ′uu′Ψ) = Ψ′(ΨΨ′)⁻¹Ψ = Ψ′(Ψ′)⁻¹Ψ⁻¹Ψ = I.

• Premultiplying the regression by Ψ′ gives

Ψ′y = Ψ′Xβ + Ψ′u.

• Because the covariance matrix Ω is nonsingular, the matrix Ψ must be as well, and so
the transformed regression model is perfectly equivalent to the original model.

GLS as OLS on the Transformed Model

• The OLS estimator of β from the transformed regression is

β̂gls = (X′ΨΨ′X)⁻¹X′ΨΨ′y = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.

• This is the expression for the generalized least squares estimator of β.

• Since β̂gls is just the OLS estimator for the transformed model, its covariance matrix can
be found directly from the OLS covariance matrix, σ²(X′X)⁻¹.

• Replacing X by Ψ′X and σ₀² by 1, we get

Var(β̂gls) = (X′ΨΨ′X)⁻¹ = (X′Ω⁻¹X)⁻¹.
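
• As a minimal numerical sketch of the two equivalent computations above (Python/NumPy; the simulated data and dimensions are illustrative assumptions), the direct formula and OLS on the transformed data give identical estimates:

```python
# Sketch: GLS via the direct formula vs. OLS on the transformed data.
# All data are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta0 = np.array([1.0, 2.0, -0.5])

# Heteroskedastic but uncorrelated errors: Omega is diagonal here,
# although the formulas below work for any positive definite Omega.
omega2 = np.exp(rng.normal(size=n))            # diagonal of Omega
Omega = np.diag(omega2)
y = X @ beta0 + rng.normal(size=n) * np.sqrt(omega2)

# (a) Direct formula: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.
Oinv = np.linalg.inv(Omega)
beta_direct = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

# (b) OLS on Psi'X and Psi'y, where Omega^{-1} = Psi Psi'.
# A Cholesky factor of Omega^{-1} is one valid choice of Psi.
Psi = np.linalg.cholesky(Oinv)
beta_transformed, *_ = np.linalg.lstsq(Psi.T @ X, Psi.T @ y, rcond=None)

assert np.allclose(beta_direct, beta_transformed)
var_gls = np.linalg.inv(X.T @ Oinv @ X)        # Var(beta_gls)
```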


The GLS Criterion Function

• The generalized least squares estimator β̂gls can also be obtained by minimizing the GLS
criterion function
(y − Xβ)′Ω⁻¹(y − Xβ),
which is just the sum of squared residuals from the transformed regression.

• This can be viewed as the SSR function from the original model, weighted by the inverse
of the matrix Ω.

• The effect of such a weighting scheme is clearest when Ω is a diagonal matrix. In that
case, the weight given to the tth observation is proportional to the inverse of Var(uₜ).

3 Efficiency of the GLS Estimator


Method of Moments Representation of GLS

• The GLS estimator β̂gls defined above is also the solution of the set of moment conditions

X′Ω⁻¹(y − Xβ̂gls) = 0,

which are the same moment conditions we have seen before, with W = Ω⁻¹X.

• It is easy to verify that these moment conditions are equivalent to the first-order conditions
for the minimization of the GLS criterion function (do it as an exercise).

• Since the GLS estimator is a method of moments estimator, it is interesting to compare
it with estimators obtained with a general matrix W, denoted β̂w.

• We will establish the efficiency of the GLS estimator from this comparison.

Method of Moments Representation of GLS

• Suppose that the DGP is a special case of that model, with parameter vector β0 and
known covariance matrix Ω.

• Assume further that E(u|X, W) = 0. As we have seen before, to obtain consistency,
pre-determinedness would suffice.

• Substituting Xβ₀ + u for y in W′(y − Xβ) = 0, we see that

β̂w = β₀ + (W′X)⁻¹W′u.

• Therefore, the covariance matrix of β̂w is

Var(β̂w) = E[(β̂w − β₀)(β̂w − β₀)′]
         = E[(W′X)⁻¹W′uu′W(X′W)⁻¹]
         = (W′X)⁻¹W′ΩW(X′W)⁻¹.


Efficiency of the GLS Estimator

• To show efficiency of β̂gls , we proceed as in previous cases and show that the difference
of the precision matrices,

X′Ω⁻¹X − X′W(W′ΩW)⁻¹W′X,   (1)

is positive semidefinite (Do it as an exercise).

• This difference being positive semidefinite means that any other choice of W yields a
variance at least as large as that obtained with W = Ω⁻¹X.

• In fact, β̂gls is typically more efficient for all elements of β, because it is only in very
special cases that the matrix (1) will have any zero diagonal elements.

• Note that β̂w reduces to the OLS estimator when W = X. Thus these conclusions apply
to the OLS estimator, β̂, as well.
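
• The following quick numerical check (not a proof; the matrices are drawn randomly, purely for illustration) verifies that the smallest eigenvalue of the difference (1) is non-negative for an arbitrary full-rank W:

```python
# Sketch: numerically verify that X'Omega^{-1}X - X'W(W'Omega W)^{-1}W'X
# is positive semidefinite for an arbitrary (random) full-rank W.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
W = rng.normal(size=(n, k))                    # arbitrary choice of W
A = rng.normal(size=(n, n))
Omega = A @ A.T + n * np.eye(n)                # a positive definite covariance

Oinv = np.linalg.inv(Omega)
prec_gls = X.T @ Oinv @ X                      # inverse of Var(beta_gls)
prec_w = X.T @ W @ np.linalg.inv(W.T @ Omega @ W) @ W.T @ X

eigvals = np.linalg.eigvalsh(prec_gls - prec_w)   # symmetric matrix
print(eigvals.min() >= -1e-8)                  # True up to rounding error
```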

4 Computing GLS Estimates


• The main issue in computing the GLS estimator is that, in general, the matrix Ω is
unknown. But it is important to note that there is a computational difficulty even if Ω
is known.

• The reason is that when n is large, computation based on Ω, which is an n × n matrix,
can be very demanding in terms of computer memory.

• In general, computation of the GLS estimator will be easy only if the matrix Ψ has a
form that allows us to calculate Ψ′x without having to store Ψ itself in memory.
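
• As an illustration of such a structured case (the AR(1) specification here is an assumption made for this example), suppose uₜ = ρuₜ₋₁ + εₜ. Then, up to a scale factor that does not affect the estimates (see the next slide), Ψ′x is just the Prais-Winsten quasi-difference of x, which requires only the scalar ρ, never the n × n matrix Ψ:

```python
# Sketch: applying Psi'x in O(n) time and memory for AR(1) errors
# u_t = rho*u_{t-1} + e_t, up to an irrelevant scale factor.
import numpy as np

def ar1_transform(x, rho):
    """Prais-Winsten quasi-differencing: returns Psi'x without storing Psi."""
    x = np.asarray(x, dtype=float)
    out = x - rho * np.concatenate(([0.0], x[:-1]))   # x_t - rho*x_{t-1}
    out[0] = np.sqrt(1.0 - rho**2) * x[0]             # special first observation
    return out

# Usage: apply the same transformation to y and to every column of X.
rho = 0.6
y = np.random.default_rng(1).normal(size=1000)
y_star = ar1_transform(y, rho)
```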

GLS with Ω Known Up to a Constant

• Suppose that Ω = σ²∆, where the n × n matrix ∆ is known to the investigator, but the
positive scalar σ² is unknown.

• Then if we define Ψ in terms of ∆ instead of Ω, the transformed regression is still valid,
but the error terms will now have variance σ² instead of variance 1.

• The OLS estimates from the transformed regression with the modified Ψ are numerically
identical to β̂gls:

(X′∆⁻¹X)⁻¹X′∆⁻¹y = (X′(σ⁻²Ω)⁻¹X)⁻¹X′(σ⁻²Ω)⁻¹y
                 = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y
                 = β̂gls.

• Thus the GLS estimates will be the same whether we use Ω or ∆, that is, whether or not
we know σ 2 .


• The covariance matrix of β̂gls in this case can be written as

Var(β̂gls) = σ²(X′∆⁻¹X)⁻¹,

which can be estimated by replacing σ² with the usual OLS estimator of the error variance,
s², from the transformed regression.

Weighted Least Squares


• Suppose that Ω is diagonal, and let ωₜ² denote its tth diagonal element. That is, the error
terms are heteroskedastic but uncorrelated.

• Then Ω⁻¹ is a diagonal matrix with tth diagonal element ωₜ⁻², and thus Ψ will be a
diagonal matrix with diagonal elements ωₜ⁻¹.

• In this case, the transformed regression can be written as

(1/ωₜ)yₜ = (1/ωₜ)Xₜβ + (1/ωₜ)uₜ,

and estimated by OLS.

• This special case of GLS estimation is often called weighted least squares, or WLS.

• The weight given to observation t is ωₜ⁻¹, and thus observations for which the variance of
the error term is large (small) are given low (high) weight.

• Note that all the variables in the regression, including the constant term, must be multi-
plied by the same weights.
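
• A minimal WLS sketch (simulated data; the skedastic specification ωₜ = 0.5xₜ is an illustrative assumption) in which every variable, including the constant, is divided by the same ωₜ:

```python
# Sketch: weighted least squares with known standard deviations omega_t.
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])           # constant included
omega = 0.5 * x                                # sd of u_t, assumed known
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * omega

w = 1.0 / omega                                # weight omega_t^{-1}
Xw, yw = X * w[:, None], y * w                 # weight EVERY column of X and y
beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
```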

• Note that the R² only makes sense in terms of the transformed regressand, since undoing
the weighting does not preserve the orthogonality of residuals and fitted values. That is,

û ⊥ ŷ  ⇏  (Ψ′)⁻¹û ⊥ (Ψ′)⁻¹ŷ.

Generalized Nonlinear Least Squares


• Replacing the vector of regression functions Xβ by x(β), we obtain generalized non-
linear least squares, or GNLS, estimates by minimizing the criterion function

(y − x(β))′Ω⁻¹(y − x(β)).

• Differentiating with respect to β and dividing by −2 yields the moment conditions

X′(β)Ω⁻¹(y − x(β)) = 0,

where X(β) is the matrix of derivatives of x(β) with respect to β.
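
• A sketch of GNLS for a diagonal Ω (the exponential regression function x(β) = β₁exp(β₂t) and the data are illustrative assumptions): nonlinear least squares is run on the transformed residuals Ψ′(y − x(β)):

```python
# Sketch: GNLS as NLS on the transformed residuals Psi'(y - x(beta)).
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(3)
n = 300
t = rng.uniform(0.0, 2.0, size=n)
omega2 = np.exp(t)                             # heteroskedastic variances
y = 1.0 * np.exp(-0.8 * t) + rng.normal(size=n) * np.sqrt(omega2)

psi_diag = 1.0 / np.sqrt(omega2)               # diagonal Psi for diagonal Omega

def transformed_residuals(beta):
    """Psi'(y - x(beta)) for x(beta) = beta[0]*exp(beta[1]*t)."""
    return psi_diag * (y - beta[0] * np.exp(beta[1] * t))

fit = least_squares(transformed_residuals, x0=np.array([0.5, -0.5]))
beta_gnls = fit.x                              # GNLS estimates
```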


5 Feasible Generalized Least Squares


GLS is Infeasible in Practice

• As we discussed before, even if the matrix Ψ is known, computation of GLS estimates is
expensive because there is an n × n matrix to be handled.

• Life is much easier if there is heteroskedasticity but no serial correlation. In this case, we
can simply use weighted least squares.

• But even in this case some information on ωₜ is still necessary, such as the sampling design
or a direct relationship between E(uₜ²) and some variable zₜ that can be used as a weight.

• In practice, the covariance matrix Ω is often not known even up to a scalar factor. This
makes it impossible to compute GLS estimates.

Estimating the Variance Matrix Ω

• In many cases it is reasonable to suppose that Ω, or ∆, depends in a known way on a
vector of unknown parameters γ; that is, assume that Ω = Ω(γ).

• In this case, if it is possible to obtain a consistent estimate of γ, then Ω̂ = Ω(γ̂) is
consistent for Ω.

• Then we can define Ψ(γ̂) such that

Ω̂⁻¹ = Ψ(γ̂)Ψ′(γ̂),

and obtain GLS estimates conditional on Ψ(γ̂).

• The resulting estimator is called feasible generalized least squares, or feasible GLS.

Estimating Ω Using Skedastic Functions

• In the same way that a regression function determines the conditional mean of a random
variable, a skedastic function determines its conditional variance:

E(uₜ² | xₜ, zₜ) = h(zₜ; γ),

where γ is an l-vector of unknown parameters, and zₜ is a vector of observations on
exogenous or predetermined variables that belong to the information set on which we are
conditioning.

• An example of a skedastic function is exp(Zₜγ), which conveniently produces positive
estimated variances for all γ.


Example of Feasible GLS Procedure

• Consider the linear regression model

yₜ = xₜβ + uₜ,   E(uₜ²) = exp(zₜγ).

• In order to obtain consistent estimates of γ, we can start by obtaining consistent estimates
of the error terms from the vector of OLS residuals, with typical element ûₜ.

• We can then obtain OLS estimates γ̂ by running the auxiliary linear regression

log ûₜ² = Zₜγ + vₜ.

• These estimates are then used to compute

ω̂ₜ = (exp(Zₜγ̂))^{1/2}

for all t.

• Finally, feasible GLS estimates of β are obtained by using ordinary least squares to
estimate the weighted regression, with the estimates ω̂ₜ replacing the unknown ωₜ:

(1/ω̂ₜ)yₜ = (1/ω̂ₜ)Xₜβ + (1/ω̂ₜ)uₜ.

• This is an example of feasible weighted least squares.
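
• A sketch of the whole procedure (Python/NumPy; the data-generating choices and Z = [1, zₜ] are illustrative assumptions):

```python
# Sketch: feasible weighted least squares for y_t = X_t beta + u_t with
# E(u_t^2) = exp(Z_t gamma). Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(11)
n = 1000
z = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.column_stack([np.ones(n), z])
u = rng.normal(size=n) * np.exp(Z @ np.array([-1.0, 1.2]) / 2.0)
y = X @ np.array([1.0, 2.0]) + u

# Step 1: OLS residuals.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
uhat = y - X @ beta_ols

# Step 2: auxiliary regression of log(uhat^2) on Z to estimate gamma.
gamma_hat, *_ = np.linalg.lstsq(Z, np.log(uhat**2), rcond=None)

# Step 3: estimated weights omega_hat_t = exp(Z_t gamma_hat)^{1/2}.
omega_hat = np.exp(Z @ gamma_hat / 2.0)

# Step 4: weighted least squares with the estimated weights.
w = 1.0 / omega_hat
beta_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
```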

• Under suitable regularity conditions, it can be shown that this type of procedure yields
a feasible GLS estimator β̂f that is consistent and asymptotically equivalent to the GLS
estimator β̂gls .

Why Feasible GLS Works


Consistency of the GLS Estimator

• If we substitute Xβ₀ + u for y in the formula for the GLS estimator, we find that

β̂gls = β₀ + (X′Ω⁻¹X)⁻¹X′Ω⁻¹u.

• Rescaling by n^{1/2}, taking probability limits, and multiplying each factor by an
appropriate power of n, we get

n^{1/2}(β̂gls − β₀) ≈ (plim n⁻¹X′Ω⁻¹X)⁻¹ n^{−1/2}X′Ω⁻¹u,

where ≈ denotes asymptotic equality.

• As usual, we assume sufficient conditions for the first factor in the right-hand side to tend
to a non-stochastic k × k matrix.

• Then, we apply a CLT to the second factor to conclude that it is an asymptotically normal
random vector, and thus obtain root-n consistency and asymptotic normality.


• Following the same argument for the feasible GLS estimator, we find that

n^{1/2}(β̂f − β₀) ≈ (plim n⁻¹X′Ω⁻¹(γ̂)X)⁻¹ n^{−1/2}X′Ω⁻¹(γ̂)u.

• Clearly, β̂gls will be asymptotically equivalent to β̂f if

plim n⁻¹X′Ω⁻¹(γ̂)X = plim n⁻¹X′Ω⁻¹X

and

plim n^{−1/2}X′Ω⁻¹(γ̂)u = plim n^{−1/2}X′Ω⁻¹u.

• For these equalities to hold, it is necessary that plim γ̂ = γ.

Small Sample Properties of the Feasible GLS


• Whether or not feasible GLS is a desirable estimation method in practice depends on how
good an estimate of Ω can be obtained.

• If Ω(γ̂) is a very good estimate, then feasible GLS will have essentially the same properties
as GLS itself.

• As a result, inferences should be reasonably reliable, even though they will not be exact
in finite samples.

• On the other hand, if Ω(γ̂) is a poor estimate, feasible GLS estimates may have quite
different properties from real (infeasible) GLS estimates, and inferences may be quite
misleading.

Alternative Estimation Approaches


• It is possible to iterate a feasible GLS procedure, using β̂f to compute a new set of
residuals.

• These second-round residuals can then be used to obtain a second-round estimate of γ,
which in turn can be used to calculate second-round feasible GLS estimates of β, and so
on.

• This procedure can either be stopped after a predetermined number of rounds or continued
until convergence is achieved (although convergence is not guaranteed).

• Iteration does not change the asymptotic distribution of the feasible GLS estimator, but
it does change its finite-sample distribution.
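
• A sketch of the iterated procedure, reusing the auxiliary-regression step from the earlier example (the tolerance and iteration cap are arbitrary choices):

```python
# Sketch: iterated feasible WLS, alternating between gamma and beta
# until the beta estimates stop changing (convergence not guaranteed).
import numpy as np

def iterated_fgls(y, X, Z, max_iter=50, tol=1e-8):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # start from OLS
    for _ in range(max_iter):
        uhat = y - X @ beta                            # current residuals
        gamma, *_ = np.linalg.lstsq(Z, np.log(uhat**2), rcond=None)
        w = np.exp(-Z @ gamma / 2.0)                   # 1 / omega_hat_t
        beta_new, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:      # converged
            return beta_new
        beta = beta_new
    return beta                                        # iteration cap reached
```
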
• Another way to estimate models in which the covariance matrix of the error terms depends
on one or more unknown parameters is to use the method of maximum likelihood.

• As we will see later on, in this case, β and γ are estimated jointly and consistency will
follow if the maximum likelihood regularity conditions are satisfied.

• In many cases, an iterated feasible GLS estimator will be the same as a maximum likeli-
hood estimator based on the assumption of normally distributed errors.


6 Testing for Heteroskedasticity


Model Specification and Heteroskedasticity

• It is important to note that, in our usual setup, homoskedasticity is imposed as an
assumption in the model specification.

• If the true DGP is heteroskedastic, it will not be contained in the estimated model, and
therefore there is a specification error.

• The specification error does not bias the OLS estimator, but renders it inefficient, as the
sandwich form of its covariance matrix suggests.

• As we have seen, we can compute asymptotically valid covariance matrix estimates for
the (inefficient) OLS and NLS parameter estimates.

• So, what if we choose to assume heteroskedasticity and settle for an inefficient estimator,
but the true DGP is homoskedastic?

• Simulation experiments suggest that this specification error frequently has little cost.

• This evidence can be taken as an indication that it may be prudent to employ an HCCME
anyway, especially if the sample size is large.

• However, in finite samples, tests and confidence intervals based on HCCMEs will always
be somewhat less reliable than ones based on the usual OLS covariance matrix under
homoskedasticity.

• If we have information on the form of the skedastic function, we might well wish to use
feasible generalized least squares, which is asymptotically equivalent to the efficient GLS
estimator.

• However, the small-sample properties of feasible generalized least squares depend critically
on the quality of the estimate Ω̂.

• So, if the true DGP is homoskedastic and we assume heteroskedasticity, we can expect
that the specification error may be costly in small samples.

• So, before deciding to use an HCCME or a feasible GLS procedure, it is advisable to
perform a specification test of the null hypothesis that the error terms are homoskedastic.

Skedastic Function and Heteroskedasticity Testing

• Let us consider a reasonably general model of conditional heteroskedasticity, such as

E(uₜ² | Ωₜ) = h(δ + zₜγ),

where the skedastic function h(·) is a nonlinear function that can take on only positive
values, zₜ is a 1 × r vector of observations on exogenous or predetermined variables
that belong to the information set Ωₜ, δ is a scalar parameter, and γ is an r-vector of
parameters.


• Under the null hypothesis that γ = 0, the function h(δ + Zₜγ) collapses to h(δ), a constant.

• If we think of the skedastic function as a regression equation in conditional expectation
form, then its error form can be written as

uₜ² = h(δ + zₜγ) + vₜ.

• Alternatively, you can define vₜ as the difference between uₜ² and its conditional expecta-
tion, and rewrite the skedastic function as in the last expression.

• Suppose that we actually observe uₜ. Then, we can test γ = 0 using a Gauss-Newton
regression:

uₜ² − h(δ + Zₜγ) = h′(δ + Zₜγ)bδ + h′(δ + Zₜγ)Zₜbγ + residual,

where h′(·) is the first derivative of h(·), bδ is the coefficient associated with δ, and bγ is
the r-vector of coefficients associated with γ.

GNR Testing for Heteroskedasticity

• Remember that we need to evaluate the GNR at “initial” parameter values.

• So, let us evaluate it at γ = 0 and δ = δ̃ ≡ h⁻¹(σ̃²), where σ̃² is the sample variance of
the uₜ:

uₜ² − σ̃² = h′(δ̃)bδ + h′(δ̃)Zₜbγ + residual.

• For the purpose of testing the null hypothesis that γ = 0, this regression is equivalent to

uₜ² = bδ + Zₜbγ + residual,

with a suitable redefinition of the artificial parameters bδ and bγ , which does not depend
on the functional form of h( · ).

Residuals and Heteroskedasticity Testing

• It can be shown that replacing uₜ² by ûₜ² does not change the asymptotic distribution of
the F and nR² statistics for testing the hypothesis bγ = 0.

• The last issue is to choose the variables to be included in Z. White suggests including
all squares and cross-products of the variables in X (why?), which results in the White
Test.

• The general form of the test is basically the Breusch-Pagan Test. We will derive the
limiting distribution for this test later, in a more convenient framework.

• Since the asymptotic approximations for these test statistics may be inaccurate in finite
samples, bootstrapping them when the sample size is small or moderate may be a good
idea.
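
• A sketch of the resulting nR² test: regress the squared OLS residuals on a constant and the chosen Z, and compare nR² with a χ²(r) critical value (with Z the squares and cross-products of the columns of X, this is a White-type test; the helper below is an illustrative sketch, not library code):

```python
# Sketch: nR^2 test for heteroskedasticity from the auxiliary regression
# of squared residuals on a constant and the n x r test regressors Z.
import numpy as np
from scipy.stats import chi2

def het_test_nR2(uhat, Z):
    """Return the nR^2 statistic and its asymptotic chi2(r) p-value."""
    n, r = Z.shape
    u2 = uhat**2
    ZZ = np.column_stack([np.ones(n), Z])      # include the constant
    coef, *_ = np.linalg.lstsq(ZZ, u2, rcond=None)
    resid = u2 - ZZ @ coef
    R2 = 1.0 - resid @ resid / ((u2 - u2.mean()) @ (u2 - u2.mean()))
    stat = n * R2
    return stat, chi2.sf(stat, df=r)
```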

