
Gauss and the Method of Least Squares

Teddy Petrou Hongxiao Zhu

1
Outline

Who was Gauss?
Why was there controversy in finding the method of least squares?
Gauss's treatment of error
Gauss's derivation of the method of least squares
Gauss's derivation by modern matrix notation
Gauss-Markov theorem
Limitations of the method of least squares
References

2
Johann Carl Friedrich Gauss

Born: April 30, 1777, Brunswick, Germany

Died: February 23, 1855, Göttingen, Germany

At the age of eight, during arithmetic class, he astonished his teachers by instantly finding the sum of the first hundred integers.

3
Facts about Gauss
Attended Brunswick College in 1792, where he discovered many important theorems before even reaching them in his studies.
Found a square root in two different ways to fifty decimal places by ingenious expansions and interpolations.
Constructed a regular 17-sided polygon (with straightedge and compass), the first advance in this matter in two millennia. He was only 18 when he made the discovery.

4
Ideas of Gauss

As a young man, Gauss was a mathematical scientist with interests in many areas, including the theory of numbers, algebra, analysis, geometry, probability, and the theory of errors.

His interests later grew to include observational astronomy, celestial mechanics, surveying, geodesy, capillarity, geomagnetism, electromagnetism, mechanics, optics, and actuarial science.

5
Intellectual Personality and Controversy
Those who knew Gauss best found him to be cold and uncommunicative.

He published only half of his ideas and found no one with whom to share his most valued thoughts.

In 1805 Adrien-Marie Legendre published a paper on the method of least squares. His treatment, however, lacked a formal consideration of probability and its relationship to least squares, making it impossible to determine the accuracy of the method when applied to real observations.

Gauss claimed that he had written to colleagues concerning the use of least squares as far back as 1795.

6
Formal Arrival of Least Squares

Gauss
Published Theory of the Motion of the Heavenly Bodies in 1809. He gave a probabilistic justification of the method, based on the assumption of a normal distribution of errors. Gauss himself later abandoned the use of the normal error function.
Published Theory of the Combination of Observations Least Subject to Errors in the 1820s. He substituted the root mean square error for Laplace's mean absolute error.

Laplace
Derived the method of least squares (between 1802 and 1820) from the principle that the best estimate should have the smallest mean error, that is, the mean of the absolute value of the error.

7
Treatment of Errors

Using probability theory to describe error:

Error will be treated as a random variable.

Two types of error:
Constant error, associated with calibration
Random error

8
Error Assumptions

Gauss began his study by making two assumptions:

Random errors of measurements of the same type lie within fixed limits.

All errors within these limits are possible, but not necessarily with equal likelihood.

9
Density Function
We define a function $\varphi(x)$ with the same meaning as a density function, with the following properties:

The probability of an error lying in the interval $(x, x + dx)$ is $\varphi(x)\,dx$.

Small errors are more likely to occur than large ones.

Positive and negative errors of the same magnitude are equally likely: $\varphi(-x) = \varphi(x)$.

10
Mean and Variance

Define $k = \int x\,\varphi(x)\,dx$. In many cases, assume $k = 0$.

Define the mean square error as

$m^2 = \int x^2\,\varphi(x)\,dx$

If $k = 0$ then the variance will equal $m^2$.
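As a rough numerical illustration (not part of the original slides), the following Python sketch approximates k and m^2 for an assumed error density: a normal-shaped curve truncated to fixed limits, which satisfies Gauss's assumptions above. The density, the limits, and the grid size are arbitrary choices for the example.

import numpy as np

# Assumed error density for illustration only: a normal-shaped curve
# truncated to the fixed limits [-a, a] and renormalized to integrate to 1.
a, n = 4.0, 200001
x = np.linspace(-a, a, n)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2)
phi /= np.sum(phi) * dx                 # make the total probability equal 1

k  = np.sum(x * phi) * dx               # k   = integral of x   * phi(x) dx
m2 = np.sum(x**2 * phi) * dx            # m^2 = integral of x^2 * phi(x) dx

print(round(k, 6), round(m2, 4))        # k is ~0 by symmetry; m^2 is ~1 here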
11
Reasons for $m^2$

$m^2$ is always positive and is simple.

The function is differentiable and integrable, unlike the absolute value function.

The function approximates the average value in cases where large numbers of observations are being considered, and is simple to use when considering small numbers of observations.

12
More on Variance
If $k \neq 0$ then the variance equals $m^2 - k^2$.

Suppose we have independent random variables $e, e', e'', \ldots$ with standard deviation 1 and expected value 0. A linear function of the errors is given by

$E = \lambda e + \lambda' e' + \lambda'' e'' + \cdots$

Now the variance of $E$ is given by

$M^2 = \sum_{i=1}^{k} \lambda_i^2\,\sigma_i^2 = \sum_{i=1}^{k} \lambda_i^2$ (since each $\sigma_i = 1$).

This assumes every error falls within fixed limits, a bounded number of standard deviations from the mean.
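A small Monte Carlo sketch (illustrative only, with arbitrary coefficient values) of the claim above: for independent errors with mean 0 and standard deviation 1, the variance of the linear combination E is the sum of the squared coefficients.

import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.5, -1.2, 2.0, 0.7])        # arbitrary coefficients (illustration)

# k = 4 independent errors with expected value 0 and standard deviation 1
e = rng.standard_normal((1_000_000, lam.size))
E = e @ lam                                   # E = lam_1*e_1 + ... + lam_k*e_k

print(E.var())                                # empirical variance of E
print(np.sum(lam**2))                         # sum of squared coefficients: should match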

13
Gauss's Derivation of the Method of Least Squares

Suppose a quantity V = f(x), where V and x are unknown. We estimate V by an observation L.

If x is calculated from L (by solving L ≈ f(x)), error will occur.

But if several quantities V, V', V'', … depend on the same unknown x and are determined by inexact observations, then we can recover x by some combination of the observations.

Similar situations occur when we observe several quantities that depend on several unknowns.

14
Gauss's Derivation of the Method of Least Squares

Problem:
We want to estimate V, V', V'', … by taking independent observations L, L', L'', …,
where V, V', V'', … are functions of unknowns x, y, z, …:

$V = f_1(x, y, z, \ldots)$
$V' = f_2(x, y, z, \ldots)$
$V'' = f_3(x, y, z, \ldots)$

Let the errors in the observations be

$v := \frac{V - L}{p}, \quad v' := \frac{V' - L'}{p'}, \quad \ldots$

where the p's are the weights of the 'mean errors of the observations'.
(Note: we scaled the errors so that they have the same variance.)
15
Gauss's Derivation of the Method of Least Squares

Consider the following linear system:

$v = ax + by + cz + \cdots - l$
$v' = a'x + b'y + c'z + \cdots - l'$
$v'' = a''x + b''y + c''z + \cdots - l''$

v, v', v'', … are written as linear functions of the unknowns x, y, z, …,
where the coefficients a, b, c, … are known.

Note: 1. This system is 'overdetermined', since there are more equations than unknowns.
      2. The system describes a mapping
         $F : \mathbb{R}^p \to \mathbb{R}^n$, i.e. from parameter space $(x, y, z, \ldots)$ to observation space $(v, v', v'', \ldots)$.
16
Solve an optimization problem:

$\min\ \alpha^2 + \alpha'^2 + \alpha''^2 + \cdots$

where α, α', α'', … are the coefficients of v, v', v'', …,

subject to: $\alpha v + \alpha' v' + \alpha'' v'' + \text{etc.} = x + k$

for some constant k independent of x, y, z, ….

We can state the problem as:

We are looking for a linear mapping G(v, v', v'', …) from $\mathbb{R}^n$ to $\mathbb{R}^p$ such that:

1. $G \circ F$ is the identity on $\mathbb{R}^p$.
2. G satisfies an optimality condition, described below:
   Suppose $x = g(v, v', v'', \ldots)$ is the first component of G. Then
   $x = g(v, v', v'', \ldots) = \alpha v + \alpha' v' + \alpha'' v'' + \cdots + k$.
   We want $\alpha^2 + \alpha'^2 + \alpha''^2 + \cdots$ to be as small as possible, and we want a similar condition for the other components.
17
Gauss's Derivation of the Method of Least Squares

Solutions:

$\alpha^2 + \alpha'^2 + \alpha''^2 + \text{etc.} = \kappa^2 + \kappa'^2 + \kappa''^2 + \text{etc.} + (\alpha - \kappa)^2 + (\alpha' - \kappa')^2 + (\alpha'' - \kappa'')^2 + \text{etc.}$

where the κ's denote the coefficients we derived by elimination of the system. From this it is obvious that the sum $\alpha^2 + \alpha'^2 + \alpha''^2 + \cdots$ attains its minimum when $\alpha = \kappa$, $\alpha' = \kappa'$, $\alpha'' = \kappa''$, etc.

It's still not obvious:
How do these results relate to the least squares estimation?

18
Gauss's Derivation of the Method of Least Squares

It can be proved that, letting

$\Omega := v^2 + v'^2 + v''^2 + \cdots = \frac{(V(x, y, z, \ldots) - L)^2}{p^2} + \frac{(V'(x, y, z, \ldots) - L')^2}{p'^2} + \cdots$

least squares picks the parameter values that minimize Ω, i.e. the values where all the partials $\frac{\partial \Omega}{\partial x}, \frac{\partial \Omega}{\partial y}, \frac{\partial \Omega}{\partial z}, \ldots$ vanish:

$\frac{\partial \Omega}{\partial x} = 0, \quad \frac{\partial \Omega}{\partial y} = 0, \quad \ldots$

and we will get the same results as the minimization of $\alpha^2 + \alpha'^2 + \cdots$.
we will get the same results as the minimizati on of 2 '2 ...

19
Gauss's derivation by modern matrix notation:

Assume that observable quantities $V_1, V_2, \ldots, V_n$ are linear functions of parameters $x_1, x_2, \ldots, x_p$ such that

$V_i = b_{i1} x_1 + \cdots + b_{ip} x_p + c_i, \quad b_{ij}, c_i \in \mathbb{R}$

and we know the values of all the $b_{ij}$ and $c_i$.
We measure the $V_i$ in an attempt to infer the values of the $x_i$.
Assume $L_i$ is an observation of $V_i$.
Switch to a new coordinate system by setting

$v_i = (V_i - L_i) / p_i$

The system becomes

$v = Ax - l$

20
Gauss's derivation by modern matrix notation:

Gauss's results are equivalent to the following lemma:

Suppose A is an (n × p) matrix of rank p. Then there is a matrix K such that the following holds:

$\forall x \in \mathbb{R}^p, \quad KAx = x$

and among all such matrices the matrix $E = (A^T A)^{-1} A^T$ has rows of minimum norm.

Proof: $E = (A^T A)^{-1} A^T$ satisfies the first condition.

The optimization condition is that

$\|K_i\|^2 = K_{i1}^2 + \cdots + K_{in}^2$ should be as small as possible (for each row $K_i$ of K).

This is equivalent to demanding that the sum of the diagonal entries of $KK^T$ should be as small as possible.
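A numerical sketch of the lemma (dimensions and entries are arbitrary): E is a left inverse of A, it coincides here with the Moore-Penrose pseudoinverse, and any other left inverse K has a larger (or equal) sum of squared entries, i.e. trace(K K^T) >= trace(E E^T).

import numpy as np

rng = np.random.default_rng(3)
n, p = 7, 3
A = rng.standard_normal((n, p))                  # an n x p matrix of rank p (example)

E = np.linalg.inv(A.T @ A) @ A.T                 # E = (A^T A)^{-1} A^T
print(np.allclose(E @ A, np.eye(p)))             # True: E is a left inverse of A
print(np.allclose(E, np.linalg.pinv(A)))         # True: E equals the pseudoinverse

# Build another left inverse K = E + N, where N @ A = 0
# (the rows of N lie in the null space of A^T).
U = np.linalg.svd(A)[0]                          # full set of left singular vectors
N = rng.standard_normal((p, n - p)) @ U[:, p:].T
K = E + N
print(np.allclose(K @ A, np.eye(p)))             # True: K is also a left inverse
print(np.trace(K @ K.T) >= np.trace(E @ E.T))    # True: E has minimal row norms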
21
Proof continued:

$A^T A$ is invertible; denote $D := (A^T A)^{-1}$. Thus, for all x, $x = (A^T A)^{-1} A^T A x = D A^T A x$.

$E = (A^T A)^{-1} A^T$, thus $E = D A^T$, and $EAx = x$; also, we have $KAx = x$.

Subtracting, we get: for all x, $(K - E)Ax = 0$. Thus $(K - E)A$ is the zero matrix.

Right-multiplying by $D^T$ and noting that $A D^T = E^T$, we get $(K - E)E^T = 0$.

Finally, $KK^T = (E + (K - E))(E + (K - E))^T = EE^T + (K - E)(K - E)^T$.

This shows that the solution E is in fact the optimal one, since if $(K - E)$ has any non-zero entries, $(K - E)(K - E)^T$ will have strictly positive entries on its diagonal.

Returning to our original equation $v = Ax - l$, the lemma shows that $G(v) := Ev + El$ is a left inverse of the function $F(x) = Ax - l$ (since $G \circ F(x) = E(Ax - l) + El = EAx = x$), and among all linear left inverses, the linear (non-constant) part of G is optimal.
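The algebraic steps of this proof can be spot-checked numerically; the sketch below (random illustrative A, and an alternative left inverse K built from the null space of A^T) verifies that (K - E)A = 0, (K - E)E^T = 0, and hence K K^T = E E^T + (K - E)(K - E)^T.

import numpy as np

rng = np.random.default_rng(4)
n, p = 6, 2
A = rng.standard_normal((n, p))
D = np.linalg.inv(A.T @ A)                       # D = (A^T A)^{-1}
E = D @ A.T

# Any other left inverse K differs from E by a matrix that annihilates A.
U = np.linalg.svd(A)[0]
K = E + rng.standard_normal((p, n - p)) @ U[:, p:].T

print(np.allclose((K - E) @ A, 0))               # (K - E)A is the zero matrix
print(np.allclose((K - E) @ E.T, 0))             # (K - E)E^T = 0
print(np.allclose(K @ K.T,                       # K K^T = E E^T + (K - E)(K - E)^T
                  E @ E.T + (K - E) @ (K - E).T))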
22
Gauss-Markov theorem

In a linear model

$x = A\theta + \epsilon$

where A is an n × p matrix with rank p, θ is an unknown vector, and ε is the error vector. If $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2 I$, then for any linear unbiased estimator $\tilde{\theta}$ of $C^T \theta$, we have $E(\hat{\theta}_{LS}) = \theta$ and $\mathrm{Var}(C^T \hat{\theta}_{LS}) \le \mathrm{Var}(\tilde{\theta})$.

In other words, when the ε's have the same variance and are uncorrelated, the least-squares estimator is the best linear unbiased estimator, i.e. the one with the smallest variance.
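A small simulation sketch of the theorem's content (all numbers are made up for illustration): with uncorrelated, equal-variance errors, the least-squares estimate of C^T theta is unbiased and has smaller variance than another linear unbiased estimator built from a different left inverse of A.

import numpy as np

rng = np.random.default_rng(5)
n, p = 10, 2
A = rng.standard_normal((n, p))                  # known design matrix (illustrative)
theta = np.array([1.5, -0.5])                    # "true" parameters (illustrative)
c = np.array([1.0, 2.0])                         # estimand is c^T theta = 0.5
sigma = 1.0

E = np.linalg.inv(A.T @ A) @ A.T                 # least squares: theta_hat = E @ x
U = np.linalg.svd(A)[0]                          # another linear unbiased estimator:
K = E + 0.3 * rng.standard_normal((p, n - p)) @ U[:, p:].T   # K @ A = I as well

ls_vals, other_vals = [], []
for _ in range(20000):
    x = A @ theta + sigma * rng.standard_normal(n)   # x = A theta + eps, Var(eps) = sigma^2 I
    ls_vals.append(c @ (E @ x))
    other_vals.append(c @ (K @ x))

print(np.mean(ls_vals), np.mean(other_vals))     # both close to 0.5 (both unbiased)
print(np.var(ls_vals) <= np.var(other_vals))     # True here: least squares wins on variance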

23
Limitations of the Method of Least Squares

Nothing is perfect:

This method is very sensitive to the presence of unusual data points. One or two outliers can sometimes seriously skew the results of a least squares analysis.
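A tiny sketch of this sensitivity (made-up data): a single gross outlier noticeably shifts the least-squares fit of a straight line.

import numpy as np

# Clean data on the line y = 2x + 1, then the same data with one gross outlier.
x = np.arange(10, dtype=float)
y = 2 * x + 1
y_out = y.copy()
y_out[-1] += 50                                  # one wild observation

X = np.column_stack([np.ones_like(x), x])        # design matrix: intercept and slope
clean_fit = np.linalg.lstsq(X, y, rcond=None)[0]
dirty_fit = np.linalg.lstsq(X, y_out, rcond=None)[0]

print(clean_fit)    # [1. 2.]  -- recovers intercept 1 and slope 2 exactly
print(dirty_fit)    # both coefficients pulled well away from (1, 2) by the one outlier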

24
References
Gauss, Carl Friedrich (translated by G. W. Stewart). 1995. Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Philadelphia: Society for Industrial and Applied Mathematics.
Plackett, R. L. 1949. A Historical Note on the Method of Least Squares. Biometrika 36: 458-460.
Stigler, Stephen M. 1981. Gauss and the Invention of Least Squares. The Annals of Statistics 9(3): 465-474.
Plackett, Robin L. 1972. The Discovery of the Method of Least Squares.
Brand, Belinda B. 2003. Gauss's Method of Least Squares: A Historically Based Introduction.
http://www.infoplease.com/ce6/people/A0820346.html
http://www.stetson.edu/~efriedma/periodictable/html/Ga.html

25
