
Gauss and the Method of Least Squares

Teddy Petrou Hongxiao Zhu

1
Outline

Who was Gauss?
Why was there controversy in finding the method of least squares?
Gauss's treatment of error
Gauss's derivation of the method of least squares
Gauss's derivation by modern matrix notation
Gauss-Markov theorem
Limitations of the method of least squares
References

2
Johann Carl Friedrich Gauss

Born: April 30, 1777, Brunswick, Germany

Died: February 23, 1855, Göttingen, Germany

At the age of eight, during arithmetic class, he astonished his teachers by instantly finding the sum of the first hundred integers.

3
Facts about Gauss
Attended Brunswick College in 1792, where he discovered many important theorems before even reaching them in his studies.
Found a square root in two different ways to fifty decimal places by ingenious expansions and interpolations.
Constructed a regular 17-sided polygon (with straightedge and compass), the first advance in this matter in two millennia. He was only 18 when he made the discovery.

4
Ideas of Gauss

As a young man, Gauss was a mathematical scientist with interests in many areas, including the theory of numbers, algebra, analysis, geometry, probability, and the theory of errors.

His interests later grew to include observational astronomy, celestial mechanics, surveying, geodesy, capillarity, geomagnetism, electromagnetism, mechanics, optics, and actuarial science.

5
Intellectual Personality and Controversy
Those who knew Gauss best found him to be cold and uncommunicative.

He published only half of his ideas and found no one with whom to share his most valued thoughts.

In 1805 Adrien-Marie Legendre published a paper on the method of least squares. His treatment, however, lacked a formal consideration of probability and its relationship to least squares, making it impossible to determine the accuracy of the method when applied to real observations.

Gauss claimed that he had written to colleagues concerning the use of least squares as far back as 1795.

6
Formal Arrival of Least Squares

Gauss
Published Theory of the Motion of the Heavenly Bodies in 1809. He gave a probabilistic justification of the method, based on the assumption of a normal distribution of errors. Gauss himself later abandoned the use of the normal error function.
Published Theory of the Combination of Observations Least Subject to Errors in the 1820s. He substituted the root mean square error for Laplace's mean absolute error.

Laplace
Derived the method of least squares (between 1802 and 1820) from the principle that the best estimate should have the smallest mean error, that is, the mean of the absolute value of the error.

7
Treatment of Errors

Using probability theory to describe error:

Error will be treated as a random variable.

Two types of error:
Constant error, associated with calibration
Random error

8
Error Assumptions

Gauss began his study by making two assumptions:

Random errors of measurements of the same type lie within fixed limits.

All errors within these limits are possible, but not necessarily with equal likelihood.

9
Density Function
We define a function $\varphi(x)$ with the same meaning as a density function, with the following properties:

The probability of an error lying in the interval $(x, x + dx)$ is $\varphi(x)\,dx$.

Small errors are more likely to occur than large ones.

Positive and negative errors of the same magnitude are equally likely: $\varphi(-x) = \varphi(x)$.

10
Mean and Variance

Define $k = \int x\,\varphi(x)\,dx$. In many cases, assume $k = 0$.

Define the mean square error as

$m^2 = \int x^2\,\varphi(x)\,dx$

If $k = 0$ then the variance will equal $m^2$.
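As a rough numerical illustration (not part of the original slides), the following Python sketch approximates k and m^2 for an assumed error density: a normal-shaped curve truncated to fixed limits, which satisfies Gauss's assumptions above. The density, the limits, and the grid size are arbitrary choices for the example.

import numpy as np

# Assumed error density for illustration only: a normal-shaped curve
# truncated to the fixed limits [-a, a] and renormalized to integrate to 1.
a, n = 4.0, 200001
x = np.linspace(-a, a, n)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2)
phi /= np.sum(phi) * dx                 # make the total probability equal 1

k  = np.sum(x * phi) * dx               # k   = integral of x   * phi(x) dx
m2 = np.sum(x**2 * phi) * dx            # m^2 = integral of x^2 * phi(x) dx

print(round(k, 6), round(m2, 4))        # k is ~0 by symmetry; m^2 is ~1 here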
11
Reasons for $m^2$

$m^2$ is always positive and is simple.

The function is differentiable and integrable, unlike the absolute value function.

The function approximates the average value in cases where large numbers of observations are being considered, and is simple to use when considering small numbers of observations.

12
More on Variance
If $k \neq 0$ then the variance equals $m^2 - k^2$.

Suppose we have independent random variables $e, e', e'', \ldots$ with standard deviation 1 and expected value 0. A linear function of the errors is given by

$E = \lambda e + \lambda' e' + \lambda'' e'' + \cdots$

Now the variance of $E$ is given by

$M^2 = \sum_{i=1}^{k} \lambda_i^2\,\sigma_i^2 = \sum_{i=1}^{k} \lambda_i^2$ (since each $\sigma_i = 1$).

This assumes every error falls within fixed limits, a bounded number of standard deviations from the mean.
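A small Monte Carlo sketch (illustrative only, with arbitrary coefficient values) of the claim above: for independent errors with mean 0 and standard deviation 1, the variance of the linear combination E is the sum of the squared coefficients.

import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.5, -1.2, 2.0, 0.7])        # arbitrary coefficients (illustration)

# k = 4 independent errors with expected value 0 and standard deviation 1
e = rng.standard_normal((1_000_000, lam.size))
E = e @ lam                                   # E = lam_1*e_1 + ... + lam_k*e_k

print(E.var())                                # empirical variance of E
print(np.sum(lam**2))                         # sum of squared coefficients: should match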

13
Gauss's Derivation of the Method of Least Squares

Suppose a quantity V = f(x), where V and x are unknown. We estimate V by an observation L.

If x is calculated from L (by solving L ≈ f(x)), error will occur.

But if several quantities V, V', V'', … depend on the same unknown x and are determined by inexact observations, then we can recover x by some combination of the observations.

Similar situations occur when we observe several quantities that depend on several unknowns.

14
Gauss's Derivation of the Method of Least Squares

Problem:
We want to estimate V, V', V'', … by taking independent observations L, L', L'', …,
where V, V', V'', … are functions of unknowns x, y, z, …:

$V = f_1(x, y, z, \ldots)$
$V' = f_2(x, y, z, \ldots)$
$V'' = f_3(x, y, z, \ldots)$

Let the errors in the observations be

$v := \frac{V - L}{p}, \quad v' := \frac{V' - L'}{p'}, \quad \ldots$

where the p's are the weights of the 'mean errors of the observations'.
(Note: we scaled the errors so that they have the same variance.)
15
Gauss's Derivation of the Method of Least Squares

Consider the following linear system:

$v = ax + by + cz + \cdots - l$
$v' = a'x + b'y + c'z + \cdots - l'$
$v'' = a''x + b''y + c''z + \cdots - l''$

v, v', v'', … are written as linear functions of the unknowns x, y, z, …,
where the coefficients a, b, c, … are known.

Note: 1. This system is 'overdetermined', since there are more equations than unknowns.
      2. The system describes a mapping
         $F : \mathbb{R}^p \to \mathbb{R}^n$, i.e. from parameter space $(x, y, z, \ldots)$ to observation space $(v, v', v'', \ldots)$.
16
Solve an optimization problem:

$\min\ \alpha^2 + \alpha'^2 + \alpha''^2 + \cdots$

where α, α', α'', … are the coefficients of v, v', v'', …,

subject to: $\alpha v + \alpha' v' + \alpha'' v'' + \text{etc.} = x + k$

for some constant k independent of x, y, z, ….

We can state the problem as:

We are looking for a linear mapping G(v, v', v'', …) from $\mathbb{R}^n$ to $\mathbb{R}^p$ such that:

1. $G \circ F$ is the identity on $\mathbb{R}^p$.
2. G satisfies an optimality condition, described below:
   Suppose $x = g(v, v', v'', \ldots)$ is the first component of G. Then
   $x = g(v, v', v'', \ldots) = \alpha v + \alpha' v' + \alpha'' v'' + \cdots + k$.
   We want $\alpha^2 + \alpha'^2 + \alpha''^2 + \cdots$ to be as small as possible, and we want a similar condition for the other components.
17
Gauss's Derivation of the Method of Least Squares

Solutions:

$\alpha^2 + \alpha'^2 + \alpha''^2 + \text{etc.} = \kappa^2 + \kappa'^2 + \kappa''^2 + \text{etc.} + (\alpha - \kappa)^2 + (\alpha' - \kappa')^2 + (\alpha'' - \kappa'')^2 + \text{etc.}$

where the κ's denote the coefficients we derived by elimination of the system. From this it is obvious that the sum $\alpha^2 + \alpha'^2 + \alpha''^2 + \cdots$ attains its minimum when $\alpha = \kappa$, $\alpha' = \kappa'$, $\alpha'' = \kappa''$, etc.

It's still not obvious:
How do these results relate to the least squares estimation?

18
Gauss's Derivation of the Method of Least Squares

It can be proved that, letting

$\Omega := v^2 + v'^2 + v''^2 + \cdots = \frac{(V(x, y, z, \ldots) - L)^2}{p^2} + \frac{(V'(x, y, z, \ldots) - L')^2}{p'^2} + \cdots$

least squares picks the parameter values that minimize Ω, i.e. the values where all the partials $\frac{\partial \Omega}{\partial x}, \frac{\partial \Omega}{\partial y}, \frac{\partial \Omega}{\partial z}, \ldots$ vanish:

$\frac{\partial \Omega}{\partial x} = 0, \quad \frac{\partial \Omega}{\partial y} = 0, \quad \ldots$

and we will get the same results as the minimization of $\alpha^2 + \alpha'^2 + \cdots$.
we will get the same results as the minimizati on of 2 '2 ...

19
Gauss's derivation by modern matrix notation:

Assume that observable quantities $V_1, V_2, \ldots, V_n$ are linear functions of parameters $x_1, x_2, \ldots, x_p$ such that

$V_i = b_{i1} x_1 + \cdots + b_{ip} x_p + c_i, \quad b_{ij}, c_i \in \mathbb{R}$

and we know the values of all the $b_{ij}$ and $c_i$.
We measure the $V_i$ in an attempt to infer the values of the $x_i$.
Assume $L_i$ is an observation of $V_i$.
Switch to a new coordinate system by setting

$v_i = (V_i - L_i) / p_i$

The system becomes

$v = Ax - l$

20
Gauss's derivation by modern matrix notation:

Gauss's results are equivalent to the following lemma:

Suppose A is an (n × p) matrix of rank p. Then there is a matrix K such that the following holds:

$\forall x \in \mathbb{R}^p, \quad KAx = x$

and among all such matrices the matrix $E = (A^T A)^{-1} A^T$ has rows of minimum norm.

Proof: $E = (A^T A)^{-1} A^T$ satisfies the first condition.

The optimization condition is that

$\|K_i\|^2 = K_{i1}^2 + \cdots + K_{in}^2$ should be as small as possible (for each row $K_i$ of K).

This is equivalent to demanding that the sum of the diagonal entries of $KK^T$ should be as small as possible.
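A numerical sketch of the lemma (dimensions and entries are arbitrary): E is a left inverse of A, it coincides here with the Moore-Penrose pseudoinverse, and any other left inverse K has a larger (or equal) sum of squared entries, i.e. trace(K K^T) >= trace(E E^T).

import numpy as np

rng = np.random.default_rng(3)
n, p = 7, 3
A = rng.standard_normal((n, p))                  # an n x p matrix of rank p (example)

E = np.linalg.inv(A.T @ A) @ A.T                 # E = (A^T A)^{-1} A^T
print(np.allclose(E @ A, np.eye(p)))             # True: E is a left inverse of A
print(np.allclose(E, np.linalg.pinv(A)))         # True: E equals the pseudoinverse

# Build another left inverse K = E + N, where N @ A = 0
# (the rows of N lie in the null space of A^T).
U = np.linalg.svd(A)[0]                          # full set of left singular vectors
N = rng.standard_normal((p, n - p)) @ U[:, p:].T
K = E + N
print(np.allclose(K @ A, np.eye(p)))             # True: K is also a left inverse
print(np.trace(K @ K.T) >= np.trace(E @ E.T))    # True: E has minimal row norms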
21
Proof continued:

$A^T A$ is invertible; denote $D := (A^T A)^{-1}$. Thus, for all x, $x = (A^T A)^{-1} A^T A x = D A^T A x$.

$E = (A^T A)^{-1} A^T$, thus $E = D A^T$, and $EAx = x$; also, we have $KAx = x$.

Subtracting, we get: for all x, $(K - E)Ax = 0$. Thus $(K - E)A$ is the zero matrix.

Right-multiplying by $D^T$ and noting that $A D^T = E^T$, we get $(K - E)E^T = 0$.

Finally, $KK^T = (E + (K - E))(E + (K - E))^T = EE^T + (K - E)(K - E)^T$.

This shows that the solution E is in fact the optimal one, since if $(K - E)$ has any non-zero entries, $(K - E)(K - E)^T$ will have strictly positive entries on its diagonal.

Returning to our original equation $v = Ax - l$, the lemma shows that $G(v) := Ev + El$ is a left inverse of the function $F(x) = Ax - l$ (since $G \circ F(x) = E(Ax - l) + El = EAx = x$), and among all linear left inverses, the linear (non-constant) part of G is optimal.
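The algebraic steps of this proof can be spot-checked numerically; the sketch below (random illustrative A, and an alternative left inverse K built from the null space of A^T) verifies that (K - E)A = 0, (K - E)E^T = 0, and hence K K^T = E E^T + (K - E)(K - E)^T.

import numpy as np

rng = np.random.default_rng(4)
n, p = 6, 2
A = rng.standard_normal((n, p))
D = np.linalg.inv(A.T @ A)                       # D = (A^T A)^{-1}
E = D @ A.T

# Any other left inverse K differs from E by a matrix that annihilates A.
U = np.linalg.svd(A)[0]
K = E + rng.standard_normal((p, n - p)) @ U[:, p:].T

print(np.allclose((K - E) @ A, 0))               # (K - E)A is the zero matrix
print(np.allclose((K - E) @ E.T, 0))             # (K - E)E^T = 0
print(np.allclose(K @ K.T,                       # K K^T = E E^T + (K - E)(K - E)^T
                  E @ E.T + (K - E) @ (K - E).T))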
22
Gauss-Markov theorem

In a linear model

$x = A\theta + \epsilon$

where A is an n × p matrix with rank p, θ is an unknown vector, and ε is the error vector. If $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2 I$, then for any linear unbiased estimator $\tilde{\theta}$ of $C^T \theta$, we have $E(\hat{\theta}_{LS}) = \theta$ and $\mathrm{Var}(C^T \hat{\theta}_{LS}) \le \mathrm{Var}(\tilde{\theta})$.

In other words, when the ε's have the same variance and are uncorrelated, the least-squares estimator is the best linear unbiased estimator, i.e. the one with the smallest variance.
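A small simulation sketch of the theorem's content (all numbers are made up for illustration): with uncorrelated, equal-variance errors, the least-squares estimate of C^T theta is unbiased and has smaller variance than another linear unbiased estimator built from a different left inverse of A.

import numpy as np

rng = np.random.default_rng(5)
n, p = 10, 2
A = rng.standard_normal((n, p))                  # known design matrix (illustrative)
theta = np.array([1.5, -0.5])                    # "true" parameters (illustrative)
c = np.array([1.0, 2.0])                         # estimand is c^T theta = 0.5
sigma = 1.0

E = np.linalg.inv(A.T @ A) @ A.T                 # least squares: theta_hat = E @ x
U = np.linalg.svd(A)[0]                          # another linear unbiased estimator:
K = E + 0.3 * rng.standard_normal((p, n - p)) @ U[:, p:].T   # K @ A = I as well

ls_vals, other_vals = [], []
for _ in range(20000):
    x = A @ theta + sigma * rng.standard_normal(n)   # x = A theta + eps, Var(eps) = sigma^2 I
    ls_vals.append(c @ (E @ x))
    other_vals.append(c @ (K @ x))

print(np.mean(ls_vals), np.mean(other_vals))     # both close to 0.5 (both unbiased)
print(np.var(ls_vals) <= np.var(other_vals))     # True here: least squares wins on variance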

23
Limitations of the Method of Least Squares

Nothing is perfect:

This method is very sensitive to the presence of unusual data points. One or two outliers can sometimes seriously skew the results of a least squares analysis.
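A tiny sketch of this sensitivity (made-up data): a single gross outlier noticeably shifts the least-squares fit of a straight line.

import numpy as np

# Clean data on the line y = 2x + 1, then the same data with one gross outlier.
x = np.arange(10, dtype=float)
y = 2 * x + 1
y_out = y.copy()
y_out[-1] += 50                                  # one wild observation

X = np.column_stack([np.ones_like(x), x])        # design matrix: intercept and slope
clean_fit = np.linalg.lstsq(X, y, rcond=None)[0]
dirty_fit = np.linalg.lstsq(X, y_out, rcond=None)[0]

print(clean_fit)    # [1. 2.]  -- recovers intercept 1 and slope 2 exactly
print(dirty_fit)    # both coefficients pulled well away from (1, 2) by the one outlier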

24
References
Gauss, Carl Friedrich (translated by G. W. Stewart). 1995. Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Philadelphia: Society for Industrial and Applied Mathematics.
Plackett, R. L. 1949. A Historical Note on the Method of Least Squares. Biometrika 36: 458-460.
Stigler, Stephen M. 1981. Gauss and the Invention of Least Squares. The Annals of Statistics 9(3): 465-474.
Plackett, Robin L. 1972. The Discovery of the Method of Least Squares.
Brand, Belinda B. 2003. Gauss's Method of Least Squares: A Historically Based Introduction.
http://www.infoplease.com/ce6/people/A0820346.html
http://www.stetson.edu/~efriedma/periodictable/html/Ga.html

25
