Approximate each data point by a point in an M-dimensional subspace:

$$\tilde{x}_n = \sum_{i=1}^{M} z_{ni} u_i + \sum_{i=M+1}^{D} b_i u_i \qquad \text{(the } b_i \text{ are independent of } n\text{)}$$

Distortion measure:

$$J = \frac{1}{N} \sum_{n=1}^{N} \lVert x_n - \tilde{x}_n \rVert^2$$
Autonomous Systems Lab, Zürich
Minimising J:

W.r.t. z:

$$\frac{\partial J}{\partial z_{nj}} = 0 \;\Rightarrow\; z_{nj} = x_n^T u_j$$

W.r.t. b:

$$\frac{\partial J}{\partial b_j} = 0 \;\Rightarrow\; b_j = \frac{1}{N}\sum_{n=1}^{N} x_n^T u_j = \bar{x}^T u_j$$

Substituting:

$$x_n - \tilde{x}_n = \sum_{i=M+1}^{D} \left\{ (x_n - \bar{x})^T u_i \right\} u_i$$

Leads to:

$$J = \frac{1}{N}\sum_{n=1}^{N}\sum_{i=M+1}^{D} \left( x_n^T u_i - \bar{x}^T u_i \right)^2 = \sum_{i=M+1}^{D} u_i^T S u_i$$

Data covariance matrix:

$$S = \frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T$$
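The identity above can be checked numerically. A minimal sketch in numpy, assuming synthetic Gaussian data: it builds S, projects onto the M eigenvectors with the largest eigenvalues, and verifies that the distortion J equals the sum of the discarded eigenvalues.

```python
import numpy as np

# Sketch: verify numerically that the PCA distortion J equals the sum of
# the discarded eigenvalues of the data covariance matrix S.
rng = np.random.default_rng(0)
N, D, M = 500, 5, 2                        # samples, dimension, subspace size

X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))  # correlated data
x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / N        # data covariance matrix

eigvals, U = np.linalg.eigh(S)             # eigenvalues in ascending order
U_keep = U[:, D - M:]                      # M eigenvectors, largest eigenvalues

# Reconstruction x~_n: keep the projection onto the subspace, mean elsewhere
X_tilde = x_bar + (X - x_bar) @ U_keep @ U_keep.T
J = np.mean(np.sum((X - X_tilde) ** 2, axis=1))

assert np.isclose(J, eigvals[:D - M].sum())  # J = sum of discarded eigenvalues
```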
Minimising J:

Finding the optimal u_i requires a minimisation with constraints:

$$\lVert u_i \rVert = 1, \qquad J = \sum_{i=M+1}^{D} u_i^T S u_i$$

Introduce Lagrange multipliers $\lambda_i$.

The optimum is found when $u_i$ is an eigenvector of S:

$$S u_i = \lambda_i u_i \quad\Rightarrow\quad J = \sum_{i=M+1}^{D} \lambda_i$$

Eigenvalues are positive, so J is minimal if the u_i are the eigenvectors with the smallest eigenvalues.
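The Lagrange-multiplier step can be spelled out; a minimal sketch for a single direction $u_i$:

```latex
% Lagrangian for one direction u_i with the constraint u_i^T u_i = 1
L(u_i, \lambda_i) = u_i^T S u_i + \lambda_i \left( 1 - u_i^T u_i \right)

% Stationarity:
\frac{\partial L}{\partial u_i} = 2 S u_i - 2 \lambda_i u_i = 0
\;\Rightarrow\; S u_i = \lambda_i u_i

% At the optimum the objective term reduces to the eigenvalue:
u_i^T S u_i = \lambda_i \, u_i^T u_i = \lambda_i
```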
PCA is the orthogonal projection of the data onto a lower-dimensional subspace such that the variance of the projected data is maximised.
Informally: more variance means more information.
Probabilistic formulation:
Latent variable z: projection on the subspace

$$p(z) = \mathcal{N}(z \mid 0, I) \qquad p(x \mid z) = \mathcal{N}(x \mid Wz + \mu, \sigma^2 I)$$

EM algorithm:
Maximise the log-likelihood of p(x)
Find the optimal W, μ and σ²: they correspond to the data mean and the principal components of the data.
-> Can deal with missing data (among other advantages)
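The generative model above can be sampled directly. A minimal numpy sketch, with W, μ and σ chosen arbitrarily for illustration: it draws latent z, generates x, and checks that the empirical covariance of x matches the marginal covariance W Wᵀ + σ²I implied by the model.

```python
import numpy as np

# Sketch of the probabilistic PCA generative model:
# z ~ N(0, I_M), x = W z + mu + eps, eps ~ N(0, sigma^2 I_D).
# The marginal covariance of x is then W W^T + sigma^2 I.
rng = np.random.default_rng(1)
D, M, sigma = 4, 2, 0.1
W = rng.normal(size=(D, M))       # arbitrary illustrative parameters
mu = rng.normal(size=D)

N = 200_000
Z = rng.normal(size=(N, M))                          # latent samples
X = Z @ W.T + mu + sigma * rng.normal(size=(N, D))   # observed samples

C_empirical = np.cov(X, rowvar=False)
C_model = W @ W.T + sigma**2 * np.eye(D)
assert np.allclose(C_empirical, C_model, atol=0.05)
```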
ICA: Independent Component Analysis
Similar to the probabilistic formulation, except the latent variables have a non-linear, non-Gaussian distribution:

$$p(z) = \prod_{j=1}^{M} p(z_j)$$

Used in signal processing. A typical example is blind source separation in audio signal analysis.
CCA: Canonical Correlation Analysis
Creates a model that maximally correlates two sets of variables.
Used in data analysis/statistics to find what is common between two sets of observations.
A good way t o build a classifier
What is classification (in layman's terms)?
Computational learning theory distinguishes between a:
Strong learning algorithm: finds with high probability an arbitrarily accurate classifier
Weak learning algorithm: only finds a classifier with bounded accuracy.
For example: Support Vector Machines with a linear kernel only achieve bounded accuracy.
But: they are at least better than random guessing!
(i.e. the classification error is lower than 0.5)
SVM
Support vector machines for joint multivariable optimization [Spinello08]
Slide from prof. Buhman: Machine Learning
Decision Stumps are a class of very simple weak classifiers.
Goal: Find an axis-aligned hyperplane that minimizes the classification error.
This can be done for each feature (i.e. for each dimension in feature space).
It can be shown that the classification error is always better than 0.5 (random guessing).
Idea: apply many weak classifiers, where each is trained on the misclassified examples of the previous.
Object Classification Applications
Weak classifiers (in AdaBoost) are binary classifiers.

Stump: the simplest non-trivial type of decision tree (equivalent to a linear classifier defined by an affine hyperplane):

$$c(x \mid j, \theta, m) = \begin{cases} +m & \text{if } x_j > \theta \\ -m & \text{otherwise} \end{cases} \qquad m \in \{-1, +1\}$$

The hyperplane is orthogonal to the $j$-th axis, which it intersects at $\theta$ (it ignores all entries of $x$ except $x_j$).
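Training such a stump is an exhaustive search over features and thresholds. A minimal numpy sketch, assuming labels in {-1, +1} and per-sample weights (as AdaBoost will require); the data and the `train_stump` helper are illustrative, not part of the original slides:

```python
import numpy as np

# Sketch: exhaustive training of a decision stump c(x | j, theta, m),
# searching every feature j and every observed threshold theta for the
# combination (with sign m) that minimises the weighted error.
def train_stump(X, y, w):
    """X: (N, D) features, y: (N,) labels in {-1, +1}, w: (N,) weights."""
    N, D = X.shape
    best = (np.inf, 0, 0.0, 1)               # (error, j, theta, m)
    for j in range(D):
        for theta in np.unique(X[:, j]):
            pred = np.where(X[:, j] > theta, 1, -1)
            for m in (1, -1):
                err = w[m * pred != y].sum()  # weighted misclassification
                if err < best[0]:
                    best = (err, j, theta, m)
    return best

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = np.where(X[:, 1] > 0.3, 1, -1)           # truth depends on feature 1
w = np.full(100, 1 / 100)

err, j, theta, m = train_stump(X, y, w)
assert err == 0.0                            # a separating stump exists
```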
Boosting is a technique to build a strong learning algorithm from a given weak learning algorithm.
The most popular boosting algorithm is AdaBoost (adaptive boosting).
It assigns a weight to each training data point.
In the beginning, all weights are equal.
In each round AdaBoost finds a weak classifier and re-weights the misclassified points.
Correctly classified points are weighted less, misclassified points are weighted higher.
A
u
t
o
n
o
m
o
u
s
S
y
s
t
e
m
s
L
a
b
Zr i ch
Algorithm TrainAdaBoost:
1. for n = 1, ..., N do: initialise w_n := 1/N
2. for m = 1, ..., M do
3.    Find a classifier c_m that minimizes the weighted error ε_m = Σ_n w_n · I(c_m(x_n) ≠ y_n)
4.    compute α_m = ½ ln((1 − ε_m) / ε_m), re-weight w_n := w_n · exp(−α_m y_n c_m(x_n)) and normalise
5. return {(α_m, c_m)}, m = 1, ..., M
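The loop above can be sketched end to end. A minimal numpy implementation, assuming 1-D data, labels in {-1, +1}, and single-threshold stumps as the weak classifiers (the standard AdaBoost re-weighting rule, not anything beyond the algorithm named on this slide):

```python
import numpy as np

# Sketch of the AdaBoost training loop, with 1-D threshold stumps
# (theta, m): predict m if x > theta, else -m.
def train_adaboost(x, y, M):
    N = len(x)
    w = np.full(N, 1 / N)                     # step 1: uniform weights
    ensemble = []
    for _ in range(M):                        # step 2
        # step 3: weak classifier minimising the weighted error
        best = (np.inf, 0.0, 1)
        for theta in x:
            for m in (1, -1):
                pred = m * np.where(x > theta, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, theta, m)
        eps, theta, m = best
        # step 4: classifier weight, then re-weight the training points
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        pred = m * np.where(x > theta, 1, -1)
        w *= np.exp(-alpha * y * pred)        # misclassified points grow
        w /= w.sum()
        ensemble.append((alpha, theta, m))
    return ensemble                           # step 5

def classify(ensemble, x):
    s = sum(a * m * np.where(x > t, 1, -1) for a, t, m in ensemble)
    return np.sign(s)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, 1, 1, -1, -1])          # not separable by one stump
model = train_adaboost(x, y, M=10)
assert (classify(model, x) == y).all()
```

No single stump classifies this middle-interval pattern, but the boosted combination does: after three rounds the weighted votes of three complementary stumps already classify all six points correctly.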
Algorithm ClassifyAdaBoost:
1. return ĉ(x) = sign( Σ_{m=1}^{M} α_m c_m(x) )

Major features:
Accuracy of the classifier increases with the number M of weak classifiers, i.e. the algorithm is arbitrarily accurate.
Classification can be done very fast (in contrast to training).
The state of the art:
"Robust Real-time Object Detection", Paul Viola and Michael Jones, IWSCTV, 2001
Features for face detection
Quick evaluation through the integral image approach
Classifier selection
How to select a minimal set of features/weak classifiers to detect a face
Classifier cascade
How to efficiently assemble classifiers
Defined as a difference of rectangular integral areas: the sum of the pixels which lie within the white rectangles is subtracted from the sum of the pixels in the grey rectangles.

$$f = \int_{Grey} I(x,y)\,dx\,dy \;-\; \int_{White} I(x,y)\,dx\,dy$$

One feature is defined by:
Feature type: A, B, C or D
Feature position and size
Defined as:

$$I_{int}(X,Y) = \sum_{x \le X}\,\sum_{y \le Y} I(x,y)$$

The integral over rectangle D can be computed with 4 accesses to I_int:

$$\int_{D} I(x,y)\,dx\,dy = I_{int}(4) + I_{int}(1) - I_{int}(2) - I_{int}(3)$$

A very efficient way to compute features.
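A minimal numpy sketch of this idea: build the integral image with two cumulative sums, then recover any rectangle sum from its four corners (labelled 1..4 as above, with 1 the top-left corner and 4 the bottom-right).

```python
import numpy as np

# Sketch: integral image and constant-time rectangle sums, as used for
# Haar-feature evaluation. Corner labels follow the 1..4 convention above.
def integral_image(I):
    return I.cumsum(axis=0).cumsum(axis=1)

def rect_sum(I_int, top, left, bottom, right):
    """Sum of I over rows top..bottom and cols left..right (inclusive)."""
    total = I_int[bottom, right]                       # corner 4
    if top > 0:
        total -= I_int[top - 1, right]                 # corner 2
    if left > 0:
        total -= I_int[bottom, left - 1]               # corner 3
    if top > 0 and left > 0:
        total += I_int[top - 1, left - 1]              # corner 1
    return total

rng = np.random.default_rng(3)
I = rng.integers(0, 256, size=(24, 24))
I_int = integral_image(I)
assert rect_sum(I_int, 5, 3, 10, 8) == I[5:11, 3:9].sum()
```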
A weak classifier has 3 attributes:
A feature f_j (type, size and position)
A threshold θ_j
A comparison operator op_j ∈ {<, >}
The resulting weak classifier is:

$$h_j(x) = f_j(x) \;\mathrm{op}_j\; \theta_j$$

x is a 24x24 pixel window in the image.
A classifier with only these two features can be trained to recognise 100% of the faces, with 40% false positives.
scale = 24x24
do {
    for each position in the image {
        Try classifying the part of the image starting at this
        position, with the current scale, using the classifier
        selected by AdaBoost
    }
    scale = scale x 1.5
} until maximum scale
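The pseudocode above translates directly into a sliding-window loop. A minimal Python sketch; `classify` here is only a stand-in for the AdaBoost-selected classifier (it always answers "not a face"), not the real detector:

```python
import numpy as np

# Sketch of the multi-scale scan. `classify` stands in for the
# classifier selected by AdaBoost.
def classify(window):
    return False               # placeholder: never detects anything

def scan(image, base=24, scale_factor=1.5, step=1):
    H, W = image.shape
    detections = []
    size = base
    while size <= min(H, W):                     # until maximum scale
        for top in range(0, H - size + 1, step):
            for left in range(0, W - size + 1, step):
                window = image[top:top + size, left:left + size]
                if classify(window):
                    detections.append((top, left, size))
        size = int(size * scale_factor)          # scale = scale x 1.5
    return detections

image = np.zeros((72, 72))
assert scan(image) == []       # the placeholder classifier finds nothing
```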
Basic idea:
It is easy to detect that something is not a face
Tune (boost) the classifier to be very reliable at saying NO (i.e. a very low false-negative rate)
Stop evaluating the cascade of classifiers as soon as one classifier says NO
Faster processing
Quick elimination of useless windows
Each individual classifier is trained to deal only with the examples that the previous ones could not process
Very specialised
The deeper in the cascade, the more complex (the more features) the classifiers.
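The early-exit structure of the cascade is simple to express. A minimal Python sketch; the two stages shown (mean-brightness and contrast tests) are hypothetical placeholders for illustration, not Viola-Jones features:

```python
# Sketch of the cascade's early-exit evaluation: each stage is tuned for a
# very low false-negative rate, and a single NO rejects the window at once.
def cascade_classify(window, stages):
    for stage in stages:          # cheap stages first, complex stages deeper
        if not stage(window):
            return False          # quick elimination of useless windows
    return True                   # survived every stage: report a face

# Hypothetical stages for illustration: mean-brightness and contrast tests.
stages = [
    lambda w: sum(w) / len(w) > 10,
    lambda w: max(w) - min(w) > 5,
]
assert cascade_classify([0, 0, 0], stages) is False   # rejected by stage 1
assert cascade_classify([20, 30, 40], stages) is True
```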
Face detection is solved
Algorithms such as Viola-Jones AdaBoost are very efficient and easily implemented in hardware
Found on digital cameras and camcorders
The approach used in the Viola-Jones algorithm is generic enough to be used for other detection tasks
PCA can still be useful, but only in very controlled settings