Support Vector and Kernel Machines
Nello Cristianini
BIOwulf Technologies
nello@support-vector.net
http://www.support-vector.net/tutorial.html
ICML 2001
A Little History
• Annual workshop at NIPS
• Centralized website: www.kernel-machines.org
• Textbook (2000): see www.support-vector.net
• Now a large and diverse community, from machine learning, optimization, statistics, neural networks, functional analysis, etc.
• Successful applications in many fields (bioinformatics, text, handwriting recognition, etc.)
• Fast-expanding field, EVERYBODY WELCOME! (www.support-vector.net)
Preliminaries
Very Informal Reasoning
More formal reasoning
Just in case …
• Dot product: ⟨x, z⟩ = ∑i xi zi
• Hyperplane: ⟨w, x⟩ + b = 0
(Figure: a separating hyperplane with normal vector w and offset b, separating the points labeled x from the points labeled o.)
Overview of the Tutorial
Parts I and II: overview
Modularity (IMPORTANT CONCEPT)
1-Linear Learning Machines
Basic Notation
• Input space: x ∈ X
• Output space: y ∈ Y = {−1, +1}
• Hypothesis: h ∈ H
• Real-valued function: f: X → R
• Training set: S = {(x1, y1), ..., (xi, yi), ...}
• Test error: ε
• Dot product: ⟨x, z⟩
Perceptron
f(x) = ⟨w, x⟩ + b
h(x) = sign(f(x))
(Figure: a linear decision boundary with weight vector w and bias b separating the x points from the o points.)
Perceptron Algorithm
Update rule (ignoring threshold):
if yi ⟨wk, xi⟩ ≤ 0 then
    wk+1 ← wk + η yi xi
    k ← k + 1
(Figure: the weight vector w rotated toward a misclassified point.)
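A minimal sketch of this update rule in NumPy (the toy data and the value of η are illustrative, not from the slides):

    import numpy as np

    def perceptron(X, y, eta=1.0, epochs=100):
        # Primal perceptron update (threshold ignored, as on the slide):
        # on a mistake, w_{k+1} <- w_k + eta * y_i * x_i
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * np.dot(w, xi) <= 0:      # point misclassified (or on the boundary)
                    w = w + eta * yi * xi
                    mistakes += 1
            if mistakes == 0:                    # converged: all points classified correctly
                break
        return w

    # Toy usage on made-up, linearly separable data
    X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1, 1, -1, -1])
    w = perceptron(X, y)
    print(np.sign(X @ w))   # reproduces y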
Observations
Observations - 2
• Mistake bound: M ≤ (R/γ)²
(Figure: the data lie inside a ball of radius R and are separated with margin γ.)
Dual Representation
• if yi (∑j αj yj ⟨xj, xi⟩ + b) ≤ 0 then αi ← αi + η
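A sketch of the same algorithm in its dual form, where only the coefficients αi are updated (the Gram matrix plays the role of the dot products; the bias update is a common convention, not spelled out on the slide):

    import numpy as np

    def dual_perceptron(X, y, eta=1.0, epochs=100):
        # Dual representation: f(x_i) = sum_j alpha_j * y_j * <x_j, x_i> + b,
        # and a mistake on point i increments alpha_i by eta.
        n = X.shape[0]
        G = X @ X.T                  # Gram matrix of dot products <x_j, x_i>
        alpha = np.zeros(n)
        b = 0.0
        for _ in range(epochs):
            mistakes = 0
            for i in range(n):
                if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                    alpha[i] += eta
                    b += eta * y[i]          # bias update (an assumed convention)
                    mistakes += 1
            if mistakes == 0:
                break
        return alpha, b

With the linear Gram matrix G = X @ X.T this reproduces the primal perceptron; replacing G with any kernel matrix turns it into a non-linear classifier, which is the point of the dual representation.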
Duality: First Property of SVMs
Limitations of LLMs (Linear Learning Machines)
Non-Linear Classifiers
Learning in the Feature Space
(Figure: a non-linear map φ takes the x and o points in input space X to φ(x), φ(o) in feature space F, where they become linearly separable.)
Problems with Feature Space
Implicit Mapping to Feature Space
Kernel-Induced Feature Spaces
Kernels (IMPORTANT CONCEPT)
Kernels
The Kernel Matrix (IMPORTANT CONCEPT)
K = [ K(xi, xj) ]i,j   (the matrix of kernel evaluations on the training points)

The Kernel Matrix
Mercer’s Theorem
More Formally: Mercer’s Theorem
K(x1, x2) = ⟨φ(x1), φ(x2)⟩  for some map φ
⇔ K is positive definite:
∫∫ K(x, z) f(x) f(z) dx dz ≥ 0   ∀f ∈ L2
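In practice one checks the finite version of this condition: the kernel matrix on any sample must be positive semi-definite. A small NumPy check (the Gaussian kernel and random sample are illustrative):

    import numpy as np

    def gaussian_kernel_matrix(X, sigma=1.0):
        # K_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))
        sq = np.sum(X**2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
        return np.exp(-d2 / (2 * sigma**2))

    X = np.random.randn(20, 3)          # made-up sample
    K = gaussian_kernel_matrix(X)
    eigvals = np.linalg.eigvalsh(K)     # K is symmetric, so eigvalsh applies
    print(eigvals.min() >= -1e-10)      # True: all eigenvalues are (numerically) non-negative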
Mercer’s Theorem
K(x1, x2) = ∑i λi φi(x1) φi(x2)
Examples of Kernels
K(x, z) = ⟨x, z⟩^d
K(x, z) = e^(−‖x − z‖² / 2σ²)
Example: Polynomial Kernels
x = (x1, x2);  z = (z1, z2);
⟨x, z⟩² = (x1 z1 + x2 z2)²
        = x1² z1² + x2² z2² + 2 x1 z1 x2 z2
        = ⟨(x1², x2², √2 x1 x2), (z1², z2², √2 z1 z2)⟩
        = ⟨φ(x), φ(z)⟩
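A quick numerical check of this identity (the particular vectors are arbitrary):

    import numpy as np

    def phi(v):
        # Explicit feature map for the degree-2 polynomial kernel in 2 dimensions
        return np.array([v[0]**2, v[1]**2, np.sqrt(2) * v[0] * v[1]])

    x = np.array([1.0, 3.0])
    z = np.array([2.0, -1.0])

    lhs = np.dot(x, z) ** 2          # kernel computed in input space: <x, z>^2
    rhs = np.dot(phi(x), phi(z))     # same quantity via the explicit map phi
    print(lhs, rhs)                  # both equal 1.0 here: (1*2 + 3*(-1))^2 = 1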
Example: Polynomial Kernels
Example: the two spirals
Making Kernels (IMPORTANT CONCEPT)
Second Property of SVMs:
Kernels over General Structures
A bad kernel …
Other Kernel-based algorithms
BREAK
NEW TOPIC
The Generalization Problem
Statistical (Computational) Learning Theory
Assumptions and Definitions
VC Bounds
ε = Õ(VC / m)
VC = (number of dimensions of X) + 1
Margin Based Bounds
ε = Õ((R/γ)² / m)
γ = min_i yi f(xi) / ‖f‖
Margin Based Bounds (IMPORTANT CONCEPT)
ε = Õ((R/γ)² / m)
• Best hyperplane: the maximal margin one
• The margin is large if the kernel is well chosen
Maximal Margin Classifier
Two kinds of margin
(Figure: the margin of a separating function f on the x and o points.)

Two kinds of margin
Max Margin = Minimal Norm
Max Margin = Minimal Norm
Distance between the two convex hulls:
⟨w, x+⟩ + b = +1
⟨w, x−⟩ + b = −1
⟨w, (x+ − x−)⟩ = 2
⟨w/‖w‖, (x+ − x−)⟩ = 2/‖w‖
(Figure: the margin γ between the hyperplanes ⟨w, x⟩ + b = ±1.)
The primal problem (IMPORTANT STEP)
• Minimize:   ⟨w, w⟩
• Subject to: yi (⟨w, xi⟩ + b) ≥ 1
Optimization Theory
From Primal to Dual
L(w, b, α) = ½ ⟨w, w⟩ − ∑i αi [ yi (⟨w, xi⟩ + b) − 1 ],   αi ≥ 0
Differentiate and substitute:
∂L/∂b = 0
∂L/∂w = 0
The Dual Problem (IMPORTANT STEP)
• Maximize:   W(α) = ∑i αi − ½ ∑i,j αi αj yi yj ⟨xi, xj⟩
• Subject to: αi ≥ 0,   ∑i αi yi = 0
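A small sketch of solving this dual directly with a general-purpose solver (SciPy's SLSQP; the toy data and the choice of a generic optimizer rather than a dedicated QP/SMO routine are assumptions made here for illustration):

    import numpy as np
    from scipy.optimize import minimize

    # Made-up, linearly separable toy data
    X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.0], [-2.0, -0.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    n = len(y)

    G = (y[:, None] * X) @ (y[:, None] * X).T       # G_ij = y_i y_j <x_i, x_j>

    def neg_W(alpha):
        # Negative dual objective: minimizing it maximizes W(alpha)
        return -(alpha.sum() - 0.5 * alpha @ G @ alpha)

    cons = {'type': 'eq', 'fun': lambda a: a @ y}   # sum_i alpha_i y_i = 0
    bounds = [(0, None)] * n                        # alpha_i >= 0 (hard margin)

    res = minimize(neg_W, np.zeros(n), bounds=bounds, constraints=cons)
    alpha = res.x
    w = (alpha * y) @ X                             # w = sum_i alpha_i y_i x_i
    sv = np.argmax(alpha)                           # any support vector (alpha_i > 0)
    b = y[sv] - np.dot(w, X[sv])                    # from the KKT condition y_i(<w,x_i>+b) = 1
    print(np.sign(X @ w + b))                       # reproduces y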
Convexity (IMPORTANT CONCEPT)
Kuhn-Tucker Theorem
Properties of the solution:
• Duality: can use kernels
• KKT conditions: αi [ yi (⟨w, xi⟩ + b) − 1 ] = 0   ∀i
• w = ∑i αi yi xi
• Sparseness: only the points nearest to the hyperplane (margin = 1) have positive weight
• They are called support vectors
KKT Conditions Imply Sparseness
Sparseness: another fundamental property of SVMs
(Figure: only the points on the margin, the support vectors, carry positive weight.)
Properties of SVMs - Summary
✓ Duality
✓ Kernels
✓ Margin
✓ Convexity
✓ Sparseness
Dealing with noise
yi (⟨w, xi⟩ + b) ≥ 1 − ξi
The Soft-Margin Classifier
Minimize:   ½ ⟨w, w⟩ + C ∑i ξi
Or:         ½ ⟨w, w⟩ + C ∑i ξi²
Subject to: yi (⟨w, xi⟩ + b) ≥ 1 − ξi
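To experiment with the soft margin in practice, one option (assuming scikit-learn is available; the data and parameter values are arbitrary) is:

    import numpy as np
    from sklearn.svm import SVC

    # Made-up, slightly overlapping two-class data
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2) - [2, 2]])
    y = np.hstack([np.ones(50), -np.ones(50)])

    # C trades off a large margin against the total slack C * sum(xi_i)
    clf = SVC(kernel='linear', C=1.0).fit(X, y)
    print(clf.n_support_)        # number of support vectors per class
    print(clf.score(X, y))       # training accuracy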
Slack Variables
ε ≤ (1/m) · (R² + ∑i ξi²) / γ²
yi (⟨w, xi⟩ + b) ≥ 1 − ξi
Soft Margin - Dual Lagrangian
• Box constraints:
  W(α) = ∑i αi − ½ ∑i,j αi αj yi yj ⟨xi, xj⟩
  0 ≤ αi ≤ C,   ∑i αi yi = 0
• Diagonal:
  W(α) = ∑i αi − ½ ∑i,j αi αj yi yj ⟨xi, xj⟩ − (1/2C) ∑i αi²
  0 ≤ αi,   ∑i αi yi = 0
The regression case
(Figure: the loss plotted as a function of yi − (⟨w, xi⟩ + b).)

Regression: the ε-tube
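The ε-tube corresponds to the ε-insensitive loss: zero inside the tube, linear outside it. A one-line sketch (sample values are arbitrary):

    import numpy as np

    def eps_insensitive_loss(y, f, eps=0.1):
        # Zero loss inside the tube |y - f| <= eps, linear loss outside it
        return np.maximum(0.0, np.abs(y - f) - eps)

    print(eps_insensitive_loss(np.array([1.0, 1.05, 2.0]), np.array([1.0, 1.0, 1.0])))
    # -> [0.  0.  0.9]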
Implementation Techniques
Simple Approximation
αi ← αi + (1 / K(xi, xi)) · ( 1 − yi ∑j αj yj K(xi, xj) )
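A sketch of this coordinate-wise update as a simple training loop (a kernel-Adatron-style scheme; the clipping at zero and the fixed number of epochs are choices made here, and the data are illustrative):

    import numpy as np

    def simple_kernel_svm(K, y, epochs=200):
        # K: precomputed kernel matrix, y: labels in {-1, +1}
        n = len(y)
        alpha = np.zeros(n)
        for _ in range(epochs):
            for i in range(n):
                # Additive update from the slide, one coordinate at a time
                grad = 1.0 - y[i] * np.sum(alpha * y * K[i, :])
                alpha[i] = max(0.0, alpha[i] + grad / K[i, i])   # keep alpha_i >= 0
        return alpha

    # Toy usage with a linear kernel on made-up data
    X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    alpha = simple_kernel_svm(X @ X.T, y)
    print(np.sign((alpha * y) @ (X @ X.T)))   # predictions on the training points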
Full Solution: S.M.O.
Other “kernelized” Algorithms
On Combining Kernels
Kernel Alignment (IMPORTANT CONCEPT)
A(K1, K2) = ⟨K1, K2⟩ / √( ⟨K1, K1⟩ ⟨K2, K2⟩ )
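A direct NumPy translation, using the Frobenius inner product ⟨K1, K2⟩ = ∑i,j K1(xi, xj) K2(xi, xj); the example aligns a linear kernel against the ideal kernel yy′ shown a couple of slides below (the data are made up):

    import numpy as np

    def frobenius_inner(K1, K2):
        # <K1, K2> = sum_ij K1_ij * K2_ij
        return np.sum(K1 * K2)

    def alignment(K1, K2):
        return frobenius_inner(K1, K2) / np.sqrt(frobenius_inner(K1, K1) * frobenius_inner(K2, K2))

    X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    K = X @ X.T
    print(alignment(K, np.outer(y, y)))   # close to 1 when the kernel matches the labels well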
Many interpretations
The ideal kernel
        |  1  1 -1  …  -1 |
        |  1  1 -1  …  -1 |
YY′ =   | -1 -1  1  …   1 |
        |  …  …  …  …   … |
        | -1 -1  1  …   1 |
Combining Kernels
Spectral Machines
Courant-Hilbert theorem
λ = max_v (v′Av) / (v′v)
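A quick check that the maximizer of this Rayleigh quotient is the leading eigenvector (the symmetric matrix here is an arbitrary example):

    import numpy as np

    A = np.array([[3.0, 1.0], [1.0, 2.0]])          # arbitrary symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    v = eigvecs[:, -1]                              # eigenvector of the largest eigenvalue
    rayleigh = (v @ A @ v) / (v @ v)
    print(rayleigh, eigvals[-1])                    # the two values coincide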
Optimizing Kernel Alignment
Applications of SVMs
• Bioinformatics
• Machine Vision
• Text Categorization
• Handwritten Character Recognition
• Time series analysis
Text Kernels
Bioinformatics
• Gene Expression
• Protein sequences
• Phylogenetic Information
• Promoters
• …
Conclusions:
• Book on SVMs: www.support-vector.net