Sistemi Multimediali - DIS 2011
Euclidean distance
[Figure: points A, B, and C, with the distances from A to B and from A to C marked. The point closer to A is the one more similar to A.]
Nearest-neighbor search
range search
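The two query types named above can be sketched as follows. This is only an illustration under stated assumptions: it uses Euclidean distance and a plain linear scan, and the object names, coordinates, and function names are hypothetical and not taken from the slides.

```python
import math

def euclidean(p, q):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor(query, objects):
    """Return the name of the object closest to the query (linear scan)."""
    return min(objects, key=lambda name: euclidean(query, objects[name]))

def range_search(query, objects, radius):
    """Return the names of all objects within `radius` of the query."""
    return [name for name, vec in objects.items()
            if euclidean(query, vec) <= radius]

# Hypothetical 2D feature vectors, invented for illustration.
objects = {"B": (1.0, 1.0), "C": (4.0, 5.0), "D": (0.0, 3.0)}
A = (0.0, 0.0)

print(nearest_neighbor(A, objects))    # B
print(range_search(A, objects, 3.5))   # ['B', 'D']
```

A nearest-neighbor query always returns an answer, while a range query may return nothing if no object lies within the radius.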
Let's try angles...
F = (3, 2, 5)
A = (5, 5, 5)
E = (2, 2, 2)
A and E have a similar composition.
If we use angles as a similarity measure, then A is more similar to E than to F: cos(A,E) > cos(A,F)
Maria Luisa Sapino - Basi di dati Multimediali
Angle-based measures
Given $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$
Dot product: $x \cdot y = \sum_{i=1}^{n} x_i y_i$
Cosine similarity: $\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$
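As a quick check of the angle-based measure, the sketch below computes the cosine similarity for the vectors A, E, and F from the angles example. The function names are illustrative only.

```python
import math

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (|x| * |y|)."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

A = (5, 5, 5)
E = (2, 2, 2)
F = (3, 2, 5)

cos_AE = cosine_similarity(A, E)
cos_AF = cosine_similarity(A, F)
print(round(cos_AE, 4), round(cos_AF, 4))
```

A and E point in the same direction, so their cosine is 1 (up to rounding), while the cosine between A and F is smaller. Note that E is farther from A than F in Euclidean terms, which is why the choice of measure matters.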
Application dependent...
L1-metric: $d = |dX| + |dY|$
L2-metric: $d = (dX^2 + dY^2)^{1/2}$
L3-metric: $d = (|dX|^3 + |dY|^3)^{1/3}$
.....
L(infinity): $d = \max\{|dX|, |dY|\}$
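The family of metrics above can be written as one function of the order p. The following is a minimal sketch; the function name `minkowski` and the sample points are assumptions made for illustration.

```python
def minkowski(p, q, order):
    """L_order distance between p and q; order may be float('inf')."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if order == float("inf"):
        # L(infinity): the largest coordinate difference dominates.
        return max(diffs)
    return sum(d ** order for d in diffs) ** (1.0 / order)

# Hypothetical points with dX = 3 and dY = 4.
X = (0.0, 0.0)
Y = (3.0, 4.0)

for order in (1, 2, 3, float("inf")):
    print(order, minkowski(X, Y, order))
```

For these points the distance shrinks from 7 (L1) to 5 (L2) and approaches 4 (L-infinity) as the order grows, so the same pair of objects can look closer or farther depending on the metric chosen.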
metric model
Feature: a property of interest that can help us index an object
For a student record, student_ID can be a feature
What are the features for an image?
Image features
There are many possible features
Color histogram
Texture
Edges
Shapes
Objects
Object or scene semantics
Feature selection: which one to use for indexing?
Image analysis
..so...
Problem
If each pixel is treated as a different feature, then the feature vector size is equal to the number of pixels (for example, 628 × 1024)
Feature Selection
The initial step of a MIS design involves feature selection or dimensionality reduction: data are transformed and projected in such a way that the selected features are the important ones!
Important with respect to:
Application semantics
Perception impact
Discrimination Power
Object Description Power
Query Description Power and Workload
Transforms
Projection
[Figure: the distance between A and B before and after the projection. When the projection changes distances, a query can return a false hit or suffer a miss.]
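Misses are not desirable!
They cannot be eliminated with postprocessing, whereas false hits can be filtered out by re-checking the candidates against the original objects.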
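The sketch below illustrates why false hits are tolerable. It assumes one particular projection, dropping trailing coordinates, which can only shrink a Euclidean distance; this is a property of that projection, not necessarily of the transforms shown on the slides. Under that assumption a range query in the projected space produces no misses, and the false hits are removed in a refine step. All names and data are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def project(vec, keep):
    """Keep only the first `keep` coordinates (a simple projection)."""
    return vec[:keep]

def filter_and_refine(query, objects, radius, keep):
    """Range query: filter in the projected space, refine in the original."""
    q_proj = project(query, keep)
    # Filter step: cheap test on the projected vectors.
    candidates = [name for name, vec in objects.items()
                  if euclidean(q_proj, project(vec, keep)) <= radius]
    # Refine step: postprocessing with the full vectors removes false hits.
    results = [name for name in candidates
               if euclidean(query, objects[name]) <= radius]
    false_hits = [name for name in candidates if name not in results]
    return candidates, results, false_hits

# Hypothetical 3D feature vectors, invented for illustration.
objects = {"B": (1.0, 1.0, 0.5), "C": (1.0, 0.0, 4.0), "D": (5.0, 5.0, 0.0)}
A = (0.0, 0.0, 0.0)

candidates, results, false_hits = filter_and_refine(A, objects, 2.0, keep=2)
print(candidates, results, false_hits)  # ['B', 'C'] ['B'] ['C']
```

A transform that can enlarge distances would instead push true answers outside the radius during the filter step, and no amount of refining could bring them back.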
Good feature..
A good feature is significant and enables us to differentiate objects from others as much as possible
P(a) = 0.5, P(b) = 0.5: H = 1 (more uncertain, more information)
P(a) = 1.0, P(b) = 0.0: H = 0 (less uncertain, less information)
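The H values above can be reproduced with a few lines of code, assuming H denotes Shannon entropy measured in bits. The slides do not state the formula, but this assumption matches both values shown.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: sum of p * log2(1/p), skipping p = 0."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0
print(entropy([1.0, 0.0]))  # 0.0
```

A feature whose values are evenly spread carries the most information about which object is being described; a feature that always takes the same value carries none.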
[Figure: objects plotted against features F1, F2, and F3, with F2 annotated "Better separation!" and "Less frequent!"]
A. Picariello
Note that ..
$\mathrm{COV}(X,Y) = E[(X - E[X])(Y - E[Y])]$
$\sigma_{X,Y} = \frac{1}{n} \sum_i (x_i - \bar{x})(y_i - \bar{y})$
$\mathrm{VAR}(X) = \mathrm{COV}(X,X)$
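The definitions above translate directly into code. This is a minimal sketch using the 1/n (population) form of the formula; the sample values are invented for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: (1/n) * sum((x_i - mean_x) * (y_i - mean_y))."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def variance(xs):
    """VAR(X) = COV(X, X)."""
    return covariance(xs, xs)

# Hypothetical samples in which ys is exactly twice xs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

print(covariance(xs, ys))  # 2.5
print(variance(xs))        # 1.25
```

Because ys is a scaled copy of xs, the covariance is positive and equals twice the variance of xs.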
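PCA Goals
.. To identify a set of alternative dimensions for the given data space such that the covariance matrix of the data along this new set of dimensions is diagonal, through eigen-decomposition into eigenvalues and eigenvectors
$(S - \lambda_i I)\,\vec{r}_i = 0$
$S = P C P^{-1}$
$C = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$
$P = [\,\vec{r}_1 \;\; \vec{r}_2 \;\; \cdots \;\; \vec{r}_n\,]$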
[Figure: scatter plot of the data showing the 1st principal component, y1, and the 2nd principal component, y2.]
PCA Eigenvalues
[Figure: the same scatter plot annotated with the eigenvalues λ1 and λ2.]
PCA Algorithm
1. X ← create a data matrix, with one row vector $x_n$ per data point
2. X ← subtract the mean $\bar{x}$ from each row vector $x_n$ in X
3. Σ ← covariance matrix of X
4. Find the eigenvectors and eigenvalues of Σ
5. PCs ← the M eigenvectors with the largest eigenvalues
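The steps of the algorithm can be sketched in plain Python. To stay self-contained, this sketch finds the eigenvectors by power iteration with deflation, a simple stand-in for a library eigen-solver and not necessarily the method intended by the slides. The function names, the starting vector, and the sample data are all assumptions made for illustration.

```python
import math

def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def pca(rows, num_components, iterations=200):
    """Return (means, [(eigenvalue, eigenvector), ...]), largest first."""
    n, d = len(rows), len(rows[0])
    # Step 2: subtract the mean from every row.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Step 3: covariance matrix, using the 1/n form.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    components = []
    for _component in range(num_components):
        # Arbitrary starting vector (an assumption); it must not be
        # orthogonal to the eigenvector being sought.
        vec = normalize([float(j + 1) for j in range(d)])
        # Step 4: power iteration converges to the dominant eigenvector.
        for _step in range(iterations):
            nxt = mat_vec(cov, vec)
            if all(abs(v) < 1e-12 for v in nxt):
                break
            vec = normalize(nxt)
        eigenvalue = sum(a * b for a, b in zip(vec, mat_vec(cov, vec)))
        components.append((eigenvalue, vec))
        # Deflation: remove the found component so the next one dominates.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    # Step 5: the components come out in order of decreasing eigenvalue.
    return means, components

# Hypothetical 2D data spread mostly along the diagonal.
rows = [(0.0, 0.0), (2.0, 2.0), (4.0, 4.0), (1.0, 3.0), (3.0, 1.0)]
means, components = pca(rows, num_components=2)
for eigenvalue, vector in components:
    print(round(eigenvalue, 4), [round(v, 4) for v in vector])
```

For this data the first principal component lies along the diagonal direction (1, 1), where the spread is largest, and the second is perpendicular to it.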
2d Data
[Figure: scatter plot of a 2D data set.]
Principal Components
[Figure: the data with the 1st principal vector drawn; it gives the best axis to project onto.]
A: n x m matrix
U: n x r matrix
L: r x r diagonal matrix (r: rank of the matrix)
V: m x r matrix
SVD - Properties
THEOREM [Press+92]: it is always possible to decompose a matrix A into $A = U L V^T$, where
U, L, V: unique (*)
U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other)
$U^T U = I$; $V^T V = I$ (I: identity matrix)
L: singular values are positive, and sorted in decreasing order
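One consequence of these properties is worth spelling out. It is a standard linear-algebra identity rather than something stated on the slides: because U is column orthonormal and L is diagonal, the decomposition of A yields an eigen-decomposition of $A^T A$.

```latex
\begin{aligned}
A^T A &= (U L V^T)^T (U L V^T) \\
      &= V L^T U^T U L V^T \\
      &= V L^T L V^T && \text{since } U^T U = I \\
      &= V L^2 V^T   && \text{since } L \text{ is diagonal}
\end{aligned}
```

So the columns of V are eigenvectors of $A^T A$, and the eigenvalues are the squared singular values. If A is a mean-centered data matrix with one row per data point, $A^T A$ is n times the covariance matrix, which is how SVD connects to the eigen-decomposition used by PCA.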
Principal component
[Figure: features F1, F2, and F3 with the principal component drawn.]
A principal component is a combination of features!
Compactness of a database
[Figure: two databases of objects, one labeled "more compact".]
Feature quality
A feature is
good if, when we remove it, the overall compactness increases
bad if, when we remove it, the overall compactness decreases
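The rule above can be tried out once compactness is given a concrete definition. The slides do not define it, so this sketch assumes compactness is the average pairwise cosine similarity of the objects in the database. The function names and the data are hypothetical.

```python
import math
from itertools import combinations

def cosine(x, y):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def compactness(db):
    """ASSUMPTION: compactness = average pairwise cosine similarity."""
    pairs = list(combinations(db, 2))
    return sum(cosine(x, y) for x, y in pairs) / len(pairs)

def without_feature(db, index):
    """Copy of the database with one feature (column) removed."""
    return [vec[:index] + vec[index + 1:] for vec in db]

def feature_quality(db, index):
    """Change in compactness when the feature is removed.

    Positive: removal makes the database more compact (good feature).
    Negative: removal makes it less compact (bad feature).
    """
    return compactness(without_feature(db, index)) - compactness(db)

# Hypothetical database: feature 0 is identical for every object,
# feature 1 varies between objects.
db = [(5.0, 1.0, 0.0), (5.0, 0.0, 1.0), (5.0, 1.0, 1.0)]

print(feature_quality(db, 0) < 0)  # True: feature 0 is bad
print(feature_quality(db, 1) > 0)  # True: feature 1 is good
```

Feature 0 takes the same value everywhere, so it only makes the objects look alike; dropping it spreads them apart. Feature 1 helps tell the objects apart, so dropping it pulls them closer together.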