Documentos de Académico
Documentos de Profesional
Documentos de Cultura
g y localization
Cordelia Schmid
Recognition
• Classification
– Object present/absent in an image
– Often presence of a significant amount of background clutter
• Localization / Detection
– Localize object within the
frame
– Bounding box or pixel-
level segmentation
Pixel-level
Pixel level object classification
Difficulties
• Intra-class
Intra class variations
1. Sliding
S window detectors
2. Features and adding
g spatial
p information
3. Histogram of Oriented Gradients (HOG)
4. State of the art algorithms and PASCAL VOC
Sliding window detector
• Basic component: binary classifier
Car/non-car
Classifier
Yes,
No,
at a
not car
car
Sliding window detector
• Detect objects in clutter by search
Car/non-car
Classifier
Car/non-car
Classifier
Car/non-car
Classifier
Does the image contain a car? Does the image contain a car?
Feature
Classifier
Extraction
• aspect ratio
• granularity (finite grid)
• partial occlusion
• multiple responses
Outline
1. Sliding
S window detectors
2. Features and adding
g spatial
p information
3. Histogram of Oriented Gradients (HOG)
4. State of the art algorithms and PASCAL VOC
BOW + Spatial pyramids
Start from BoW for region of interest (ROI)
• no spatial information recorded
• sliding window detector
B off Words
Bag W d
Feature Vector
Adding Spatial Information to Bag of Words
Bag of Words
C
Concatenate
t t
Feature Vector
1 BoW
4 BoW
16 BoW
Dense Visual Words
• Why extract only sparse image
fragments?
Quantize
Word
Patch / SIFT
1. Sliding
S window detectors
2. Features and adding
g spatial
p information
3. Histogram of Oriented Gradients + linear SVM classifier
4. State of the art algorithms and PASCAL VOC
Feature: Histogram of Oriented
Gradients (HOG)
dominant
image direction HOG
ency
• tile 64 x 128 pixel window into 8 x 8 pixel cells
freque
• each cell represented by histogram over 8
orientation bins (i.e. angles in range 0-180 degrees) orientation
Histogram of Oriented Gradients (HOG) continued
Feature
Classifier
Extraction
f(x) w x b
T
average over
positive training
p g data
Training
g a sliding
g window detector
Linear SVM
Jittered positives HOG Feature
Classifier
random negatives Extraction
f(x) wT x b
x
• these are the hard negatives for the next round of training
0.8
classifier score decreasing
0.6
on
precisio
0.4
02
0.2
0
0 0.2 0.4 0.6 0.8 1
recall
Effects of retraining
g
Side by side
before retraining after retraining
Side by side
before retraining after retraining
Accelerating Sliding Window Search
• Sliding window search is slow because so many windows are
needed ee.g.
g x × y × scale ≈ 100,000
100 000 for a 320×240 image
Classifier Classifiera
Possibly Classifiera
Possibly Face
1 2
face N
face
Window
• Spatial
S ti l correspondence
d can b be
“engineered in” by spatial tiling
Outline
1. Sliding
S window detectors
2. Features and adding
g spatial
p information
3. HOG + linear SVM classifier
4. State of the art algorithms and PASCAL VOC
PASCAL VOC dataset - Content
• 20 classes: aeroplane, bicycle, boat, bottle, bus, car, cat,
chair, cow, dining table, dog, horse, motorbike, person,
potted plant, sheep, train, TV
Occluded
O l d d Difficult
Diffi lt
Object is significantly Not scored in
occluded within BB evaluation
Truncated Pose
Object extends Facing left
beyond BB
Examples
• Classification
– Is
I there
th a dog
d ini thi
this iimage?
?
– Evaluation by precision/recall
• Detection
– Localize all the people (if any) in
this image
– Evaluation by precision/recall
/
based on bounding box overlap
Detection: Evaluation of Bounding Boxes
Bgt Bp
Predicted Bp
1
– A good score requires both high
0.8 recall and high precision
Interpolated
0.6 – Application-independent
precision
0
0 0.2 0.4 0.6 0.8 1
recall
Object Detection with Discriminatively
Trained Part Based Models
x1
x6 x2
x5 x3
x4
Filter F
Score of F at position p is
F ⋅ φ(p,
φ(p H)
φ(p, H) = concatenation of
HOG features from
HOG pyramid H subwindow specified by p
z = (p0,..., pn)
p0 : location of root
p1,..., pn : location of parts
Score is
S i sum off filt
filter
scores minus
deformation costs
Score of a Hypothesis
Appearance term Spatial prior
displacements
Training
Latent SVM (MI-SVM)
Training data
We would like to find β such that:
SVM objective
Latent SVM Training
• Optimization:
– Initialize β and iterate:
Alternation
• Pick best z for each positive example strategy
• Optimize
p β with z fixed
high
g scoringg false p
positives
hi h scoring
high i ttrue positives
iti (not enough overlap)
Segmentation Driven Object Detection
with Fisher Vectors
student presentation
Approach
• Pre-select class-
independent candidate
image windows using
i
image segmentation
t ti
[van de Sande et al., Segmentation
as selective search for object
recognition, ICCV'11]
CC 11