Matthieu Cord
Joint work with Thibaut Durand, Nicolas Thome
Outline
Motivations
How to learn without bounding boxes?
Multiple-Instance Learning / latent variables for missing information [Felzenszwalb, PAMI10]
Latent SVM and extensions => MANTRA
How to learn deep without bounding boxes?
Learning invariance with input image transformations
- Spatial Transformer Networks [Jaderberg, NIPS15]
- [Yang, CVPR16]
Parts model
- Automatic discovery and optimization of parts for image classification
Notations
Prediction function:
MANTRA: Minimum Maximum Latent Structural SVM
MANTRA model:
Pair of latent variables (h⁺_{i,y}, h⁻_{i,y}):
- max-scoring latent value: h⁺_{i,y} = argmax_{h∈H} ⟨w, Φ(x_i, y, h)⟩
- min-scoring latent value: h⁻_{i,y} = argmin_{h∈H} ⟨w, Φ(x_i, y, h)⟩

New scoring function:
D_w(x_i, y) = ⟨w, Φ(x_i, y, h⁺_{i,y})⟩ + ⟨w, Φ(x_i, y, h⁻_{i,y})⟩   (2)
Example: for a street image x, D_w(x, street) = 2, D_w(x, highway) = 0.7, D_w(x, coast) = 1.5.
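As an illustration, the scoring function (2) can be sketched in a few lines; the per-window scores below are hypothetical values chosen to reproduce the class scores of the street-image example:

```python
# Sketch of the MANTRA scoring function: D_w(x, y) is the sum of the
# max- and min-scoring latent values (here, per-window scores for
# class y). All window scores below are hypothetical.
def mantra_score(window_scores):
    """D_w(x, y) = max_h <w, Phi(x, y, h)> + min_h <w, Phi(x, y, h)>."""
    return max(window_scores) + min(window_scores)

# Hypothetical per-window scores reproducing the slide's class scores:
window_scores = {
    "street":  [1.5, 0.5, 1.0],    # D_w = 1.5 + 0.5 = 2.0
    "highway": [0.9, -0.2, 0.5],   # D_w = 0.9 - 0.2 = 0.7
    "coast":   [1.25, 0.25, 0.8],  # D_w = 1.25 + 0.25 = 1.5
}
D = {y: mantra_score(s) for y, s in window_scores.items()}
prediction = max(D, key=D.get)  # predicted class = argmax_y D_w(x, y)
```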
MANTRA: Model Training
Learning formulation
Loss function: ℓ_w(x_i, y_i) = max_{y∈Y} [Δ(y_i, y) + D_w(x_i, y)] − D_w(x_i, y_i)
MANTRA: Optimization
Solve the inference problem max_y D_w(x_i, y) and the loss-augmented inference (LAI) max_y [Δ(y_i, y) + D_w(x_i, y)]:
- Exhaustive enumeration for binary/multi-class classification
- Exact and efficient solutions for ranking
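For the multi-class case, both problems reduce to enumerating the classes; a minimal sketch, assuming a 0/1 loss Δ and hypothetical class scores:

```python
# Exhaustive inference and loss-augmented inference (LAI) for a
# multi-class task with a 0/1 loss Delta. The dictionary D stands for
# the scores D_w(x_i, y); all values are hypothetical.
def inference(D):
    """Prediction: argmax_y D_w(x_i, y)."""
    return max(D, key=D.get)

def loss_augmented_inference(D, y_true):
    """LAI: argmax_y [Delta(y_true, y) + D_w(x_i, y)] with 0/1 loss."""
    return max(D, key=lambda y: (0.0 if y == y_true else 1.0) + D[y])

D = {"street": 2.0, "highway": 0.7, "coast": 1.5}
y_pred = inference(D)                           # "street"
y_lai = loss_augmented_inference(D, "street")   # most violating class
```

Note how the LAI solution differs from the prediction: "coast" (1.0 + 1.5 = 2.5) overtakes the true class "street" (0.0 + 2.0 = 2.0), which is exactly what drives the margin in the loss above.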
WELDON
Weakly supErvised Learning of Deep cOnvolutional Nets
MANTRA extension for training deep CNNs
Learning Φ(x, y, h): end-to-end learning of deep CNNs with structured prediction and latent variables
- Incorporating multiple positive & negative evidence
- Training deep CNNs with a structured loss
Standard deep CNN architecture: VGG16
Simonyan et al. Very deep convolutional networks for large-scale image recognition.
ICLR 2015
MANTRA adaptation for deep CNN
Problem: the network requires a fixed-size image as input.
WELDON: deep architecture
C: number of classes
Aggregation function
Region aggregation = max [Oquab, 2015]:
Select the highest-scoring window.
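The max aggregation, and the top-k max plus min extension discussed later, can both be sketched on a list of region scores; the scores and the helper names are hypothetical:

```python
# Region score aggregation over the per-class spatial score map.
# `scores` is a flat list of per-region scores (hypothetical values);
# k is assumed to be at most len(scores) // 2.
def max_aggregation(scores):
    """[Oquab, 2015]-style aggregation: keep only the best window."""
    return max(scores)

def weldon_aggregation(scores, k=3):
    """WELDON-style aggregation: sum of the top-k highest- and top-k
    lowest-scoring regions (multiple positive & negative evidence)."""
    ranked = sorted(scores, reverse=True)
    return sum(ranked[:k]) + sum(ranked[-k:])

regions = [0.9, -1.2, 0.4, 0.1, 0.7, -0.3]
m = max_aggregation(regions)          # only 0.9 survives
w = weldon_aggregation(regions, k=2)  # (0.9 + 0.7) + (-0.3 - 1.2)
```

The design difference matters: max keeps a single piece of positive evidence, while the WELDON variant also lets strongly negative regions lower the class score.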
WELDON: learning
Objective function for a multi-class task and k = 1:

min_w R(w) + (1/N) Σ_{i=1}^N ℓ(f_w(x_i), y_i^gt)

f_w(x_i) = argmax_{y} [ max_{h} L_conv^w(x_i, y, h) + min_{h'} L_conv^w(x_i, y, h') ]
Class is present: increase the scores of the selected windows.
WELDON: learning
Class is absent: decrease the scores of the selected windows.
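The two cases above can be sketched together: only the selected max- and min-scoring regions receive an update, pushed up when the class is present and down when it is absent. Function names and region scores below are hypothetical:

```python
# Sketch of where the WELDON training signal flows: only the selected
# max- and min-scoring regions are updated; every other region is
# left untouched. Region scores are hypothetical.
def selected_regions(scores):
    """Indices of the regions that receive a gradient (argmax, argmin)."""
    imax = max(range(len(scores)), key=scores.__getitem__)
    imin = min(range(len(scores)), key=scores.__getitem__)
    return imax, imin

def update(scores, class_present, lr=0.1):
    """Push the selected scores up if the class is present, down if absent."""
    imax, imin = selected_regions(scores)
    sign = 1.0 if class_present else -1.0
    out = list(scores)
    out[imax] += sign * lr
    out[imin] += sign * lr
    return out

scores = [0.5, -0.25, 0.125]
up = update(scores, class_present=True)    # regions 0 and 1 move up
down = update(scores, class_present=False) # regions 0 and 1 move down
```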
Experiments
Datasets
Object recognition: Pascal VOC 2007, Pascal VOC 2012
Scene recognition: MIT67, 15 Scene
Visual recognition, where context plays an important role:
COCO, Pascal VOC 2012 Action
Scene recognition

Method                  | 15 Scene | MIT67
------------------------|----------|------
VGG16 (online code) [1] | 91.2     | 69.9
MOP CNN [2]             | -        | 68.9
Negative parts [3]      | -        | 77.1
WELDON                  | 94.3     | 78.0

Table: Multi-class accuracy results on scene categorization datasets.
Visual results
Visual results (failing examples)
Kindergarten / Classroom
Analysis
Impact of the different improvements
a) max | b) + k=3 | c) + min | d) + AP | VOC07 | VOC12 action
  X    |          |          |         | 83.6  | 53.5
  X    |    X     |          |         | 86.3  | 62.6
  X    |          |    X     |         | 87.5  | 68.4
  X    |    X     |    X     |         | 88.4  | 71.7
  X    |    X     |          |    X    | 87.8  | 69.8
  X    |    X     |    X     |    X    | 88.9  | 72.6
Analysis
Impact of the number of regions k
(Figure: region selection maps for k = 1 vs. k = 3)
Connections to other Latent Variable Models
Hidden CRF (HCRF) [Quattoni, PAMI07]:

min_w (1/2) ‖w‖² + (C/N) Σ_{i=1}^N [ log Σ_{(y,h)∈Y×H} exp⟨w, Φ(x_i, y, h)⟩ − log Σ_{h∈H} exp⟨w, Φ(x_i, y_i, h)⟩ ]

WELDON:

min_w (1/2) ‖w‖² + (C/N) Σ_{i=1}^N [ max_{y∈Y} ( Δ(y_i, y) + Σ_{h∈H} ⟨w, Φ(x_i, y, h)⟩ ) − Σ_{h∈H} ⟨w, Φ(x_i, y_i, h)⟩ ]
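The structural parallel between the two objectives, soft log-sum-exp aggregation in HCRF versus the hard max-margin aggregation, can be illustrated numerically; the latent-value scores below are hypothetical:

```python
import math

# Contrast the two aggregation strategies over latent values h:
# HCRF aggregates softly with log-sum-exp, while the max-margin family
# (Latent SVM / MANTRA / WELDON) takes a hard max. Scores hypothetical.
def log_sum_exp(scores):
    """HCRF-style soft aggregation: log sum_h exp(s_h), computed stably."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

scores = [2.0, 0.5, -1.0]
soft = log_sum_exp(scores)  # slightly above max(scores)
hard = max(scores)          # 2.0
```

log-sum-exp always upper-bounds the max and approaches it as the best score dominates, which is why the hard-max objective can be read as the zero-temperature limit of the HCRF one.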
Thibaut Durand Nicolas Thome Matthieu Cord