Generating Training Data for Deep Neural Networks by exploiting LIDAR, Cameras and Maps
Andreas Geiger
Autonomous Vision Group, MPI for Intelligent Systems, Tübingen
Computer Vision and Geometry Group, ETH Zürich
October 23, 2017
Types of Machine Learning

Supervised Learning ("Predictive")
- Learn a mapping from inputs x to outputs y from labeled data D = {(x_i, y_i)}_{i=1}^N
- Examples: image classification, speech recognition, ...

Unsupervised Learning ("Descriptive")
- Find interesting patterns in a dataset D = {x_i}_{i=1}^N
- Examples: image segmentation, dimensionality reduction, ...

Reinforcement Learning
- Find suitable actions to maximize reward, discovered via trial & error
- Examples: robotic systems, AlphaGo, ...
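As a minimal illustration of the supervised setting, one can fit a linear mapping from inputs x to outputs y on a labeled dataset D = {(x_i, y_i)}; the toy data and model below are our own example, not from the talk:

```python
import numpy as np

# Labeled dataset D = {(x_i, y_i)}: inputs x and targets y (here y = 2x + 1 + noise)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(100, 1))
y = 2.0 * x + 1.0 + 0.01 * rng.normal(size=(100, 1))

# Supervised learning: fit the mapping x -> y by least squares
X = np.hstack([x, np.ones_like(x)])           # add a bias column
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, bias = theta.ravel()                   # recovered parameters, close to (2, 1)
```

The same "learn a mapping from labeled pairs" template underlies the deep networks discussed later; only the model class and the optimizer change.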
The Deep Learning Revolution
What is the problem?
Data, labels, and money: lots of each.
Image Classification
[Deng et al., CVPR 2009]
Semantic Segmentation
[Cordts et al., CVPR 2016]
Amazon Mechanical Turk
Data Annotation Industry
What about other tasks?
Stereo
Optical Flow
J. Gibson, 1950
MPI Sintel, 2012
Stereoautograph
www.wild-heerbrugg.com/photogrammetry1.htm
The Supervision Dilemma

Unsupervised Learning
- Great promise, but does not work (yet)

Supervised Learning
- Data annotation is very labor-intensive
- Consistency between annotators is difficult to achieve
- Manual annotation is sometimes virtually impossible (e.g., optical flow)
What can we do?
Synthetic Data
[Gaidon et al., CVPR 2016]
Self-Supervised Learning
[Doersch et al., ICCV 2015]
[Zhou et al., CVPR 2017]
What else?
Supervision Transfer

Idea
- Solve an easier problem and transfer the labels
- Requires algorithms which automate this transfer
- Requires additional sensor modalities

Examples
- Domain transfer (e.g., 3D to 2D)
- Resolution transfer (e.g., high resolution to low resolution)
- Content transfer (e.g., 3D models to real-world scenes)
Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer [Xie, Kiefel, Sun & Geiger, CVPR 2016]
The Computer Vision Revolution
[Long et al., 2015]
3D to 2D Semantic and Instance Label Transfer

[Figure: side view, front view, and 3D view; geometric cues and labeled 3D points are combined to transfer labels from 3D to 2D, yielding a dense segmentation]
3D to 2D Semantic and Instance Label Transfer

Advantages over 2D annotation:
- Object instances can be separated more easily in 3D
- A single annotated 3D object projects into many frames
- 2D annotations are temporally coherent

Challenges:
- 3D data is sparse, noisy and incomplete
- 3D annotations are coarse
- Dynamic objects
3D Annotation: Static Scene Elements
- Map prior for localizing buildings

3D Annotation: Detecting Dynamic Objects

3D Annotation: Annotating Dynamic Objects

2D Annotation: Scribbling the Rest
Model

Variables:
- Pixels: {s_i}_{i∈P}
- 3D points: {s_l}_{l∈L}
- Scribbled pixels: {s_j}_{j∈P'}

[Figure: 3D points, image pixels, and scribbled pixels]
Gibbs Energy:
$$E(s) = \sum_{i \in P} \varphi^P_i(s_i) + \sum_{l \in L} \varphi^L_l(s_l) + \sum_{j \in P'} \varphi^{P'}_j(s_j) + \sum_{m \in F} \sum_{i \in P} \varphi^F_{mi}(s_i)$$
$$\qquad + \sum_{i,j \in P} \psi^{P,P}_{ij}(s_i,s_j) + \sum_{l,k \in L} \psi^{L,L}_{lk}(s_l,s_k) + \sum_{i \in P,\, l \in L} \psi^{P,L}_{il}(s_i,s_l) + \sum_{i \in P,\, j \in P'} \psi^{P,P'}_{ij}(s_i,s_j)$$
Potentials

Pixel Unary Potentials:
$$\varphi^P_i(s_i) = w^P_1(s_i)\,\xi^P_i(s_i) - w^P_2(s_i)\log p^P_i(s_i)$$
- ξ^P_i(s_i): admissible labels
- p^P_i(s_i): local appearance (trained on sparse labels, tested densely)
3D Point Unary Potentials:
$$\varphi^L_l(s_l) = w^L(s_l)\,\xi^L_l(s_l)$$
- ξ^L_l(s_l) = 0 ⇔ the 3D point lies within a 3D primitive of class s_l
- Sky is not admissible
Sparsely Labeled Pixel Unary Potentials:
$$\varphi^{P'}_i(s_i) = w^{P'}(s_i)\,\xi^{P'}_i(s_i)$$

[Figure: scribbles in a key frame and the corresponding non-key frame]
Geometric Unary Potentials:
$$\varphi^F_{mi}(s_i) = w^F(s_i)\,[\,p_i \in R_m \wedge \nu_m(p_i) \neq s_i\,]$$

[Figure: curbs and folds in the side view; a minimum bounding disc around a 2D fold separates wall, road and sidewalk at a label boundary]
Pixel Pairwise Potentials:
$$\psi^{P,P}_{ij}(s_i,s_j) = w^{P,P}_1(s_i,s_j)\exp\left(-\frac{\|p_i-p_j\|^2}{2\,\theta^{P,P}_1}\right) + w^{P,P}_2(s_i,s_j)\exp\left(-\frac{\|p_i-p_j\|^2}{2\,\theta^{P,P}_2} - \frac{\|c_i-c_j\|^2}{2\,\theta^{P,P}_3}\right)$$
- Fully connected model with appearance and smoothness kernels
- Gaussian potentials ensure tractable inference [Krähenbühl & Koltun, NIPS 2011]
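As an illustration, the two Gaussian kernels of the pixel pairwise term can be evaluated directly for a single pixel pair; a minimal numpy sketch in which the weights w and bandwidths θ are hypothetical placeholders:

```python
import numpy as np

def pixel_pairwise(p_i, p_j, c_i, c_j, w1=1.0, w2=1.0,
                   theta1=1.0, theta2=1.0, theta3=1.0):
    """Pixel pairwise potential: a smoothness kernel on positions p plus an
    appearance kernel on positions p and colors c (both Gaussian)."""
    d_p = np.sum((p_i - p_j) ** 2)                       # squared position distance
    d_c = np.sum((c_i - c_j) ** 2)                       # squared color distance
    smooth = w1 * np.exp(-d_p / (2.0 * theta1))          # position-only kernel
    appear = w2 * np.exp(-d_p / (2.0 * theta2) - d_c / (2.0 * theta3))
    return smooth + appear
```

Identical pixels receive the maximal affinity w1 + w2, and the affinity decays with spatial and color distance, which is what makes the fully connected model favor coherent regions.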
3D Pairwise Potentials:
$$\psi^{L,L}_{lk}(s_l,s_k) = w^{L,L}(s_l,s_k)\exp\left(-\frac{\|p^{3d}_l-p^{3d}_k\|^2}{2\,\theta^{L,L}_1} - \frac{(n_l-n_k)^2}{2\,\theta^{L,L}_2}\right)$$
- Fully connected model in 3D
- Encourages the same label for nearby 3D points with similar normals
2D/3D Pairwise Potentials:
$$\psi^{P,L}_{il}(s_i,s_l) = w^{P,L}(s_i,s_l)\exp\left(-\frac{\|p_i-\pi_l\|^2}{2\,\theta^{P,L}}\right)$$
- Fully connected 2D/3D field
- Encourages label consistency in a neighborhood of the projection π_l
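To illustrate the 2D/3D coupling: π_l is the image projection of 3D point l, and the potential ties each pixel to nearby projected points. With a pinhole camera model this can be sketched as follows (the intrinsics K, weight and bandwidth values are hypothetical placeholders, not from the talk):

```python
import numpy as np

K = np.array([[500.0,   0.0, 320.0],   # hypothetical pinhole intrinsics
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(p3d):
    """Project a 3D point (camera coordinates) to pixel coordinates pi_l."""
    q = K @ p3d
    return q[:2] / q[2]

def cross_potential(p_i, p3d_l, w=1.0, theta=4.0):
    """2D/3D pairwise potential: couples pixel i to 3D point l via a
    Gaussian around the point's projection pi_l."""
    pi_l = project(p3d_l)
    return w * np.exp(-np.sum((p_i - pi_l) ** 2) / (2.0 * theta))
```

A pixel at the projection receives the full weight w; pixels far from any projected 3D point are effectively decoupled from the 3D labels.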
2D Scribbled Pairwise Potentials:
$$\psi^{P,P'}_{ij}(s_i,s_j) = w^{P,P'}(s_i,s_j)\exp\left(-\frac{\|p_i-p^L_j\|^2}{2\,\theta^{P,P'}}\right)$$
- Fully connected 2D/sparse 2D field
- Propagates the sparse annotation at p^L_j to its neighborhood
Learning and Inference

Inference:
- Factorized mean field: Q(s) = ∏_{i ∈ P∪L} Q_i(s_i)
- Efficient variational inference [Krähenbühl & Koltun, NIPS 2011]

Learning:
- Θ = {w^P_1, w^P_2, w^L, w^F, w^{P,P}_1, w^{P,P}_2, w^{P,L}, w^{L,L}_1, w^{L,L}_2}
- Empirical risk minimization (univariate logistic loss):
$$f(\Theta) = -\sum_{n=1}^{N} \sum_{i \in P} \log Q_{n,i}(s^*_{n,i}) + \lambda\, C(\Theta)$$
  - s^*_{n,i}: ground truth label
  - Q_{n,i}(·): approximate marginal
- Stochastic gradient descent
- Same loss for instance & semantic segmentation
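As a toy illustration of the factorized mean field idea (not the talk's actual Gaussian-kernel model), each update sets Q_i(s) ∝ exp(−unary − expected pairwise energy under the other marginals). The sketch below runs this on a small chain with Potts-style pairwise terms; all energies and parameter values are hypothetical:

```python
import numpy as np

def mean_field(unary, pair_w, iters=20):
    """Naive mean-field inference for a chain CRF with unary[i, s] costs and a
    Potts penalty pair_w between neighboring nodes. Returns marginals Q[i, s]."""
    n, k = unary.shape
    Q = np.full((n, k), 1.0 / k)                  # fully factorized init
    for _ in range(iters):
        for i in range(n):
            # Expected Potts energy: E_{s_j ~ Q_j}[w * 1(s_i != s_j)] = w * (1 - Q_j(s_i))
            msg = np.zeros(k)
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    msg += pair_w * (1.0 - Q[j])  # cost of disagreeing with neighbor j
            logits = -(unary[i] + msg)
            logits -= logits.max()                # numerical stability
            Q[i] = np.exp(logits)
            Q[i] /= Q[i].sum()                    # renormalize the marginal
    return Q
```

In the talk's model the same scheme applies, except that the pairwise expectations over the dense Gaussian kernels are computed efficiently via filtering, which is what makes fully connected inference tractable.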
Quantitative Results

Method           | JI   | Acc
LA               | 82.1 | 90.0
LA+PW            | 84.4 | 91.4
LA+PW+CO+3D      | 88.2 | 93.7
Full Model       | 89.0 | 94.1
Full Model (90%) | 94.9 | 97.4
Full Model (80%) | 96.6 | 98.2
Full Model (70%) | 97.5 | 98.7

- LA: Local Appearance
- PW: 2D Pairwise Potentials
- CO: 3D Primitive Constraints
- 3D: 3D Point Constraints
Qualitative Comparison to Baselines
- 2D Label Propagation [Vijayanarasimhan et al., 2012]
- Projection of 3D Primitives
- Proposed Method
Qualitative Results

Qualitative Results for Dynamic Scenes
Dataset Statistics

Dataset             | #frames | semantic | instance | consecutive | 3D annotation
CamVid              | 631×1   | ✓        |          | ✓           |
DUS                 | 500×2   | ✓        |          | ✓           |
CityScapes (fine)   | 5000×1  | ✓        | ✓        | ?           |
CityScapes (coarse) | 20000×1 | ✓        | ✓        |             |
Ours                | 55218×2 | ✓        | ✓        | ✓           | ✓
Video
Supervision Transfer

Idea
- Transfer labels from a problem for which they are easy to obtain
- Develop algorithms which facilitate this transfer

Examples
- Domain transfer (e.g., 3D to 2D)
- Resolution transfer (e.g., high resolution to low resolution)
- Content transfer (e.g., 3D models to real-world scenes)
Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data [Janai, Güney, Wulff, Black & Geiger, CVPR 2017]
Slow Flow
Sparsity Invariant CNNs [Uhrig, Schneider, Franke, Brox & Geiger, 3DV 2017]

Laser Scans are Sparse
- Goal: interpolation of a sparse / irregular depth map
Standard CNNs fail on Sparse Inputs

[Figure: input (5% sparsity), ground truth, standard ConvNet output, SparseConvNet output]
Sparse Convolutions

Regular convolution:
$$f_{u,v}(x) = \sum_{i,j=-k}^{k} x_{u+i,v+j}\, w_{i,j} + b$$
- x: input, w_{i,j}: filter weights, b: bias
Sparse convolution:
$$f_{u,v}(x,o) = \frac{\sum_{i,j=-k}^{k} o_{u+i,v+j}\, x_{u+i,v+j}\, w_{i,j}}{\sum_{i,j=-k}^{k} o_{u+i,v+j} + \epsilon} + b, \qquad f_{u,v}(o) = \max_{i,j=-k,\dots,k} o_{u+i,v+j}$$
- x: input, o: observability mask, w_{i,j}: filter weights, b: bias
Sparse Convolution Module

[Figure: the feature map is multiplied element-wise by the binary mask, convolved with the filter weights, and normalized by the convolved mask; the bias is added afterwards, and the mask itself is propagated by max pooling]

- Can be easily implemented using regular operations
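The "regular operations" claim can be made concrete: convolve the masked input, convolve the mask to obtain the normalizer, divide, add the bias, and max-pool the mask. A minimal single-channel numpy sketch under those assumptions (helper names and the ε value are ours):

```python
import numpy as np

def correlate2d_same(a, k):
    """Plain dense 2D correlation with zero padding (a 'regular' conv op)."""
    r = k.shape[0] // 2                       # half filter size (odd filter assumed)
    ap = np.pad(a, r)
    out = np.zeros(a.shape, dtype=float)
    for u in range(a.shape[0]):
        for v in range(a.shape[1]):
            out[u, v] = (ap[u:u + 2 * r + 1, v:v + 2 * r + 1] * k).sum()
    return out

def sparse_conv(x, o, w, b=0.0, eps=1e-5):
    """Sparse convolution assembled from regular ops:
    conv(o * x, w) / (conv(o, 1) + eps) + b, with the mask propagated by
    max pooling (here: any observed pixel in the window)."""
    num = correlate2d_same(x * o, w)               # convolve the masked features
    den = correlate2d_same(o, np.ones_like(w))     # count of observed pixels
    f = num / (den + eps) + b
    new_o = (den > 0).astype(float)                # max pooling over a binary mask
    return f, new_o
```

Because the output is normalized by the number of observed pixels in each window, a constant input yields (approximately) the same output regardless of the sparsity pattern, which is the invariance the slides demonstrate.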
Network Architecture

[Figure: five sparse convolution layers, each with 16 feature channels, with filter sizes 11, 7, 5, 3, 3 and mask max-pooling of matching size, followed by a single-channel output]

- Standard 2D denoising CNN architecture
- Yellow: depth image (input/output)
- Red: observation mask
- Green: feature maps
Results on Synthia
- Results on the Synthia dataset [Ros et al., CVPR 2016]
- Trained and evaluated at 5% sparsity
- Trained at 5% sparsity and evaluated at 20% sparsity
KITTI Depth Dataset
- 93k images with depth ground truth
- Depth prediction/completion benchmark available soon!
Results on Synthia-to-KITTI Depth Adaptation

MAE (m) by sparsity level at training time:

Sparsity at train: | 5%    | 10%   | 20%   | 30%   | 40%   | 50%   | 60%   | 70%
ConvNet            | 16.03 | 13.48 | 10.97 | 8.437 | 10.02 | 9.73  | 9.57  | 9.90
ConvNet+mask       | 16.18 | 16.44 | 16.54 | 16.16 | 15.64 | 15.27 | 14.62 | 14.11
SparseConvNet      | 0.722 | 0.723 | 0.732 | 0.734 | 0.733 | 0.731 | 0.731 | 0.730

- Trained on Synthia with different sparsity levels
- Evaluated on the KITTI Depth dataset
Supervision Transfer

Idea
- Transfer labels from a problem for which they are easy to obtain
- Develop algorithms which facilitate this transfer

Examples
- Domain transfer (e.g., 3D to 2D)
- Resolution transfer (e.g., high resolution to low resolution)
- Content transfer (e.g., 3D models to real-world scenes)
Augmented Reality Meets Deep Learning for Car Instance Segmentation in Urban Scenes [Alhaija, Mustikovela, Mescheder, Geiger & Rother, BMVC 2017]

Augmented Reality Meets Deep Learning
- Instance segmentation performance (MNC, Dai et al., CVPR 2016)
One more thing ...
A Multi-View Stereo Benchmark with High-Resolution Images and Multi-Camera Videos [Schöps, Schönberger, Galliani, Sattler, Schindler, Pollefeys & Geiger, CVPR 2017]
ETH3D Benchmark
www.eth3d.net
Thank you!