DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs


One Sentence Abstract

This work introduces "DeepLab" for semantic image segmentation, using atrous convolution and atrous spatial pyramid pooling and combining DCNNs with fully connected CRFs to improve boundary localization, resulting in state-of-the-art performance on multiple datasets.

Simplified Abstract

This research improves semantic image segmentation, the task of labeling every pixel in an image, using deep learning. It introduces three main contributions that make segmentation more accurate and efficient.

First, it introduces 'atrous convolution,' a method that inserts gaps between filter weights. This lets the network control the resolution at which features are computed and see a wider context in the image without increasing its workload or size.
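
As a concrete illustration (not code from the paper), here is a minimal sketch of atrous convolution using PyTorch's built-in dilation argument; the channel counts and input size are illustrative:

```python
import torch
import torch.nn as nn

# Atrous (dilated) convolution: dilation=r inserts r-1 zeros between
# filter taps, enlarging a 3x3 filter's field of view to (2r+1)x(2r+1)
# while keeping the same number of weights.
rate = 2
atrous = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3,
                   padding=rate, dilation=rate)  # padding=rate preserves H, W

x = torch.randn(1, 64, 65, 65)  # dummy feature map
y = atrous(x)
print(y.shape)  # torch.Size([1, 64, 65, 65]) -- resolution preserved
# Parameter count is identical to a plain 3x3 convolution.
```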

Second, it proposes 'atrous spatial pyramid pooling' (ASPP) to find objects in images more precisely at various scales. This method probes the same feature map with atrous convolutions at several sampling rates in parallel, similar to how our eyes take in a scene at both broad and narrow fields of view.
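
A minimal sketch of the idea, again assuming PyTorch (the rates 6, 12, 18, 24 follow the paper's ASPP-L variant, but the paper's full branches include extra 1x1 layers; channel sizes here are illustrative):

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel 3x3 atrous convolutions
    at several sampling rates over the same feature map, fused by
    summation so each position is probed at multiple fields of view."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # Each branch processes the same input at a different scale.
        return sum(branch(x) for branch in self.branches)

aspp = ASPP(in_ch=512, out_ch=21)   # 21 = PASCAL VOC class count
x = torch.randn(1, 512, 65, 65)     # dummy DCNN feature map
print(aspp(x).shape)                # torch.Size([1, 21, 65, 65])
```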

Lastly, it improves the accuracy of object boundaries by combining Deep Convolutional Neural Networks (DCNNs) with a probabilistic graphical model, a fully connected Conditional Random Field (CRF). The DCNN captures what is in the image, while the CRF refines the per-pixel labels, making object edges crisper.
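
For the refinement step, here is a minimal sketch using the pydensecrf package, a common Python wrapper around the fully connected CRF of Krähenbühl and Koltun that the paper builds on (the kernel parameters below are illustrative defaults, not the paper's tuned values):

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, probs, n_iters=10):
    """Refine DCNN class probabilities with a fully connected CRF.
    image: HxWx3 uint8 array; probs: float32 (C, H, W) softmax scores."""
    n_classes, H, W = probs.shape
    d = dcrf.DenseCRF2D(W, H, n_classes)
    d.setUnaryEnergy(unary_from_softmax(probs))  # unary = -log(probs)
    # Smoothness kernel: nearby pixels prefer the same label.
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel: nearby pixels of similar color prefer the same label.
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(image), compat=10)
    Q = d.inference(n_iters)                     # mean-field inference
    return np.argmax(Q, axis=0).reshape(H, W)    # refined label map
```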

This new approach, named "DeepLab," sets new state-of-the-art results for image segmentation on several datasets, including PASCAL VOC-2012, PASCAL-Context, PASCAL-Person-Part, and Cityscapes. The authors also make their code publicly available, allowing others to build on this advance.

Study Fields

Main fields:

  • Deep Learning
  • Semantic image segmentation

Subfields:

  • Atrous convolution
  • Atrous spatial pyramid pooling (ASPP)
  • Deep Convolutional Neural Networks (DCNNs)
  • Conditional Random Field (CRF)
  • PASCAL VOC-2012 semantic image segmentation task
  • PASCAL-Context
  • PASCAL-Person-Part
  • Cityscapes
  • Code availability

Study Objectives

  • Investigate the use of deep learning for semantic image segmentation
  • Highlight the usefulness of atrous convolution as a powerful tool in dense prediction tasks
  • Propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales
  • Improve localization of object boundaries by combining methods from DCNNs and probabilistic graphical models
  • Develop a "DeepLab" system that sets new state-of-art performance in semantic image segmentation tasks
  • Evaluate the performance of the proposed method on various datasets, including PASCAL VOC-2012, PASCAL-Context, PASCAL-Person-Part, and Cityscapes

Conclusions

  • Introduce atrous convolution, a powerful tool for dense prediction tasks that allows controlling the resolution and enlarging the field of view of filters without increasing parameters or computation.
  • Propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales by probing an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view.
  • Improve localization of object boundaries by combining methods from DCNNs and probabilistic graphical models, using the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF) to enhance localization performance.
  • Develop "DeepLab" system that sets new state-of-the-art at PASCAL VOC-2012 semantic image segmentation task (79.7% mIOU in test set) and improves results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes.
  • Share all the code publicly to encourage further research and development.
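
For reference, mIOU (mean intersection-over-union) is the benchmark metric cited above: the overlap between predicted and ground-truth masks for each class, averaged over classes. A minimal NumPy sketch of the standard definition (not code from the paper):

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Mean intersection-over-union over classes.
    pred, gt: integer label maps of the same shape."""
    ious = []
    for c in range(n_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:                     # skip classes absent from both maps
            ious.append(inter / union)
    return np.mean(ious)

# Toy example: two 2x2 label maps with 2 classes.
pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, gt, n_classes=2))    # (1/2 + 2/3) / 2 ≈ 0.583
```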
