DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
One Sentence Abstract
This work introduces "DeepLab" for semantic image segmentation using atrous convolution and spatial pyramid pooling, and combines DCNNs with CRFs to improve localization, resulting in state-of-the-art performance on multiple datasets.
Simplified Abstract
This research improves semantic image segmentation using deep learning, introducing three main advances that make segmentation more accurate and efficient.
First, it introduces 'atrous convolution,' a method that explicitly controls how finely feature responses are computed. It lets the network take in a wider context in the image without increasing the number of parameters or the amount of computation.
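The idea can be sketched in one dimension: an atrous (dilated) convolution spaces the filter taps apart, so the same 3-tap kernel covers a wider span of the input. This is a minimal illustration, not the paper's implementation; the function name and toy signal are made up for the example.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution: kernel taps are spaced `rate`
    samples apart, enlarging the field of view from k samples to
    (k - 1) * rate + 1 without adding any parameters."""
    k = len(kernel)
    span = (k - 1) * rate + 1          # effective field of view
    out_len = len(signal) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        for j in range(k):
            out[i] += signal[i + j * rate] * kernel[j]
    return out

signal = np.arange(10, dtype=float)    # toy input
kernel = np.array([1.0, 1.0, 1.0])     # one 3-tap filter

dense = atrous_conv1d(signal, kernel, rate=1)    # ordinary convolution, sees 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # same 3 weights, sees a 5-sample span
```

With `rate=1` this reduces to ordinary convolution; raising the rate widens the receptive field while the kernel (and hence the parameter count) stays the same, which is the core trade-off the paper exploits for dense prediction.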
Second, it proposes a method called 'atrous spatial pyramid pooling' (ASPP) to find objects in images more precisely at various scales. This method probes images at different levels of detail, similar to how our eyes see things in a broader or narrower context.
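A much-reduced 1-D sketch of the ASPP idea, under the assumption that the branches are fused by summation (as the paper does with per-branch score maps); the helper names and toy feature map are illustrative, not from the paper.

```python
import numpy as np

def atrous_conv1d_same(x, kernel, rate):
    """1-D atrous convolution with zero padding so the output length
    matches the input length (needed to fuse parallel branches)."""
    k = len(kernel)
    pad = (k - 1) * rate // 2
    xp = np.pad(x, pad)
    return np.array([
        sum(xp[i + j * rate] * kernel[j] for j in range(k))
        for i in range(len(x))
    ])

def aspp_1d(x, kernel, rates=(1, 2, 4)):
    """Atrous spatial pyramid pooling, simplified: run parallel atrous
    branches at several sampling rates over the same feature layer,
    then fuse their responses by summation."""
    return sum(atrous_conv1d_same(x, kernel, r) for r in rates)

features = np.ones(16)                   # toy convolutional feature layer
kernel = np.array([0.25, 0.5, 0.25])     # shared 3-tap filter for the sketch
fused = aspp_1d(features, kernel)        # multi-scale response, same length as input
```

Each branch probes the same feature layer at a different effective field of view, so objects of different sizes produce a strong response in at least one branch; fusing the branches gives the multi-scale robustness the paper reports.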
Lastly, it improves the accuracy of finding object boundaries by combining two techniques: Deep Convolutional Neural Networks (DCNNs) and probabilistic graphical models. While DCNNs help the computer understand the overall image, the probabilistic graphical models refine the details, making it better at finding object edges.
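The refinement step can be hinted at with a drastically simplified mean-field update of a fully connected CRF: a single Gaussian pairwise kernel over 1-D pixel positions, one update, two labels. The real system uses bilateral (position + colour) kernels and several iterations; everything below (function names, weights, the toy "image") is an illustrative assumption.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def meanfield_step(unary, positions, sigma=1.0, w=1.0):
    """One mean-field update of a fully connected CRF with a single
    Gaussian pairwise kernel over pixel positions.
    unary: (N, L) class scores from the DCNN; positions: (N,) coords."""
    q = softmax(unary)                       # current per-pixel beliefs
    d = positions[:, None] - positions[None, :]
    kern = np.exp(-d ** 2 / (2 * sigma ** 2))  # pairwise affinities
    np.fill_diagonal(kern, 0.0)              # no self-message
    msg = kern @ q                           # aggregate neighbours' beliefs
    return softmax(unary + w * msg)          # compatibility + re-normalise

# Toy 1-D "image": 6 pixels, 2 labels; pixel 2 has a weak, noisy score
unary = np.array([[2.0, 0.0], [2.0, 0.0], [0.2, 0.4],
                  [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]])
positions = np.arange(6, dtype=float)
q = meanfield_step(unary, positions)
```

The noisy pixel gets pulled toward the label its confident neighbours agree on, while the sharp transition between pixels 3 and 4 survives, which is how the CRF cleans up DCNN scores near object boundaries.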
This new approach, named "DeepLab," sets new records for semantic segmentation on several datasets, including PASCAL VOC-2012, PASCAL-Context, PASCAL-Person-Part, and Cityscapes. The authors also release their code publicly, allowing others to build on this advance.
Study Fields
Main fields:
- Deep Learning
- Semantic image segmentation
Subfields:
- Atrous convolution
- Atrous spatial pyramid pooling (ASPP)
- Deep Convolutional Neural Networks (DCNNs)
- Conditional Random Field (CRF)
- PASCAL VOC-2012 semantic image segmentation task
- PASCAL-Context
- PASCAL-Person-Part
- Cityscapes
- Code availability
Study Objectives
- Investigate the use of deep learning for semantic image segmentation
- Highlight the usefulness of atrous convolution as a powerful tool in dense prediction tasks
- Propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales
- Improve localization of object boundaries by combining methods from DCNNs and probabilistic graphical models
- Develop a "DeepLab" system that sets new state-of-the-art performance in semantic image segmentation tasks
- Evaluate the performance of the proposed method on various datasets, including PASCAL VOC-2012, PASCAL-Context, PASCAL-Person-Part, and Cityscapes
Conclusions
- Introduce atrous convolution, a powerful tool for dense prediction tasks that allows controlling the resolution and enlarging the field of view of filters without increasing parameters or computation.
- Propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales by probing an incoming convolutional feature layer with filters at multiple sampling rates and fields of view.
- Improve localization of object boundaries by combining methods from DCNNs and probabilistic graphical models, using the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF) to enhance localization performance.
- Develop the "DeepLab" system that sets a new state-of-the-art on the PASCAL VOC-2012 semantic image segmentation task (79.7% mIOU on the test set) and improves results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes.
- Share all the code publicly to encourage further research and development.





