TextBoxes++: A Single-Shot Oriented Scene Text Detector

One Sentence Abstract

"TextBoxes++ is presented as an end-to-end trainable, fast scene text detector that achieves high accuracy and efficiency, outperforming competing methods on four public datasets, and significantly improving state-of-the-art approaches for word spotting and end-to-end text recognition tasks."

Simplified Abstract

Imagine you're trying to read the text on a photo, but the words are twisted, tiny, or just hard to see. This is a tricky problem for computers too. Scientists have developed a new tool to help computers quickly and accurately detect text in photos, no matter how it's oriented, how small it is, or how different it looks from other text.

The new tool, called TextBoxes++, is a smart way for computers to find and identify text in photos. It's much better than previous methods and can find text more quickly. When put to the test, TextBoxes++ did a great job on various types of photos, finding the text faster and more accurately than other methods.

This new tool is important because it helps computers understand the text in photos better, which can be useful in many situations, like helping you find a specific word in a photo or understanding signs and labels in real life. The scientists made this tool open-source, so others can use it and build on it to make it even better.

Study Fields

Main fields:

  • Scene text detection
  • Scene text recognition

Subfields:

  • Arbitrary orientations
  • Small sizes
  • Significantly variant aspect ratios of text in natural images
  • End-to-end trainable fast scene text detector (TextBoxes++)
  • Text localization accuracy
  • Runtime
  • Public datasets (ICDAR 2015, COCO-Text)
  • Post-processing (non-maximum suppression)
  • Word spotting
  • End-to-end text recognition

Study Objectives

  • Develop an end-to-end trainable fast scene text detector named TextBoxes++
  • Detect arbitrary-oriented scene text with high accuracy and efficiency in a single network forward pass
  • Require no post-processing other than an efficient non-maximum suppression step
  • Evaluate TextBoxes++ on four public datasets
  • Outperform competing methods in terms of text localization accuracy and runtime
  • Achieve high f-measure and runtime performance on ICDAR 2015 Incidental text images and COCO-Text images
  • Combine TextBoxes++ with a text recognizer for word spotting and end-to-end text recognition tasks on popular benchmarks
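
The non-maximum suppression step mentioned in the objectives can be illustrated with a minimal sketch. This is a generic greedy NMS on axis-aligned boxes, not the authors' implementation: the box format (x1, y1, x2, y2), the function names iou and nms, the 0.5 threshold, and the sample boxes are all assumptions made for illustration. A detector for oriented text would compute overlap on quadrilaterals or rotated rectangles instead.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much."""
    # Visit candidates from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes cover the same word.
boxes = [(10, 10, 110, 40), (12, 12, 112, 42), (200, 50, 300, 80)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU of about 0.84, so it is suppressed, while the third box does not overlap either and is kept.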

Conclusions

  • The authors present an end-to-end trainable fast scene text detector named TextBoxes++ that can detect arbitrary-oriented scene text with high accuracy and efficiency.
  • The proposed method achieves text localization accuracy and runtime performance superior to competing methods on four public datasets.
  • TextBoxes++ achieves an f-measure of 0.817 at 11.6 fps on 1024×1024 ICDAR 2015 Incidental text images, and an f-measure of 0.5591 at 19.8 fps on 768×768 COCO-Text images.
  • TextBoxes++ significantly outperforms state-of-the-art approaches for word spotting and end-to-end text recognition tasks on popular benchmarks when combined with a text recognizer.
  • The code for TextBoxes++ is available at: https://github.com/MhLiao/TextBoxes_plusplus
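
To make the reported figures easier to interpret, the following sketch shows how an f-measure is derived from precision and recall, and how a frame rate translates into per-image latency. The function names are hypothetical, and the precision and recall passed to f_measure are illustrative values, not results from the paper. Only the frame rates (11.6 fps and 19.8 fps) come from the conclusions above.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ms_per_image(fps):
    """Convert frames per second into average milliseconds per image."""
    return 1000.0 / fps


# Frame rates quoted in the conclusions.
icdar_ms = ms_per_image(11.6)  # roughly 86 ms per 1024x1024 image
coco_ms = ms_per_image(19.8)   # roughly 51 ms per 768x768 image

# Illustrative precision/recall values (assumed, not from the paper).
example_f = f_measure(0.8, 0.6)

print(f"ICDAR 2015: {icdar_ms:.1f} ms/image")
print(f"COCO-Text:  {coco_ms:.1f} ms/image")
print(f"Example f-measure: {example_f:.3f}")
```

Because the f-measure is a harmonic mean, it is pulled toward the lower of precision and recall, so a detector cannot reach a high score by being strong on only one of the two.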

