Whitening Sentence Representations for Better Semantics and Faster Retrieval


One Sentence Abstract

This study improves the isotropy of BERT-based sentence representations with a simple whitening operation, achieving competitive results while reducing storage costs and accelerating retrieval.

Simplified Abstract

Scientists have developed a new way to improve a powerful tool called BERT, which helps computers understand language. BERT is really good at many tasks, but it has a problem: it sometimes struggles to capture the full meaning of a sentence. This is because the sentence representations it produces are not spread out evenly, which makes it hard for BERT to use all the important meaning they contain.

To fix this, the researchers tried a technique called 'whitening.' Whitening is like making sure all the colors in a box of crayons are evenly distributed. By doing this, BERT can better understand the meaning of sentences and even reduce the amount of space needed to store the information. This makes the tool work faster and use less space.
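
For readers who want to see what this looks like in practice, here is a minimal, illustrative sketch of a whitening transform of the kind described above, written in Python with NumPy. It is not the authors' code (their implementation is linked in the Conclusions below); the function names, the use of random vectors as stand-ins for BERT sentence embeddings, and the choice of reducing 768 dimensions to 256 are assumptions made for illustration.

```python
import numpy as np

def fit_whitening(embeddings):
    """Estimate a whitening transform (mu, W) from a matrix of sentence
    embeddings: subtract the mean, then use the SVD of the covariance
    matrix so that the transformed vectors have (approximately) identity
    covariance, i.e. an isotropic distribution."""
    mu = embeddings.mean(axis=0, keepdims=True)     # (1, d) mean vector
    cov = np.cov((embeddings - mu).T)               # (d, d) covariance
    u, s, _ = np.linalg.svd(cov)                    # cov = u @ diag(s) @ u.T
    W = u @ np.diag(1.0 / np.sqrt(s))               # whitening matrix
    return mu, W

def whiten(embeddings, mu, W, k=None):
    """Apply the transform; keeping only the first k columns of W gives a
    lower-dimensional representation, since the components are ordered by
    decreasing variance."""
    out = (embeddings - mu) @ W
    return out[:, :k] if k is not None else out

# Toy usage: random vectors stand in for 768-dimensional BERT outputs.
rng = np.random.default_rng(0)
sentence_vecs = rng.normal(size=(1000, 768))
mu, W = fit_whitening(sentence_vecs)
reduced = whiten(sentence_vecs, mu, W, k=256)       # 768 -> 256 dimensions
print(reduced.shape)                                # (1000, 256)
```

The optional truncation is what allows the same transform to double as a dimensionality-reduction step, which is where the storage and speed savings come from.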

The researchers found that using whitening can help BERT understand language even better and faster than some other methods. This new approach can be really helpful in improving scientific collaborations between countries by making it easier for computers to understand and work with scientific documents.

Study Fields

Main fields:

  • Natural language processing (NLP)
  • Pre-training models
  • Sentence representation
  • Anisotropy problem

Subfields:

  • BERT-based sentence representation
  • Boosting isotropy
  • Flow-based model
  • Whitening operation
  • Dimensionality reduction
  • Model retrieval speed
  • Storage cost reduction

Study Objectives

  • Investigate how to obtain better sentence representations from pre-trained models such as BERT
  • Analyze the anisotropy problem in BERT-based sentence representations and its impact on model performance (a rough way to quantify this is sketched after this list)
  • Evaluate prior attempts to boost the isotropy of the sentence distribution, such as flow-based models, and explore the potential of the whitening operation from traditional machine learning for the same purpose
  • Compare the performance, storage cost, and retrieval speed of the whitening technique applied to BERT-based sentence representations
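
As a concrete illustration of the anisotropy objective: one common way to quantify anisotropy is the average cosine similarity between embeddings of randomly paired (ideally unrelated) sentences; in an isotropic space this average is near zero, while anisotropic embeddings are squeezed into a narrow cone and score much higher. The sketch below is an assumption about how such a check could be implemented, not the paper's evaluation protocol; the function name and sampling scheme are invented for illustration.

```python
import numpy as np

def mean_pairwise_cosine(embeddings, num_pairs=10_000, seed=0):
    """Estimate anisotropy as the average cosine similarity over random
    pairs of distinct sentence embeddings. Values near 0 suggest an
    isotropic space; values near 1 mean the vectors point in similar
    directions regardless of sentence meaning."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    i = rng.integers(0, n, size=num_pairs)
    j = rng.integers(0, n, size=num_pairs)
    keep = i != j                                   # drop self-pairs
    a = embeddings[i[keep]]
    b = embeddings[j[keep]]
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))
```

Comparing this number before and after whitening is one simple way to see whether the transform actually makes the distribution more isotropic.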

Conclusions

  • The anisotropy problem is a critical bottleneck for BERT-based sentence representation, hindering the model from fully utilizing underlying semantic features.
  • Prior attempts to boost the isotropy of sentence representations, such as applying a flow-based model, have shown improvements.
  • The whitening operation from traditional machine learning can likewise enhance the isotropy of sentence representations and achieves competitive results.
  • The whitening technique can also reduce the dimensionality of the sentence representations, yielding improved performance, lower storage cost, and faster retrieval (illustrated by the sketch after this list).
  • The paper's source code is available at https://github.com/bojone/BERT-whitening.
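
To make the storage and retrieval-speed points above concrete, here is an illustrative brute-force retrieval sketch, again not taken from the linked repository: with L2-normalized vectors, cosine similarity is just a dot product, so both the stored index and the per-query matrix multiplication shrink in direct proportion to the embedding dimension (e.g. 768 → 256 after whitening with dimensionality reduction). The corpus size, dimensions, and function names are assumptions for illustration.

```python
import numpy as np

def l2_normalize(x):
    """L2-normalize rows so cosine similarity becomes a plain dot product."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def top_k(query_vecs, corpus_vecs, k=5):
    """Brute-force nearest-neighbour search by dot product. Halving the
    embedding dimension halves both the memory needed for corpus_vecs and
    the cost of this matrix multiplication."""
    scores = l2_normalize(query_vecs) @ l2_normalize(corpus_vecs).T  # (q, n)
    return np.argsort(-scores, axis=1)[:, :k]

# Toy usage on whitened, dimensionality-reduced embeddings (256 dims).
rng = np.random.default_rng(1)
corpus = rng.normal(size=(100_000, 256))
queries = rng.normal(size=(8, 256))
print(top_k(queries, corpus).shape)                 # (8, 5)
```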

