K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

One Sentence Abstract

K-Adapter is a framework for injecting multiple kinds of knowledge into pre-trained models such as RoBERTa. It keeps the original parameters frozen and attaches neural adapters as plug-ins, yielding improved performance on relation classification, entity typing, and question answering.

Simplified Abstract

Researchers are working on improving pre-trained language models such as BERT and RoBERTa, which are used to make machines understand language better. However, when such a model is taught new kinds of knowledge by updating its parameters, it can forget what it previously learned. To solve this problem, the researchers developed a new approach called K-Adapter.

K-Adapter is like a toolbox that adds extra parts to the model when it needs to learn new things. These extra parts, or "adapters," work like plug-ins that can be connected to the model without changing its original parts. This way, the model can learn new things without forgetting what it already knew; a rough sketch of the idea follows.
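
To make the plug-in idea concrete, here is a minimal sketch of a bottleneck adapter attached to a frozen backbone, written in PyTorch. The class name, dimensions, and layout are illustrative assumptions; this shows the generic adapter pattern, not the exact K-Adapter architecture from the paper.

```python
# Minimal sketch of the plug-in idea (assumed PyTorch). Names such as
# BottleneckAdapter and bottleneck_dim are illustrative, not from the paper.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Small trainable module that reads the frozen model's hidden states."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the frozen representation passes through
        # unchanged, and the adapter only adds a learned correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

Because only the adapter's parameters receive gradients, the pre-trained weights, and the knowledge already stored in them, remain untouched.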

In this study, the researchers used K-Adapter with RoBERTa as the backbone and taught it two different types of knowledge: factual knowledge from Wikipedia and Wikidata, and linguistic knowledge from how words are connected in sentences (dependency parsing).

The results showed that with K-Adapter, the model performed better on tasks like understanding relationships between entities, identifying the type of an entity, and answering questions. The researchers also found that K-Adapter helps the model capture a wider range of knowledge than plain RoBERTa.

K-Adapter's code is now available for others to use and build on, which could lead to even better and more accurate language understanding tools in the future.

Study Fields

Main fields:

  • Natural Language Processing (NLP)
  • Knowledge Infusion
  • Pre-trained Models

Subfields:

  • K-Adapter Framework Development
  • RoBERTa as Backbone Model
  • Neural Adapters for Infused Knowledge
  • Distributed Training
  • Factual Knowledge (from text-triplets on Wikipedia and Wikidata)
  • Linguistic Knowledge (via dependency parsing)
  • Relation Classification
  • Entity Typing
  • Question Answering
  • Performance Improvements
  • Knowledge Versatility
  • Code Availability

Study Objectives

  • Investigate the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa
  • Evaluate existing methods that update the original parameters of pre-trained models when injecting knowledge
  • Address the issue that previously injected knowledge is flushed away when multiple kinds of knowledge are injected
  • Propose K-Adapter, a framework that retains the original parameters of pre-trained models and supports versatile knowledge-infused models
  • Utilize RoBERTa as the backbone model and develop neural adapters for each kind of infused knowledge
  • Train adapters efficiently in a distributed way with no information flow between them (a sketch of this setup follows this list)
  • Inject two kinds of knowledge in a case study: factual knowledge from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge from dependency parsing
  • Evaluate K-Adapter's performance on three knowledge-driven tasks: relation classification, entity typing, and question answering
  • Demonstrate that each adapter improves performance and the combination of both adapters brings further improvements
  • Show through analysis that K-Adapter captures more versatile knowledge than RoBERTa
  • Share the code publicly at https://github.com/microsoft/k-adapter
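
The sketch below illustrates the recipe outlined above, assuming a Hugging Face RoBERTa backbone; the adapter internals and the names factual_adapter and linguistic_adapter are simplified stand-ins, not the paper's exact modules. Because the backbone is frozen and the adapters never exchange information, each adapter can be trained separately (even on different machines), and a downstream task head can consume the concatenation of backbone and adapter features.

```python
# Sketch only: frozen RoBERTa backbone plus two independent adapters whose
# outputs are concatenated with the backbone features for downstream tasks.
import torch
import torch.nn as nn
from transformers import RobertaModel

backbone = RobertaModel.from_pretrained("roberta-large")
for p in backbone.parameters():
    p.requires_grad = False  # the original parameters stay fixed

hidden = backbone.config.hidden_size


def make_adapter(bottleneck: int = 64) -> nn.Module:
    # Tiny trainable module standing in for one knowledge-specific adapter.
    return nn.Sequential(
        nn.Linear(hidden, bottleneck), nn.GELU(), nn.Linear(bottleneck, hidden)
    )


factual_adapter = make_adapter()     # would be trained on Wikipedia/Wikidata triplets
linguistic_adapter = make_adapter()  # would be trained on dependency-parsing data


def encode(input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    states = backbone(input_ids, attention_mask=attention_mask).last_hidden_state
    # No information flows between the two adapters, so they can be trained
    # in a distributed way; their features are concatenated with the
    # frozen backbone representation.
    return torch.cat(
        [states, factual_adapter(states), linguistic_adapter(states)], dim=-1
    )
```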

Conclusions

  • The paper addresses the problem that, when multiple kinds of knowledge are injected into pre-trained models like BERT and RoBERTa, previously injected knowledge can be lost.
  • The authors propose K-Adapter, a framework that keeps the original parameters of the pre-trained model frozen and uses a neural adapter for each kind of infused knowledge, allowing efficient training in a distributed way.
  • As a case study, the authors inject two types of knowledge: factual knowledge from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge via dependency parsing.
  • K-Adapter's results on three knowledge-driven tasks show that each adapter improves performance, and their combination leads to further improvements.
  • The authors also find that K-Adapter captures more versatile knowledge than the base RoBERTa model.
  • The code for K-Adapter is publicly available at https://github.com/microsoft/k-adapter.

