K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

One Sentence Abstract

K-Adapter is a framework for injecting multiple kinds of knowledge into pre-trained models such as RoBERTa. It keeps the original parameters frozen and attaches neural adapters as plug-ins, yielding improved performance on relation classification, entity typing, and question answering.

Simplified Abstract

Researchers are working on improving pre-trained language models such as BERT and RoBERTa, which are used to make machines understand language better. However, when such a model is taught new kinds of knowledge by updating its parameters, it can forget what it previously learned. To solve this problem, the researchers developed a new approach called K-Adapter.

K-Adapter is like a toolbox that adds extra parts to the model when it needs to learn new things. These extra parts, or "adapters," work like plug-ins that can be connected to the model without changing its original parts. This way, the model can learn new things without forgetting what it already knew; a rough sketch of the idea follows.
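
To make the plug-in idea concrete, here is a minimal sketch of a bottleneck adapter attached to a frozen backbone, written in PyTorch. The class name, dimensions, and layout are illustrative assumptions; this shows the generic adapter pattern, not the exact K-Adapter architecture from the paper.

```python
# Minimal sketch of the plug-in idea (assumed PyTorch). Names such as
# BottleneckAdapter and bottleneck_dim are illustrative, not from the paper.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Small trainable module that reads the frozen model's hidden states."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the frozen representation passes through
        # unchanged, and the adapter only adds a learned correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

Because only the adapter's parameters receive gradients, the pre-trained weights, and the knowledge already stored in them, remain untouched.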

In this study, the researchers used K-Adapter with RoBERTa as the backbone and taught it two different types of knowledge: factual knowledge from Wikipedia and Wikidata, and linguistic knowledge from how words are connected in sentences (dependency parsing).

The results showed that with K-Adapter, the model performed better on tasks like understanding relationships between entities, identifying the type of an entity, and answering questions. The researchers also found that K-Adapter helps the model capture a wider range of knowledge than plain RoBERTa.

K-Adapter's code is now available for others to use and build on, which could lead to even better and more accurate language understanding tools in the future.

Study Fields

Main fields:

  • Natural Language Processing (NLP)
  • Knowledge Infusion
  • Pre-trained Models

Subfields:

  • K-Adapter Framework Development
  • RoBERTa as Backbone Model
  • Neural Adapters for Infused Knowledge
  • Distributed Training
  • Factual Knowledge (from text-triplets on Wikipedia and Wikidata)
  • Linguistic Knowledge (via dependency parsing)
  • Relation Classification
  • Entity Typing
  • Question Answering
  • Performance Improvements
  • Knowledge Versatility
  • Code Availability

Study Objectives

  • Investigate the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa
  • Evaluate existing methods that update the original parameters of pre-trained models when injecting knowledge
  • Address the issue that previously injected knowledge is flushed away when multiple kinds of knowledge are injected
  • Propose K-Adapter, a framework that retains the original parameters of pre-trained models and supports versatile knowledge-infused models
  • Utilize RoBERTa as the backbone model and develop neural adapters for each kind of infused knowledge
  • Train adapters efficiently in a distributed way with no information flow between them (a sketch of this setup follows this list)
  • Inject two kinds of knowledge in a case study: factual knowledge from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge from dependency parsing
  • Evaluate K-Adapter's performance on three knowledge-driven tasks: relation classification, entity typing, and question answering
  • Demonstrate that each adapter improves performance and the combination of both adapters brings further improvements
  • Show through analysis that K-Adapter captures more versatile knowledge than RoBERTa
  • Share the code publicly at https://github.com/microsoft/k-adapter
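
The sketch below illustrates the recipe outlined above, assuming a Hugging Face RoBERTa backbone; the adapter internals and the names factual_adapter and linguistic_adapter are simplified stand-ins, not the paper's exact modules. Because the backbone is frozen and the adapters never exchange information, each adapter can be trained separately (even on different machines), and a downstream task head can consume the concatenation of backbone and adapter features.

```python
# Sketch only: frozen RoBERTa backbone plus two independent adapters whose
# outputs are concatenated with the backbone features for downstream tasks.
import torch
import torch.nn as nn
from transformers import RobertaModel

backbone = RobertaModel.from_pretrained("roberta-large")
for p in backbone.parameters():
    p.requires_grad = False  # the original parameters stay fixed

hidden = backbone.config.hidden_size


def make_adapter(bottleneck: int = 64) -> nn.Module:
    # Tiny trainable module standing in for one knowledge-specific adapter.
    return nn.Sequential(
        nn.Linear(hidden, bottleneck), nn.GELU(), nn.Linear(bottleneck, hidden)
    )


factual_adapter = make_adapter()     # would be trained on Wikipedia/Wikidata triplets
linguistic_adapter = make_adapter()  # would be trained on dependency-parsing data


def encode(input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    states = backbone(input_ids, attention_mask=attention_mask).last_hidden_state
    # No information flows between the two adapters, so they can be trained
    # in a distributed way; their features are concatenated with the
    # frozen backbone representation.
    return torch.cat(
        [states, factual_adapter(states), linguistic_adapter(states)], dim=-1
    )
```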

Conclusions

  • The paper addresses the problem that, when multiple kinds of knowledge are injected into pre-trained models like BERT and RoBERTa, previously injected knowledge can be lost.
  • The authors propose K-Adapter, a framework that keeps the original parameters of the pre-trained model frozen and uses a neural adapter for each kind of infused knowledge, allowing efficient training in a distributed way.
  • As a case study, the authors inject two types of knowledge: factual knowledge from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge via dependency parsing.
  • K-Adapter's results on three knowledge-driven tasks show that each adapter improves performance, and their combination leads to further improvements.
  • The authors also find that K-Adapter captures more versatile knowledge than the base RoBERTa model.
  • The code for K-Adapter is publicly available at https://github.com/microsoft/k-adapter.

