LinkBERT-large model pretrained on English Wikipedia articles along with hyperlink information. It was introduced in the paper LinkBERT: Pretraining Language Models with Document Links (ACL 2022). The code and data are available in this repository.

## Model description

LinkBERT is a transformer encoder (BERT-like) model pretrained on a large corpus of documents. It improves on BERT by also capturing document links, such as hyperlinks and citation links, so that it can incorporate knowledge that spans multiple documents. Specifically, in addition to single documents, it was pretrained by placing linked documents in the same language model context.
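
To make "linked documents in the same context" concrete, here is a minimal sketch of how a document and a document it links to could be packed into one BERT-style input. It illustrates the idea only and is not the authors' pretraining pipeline; the toy corpus, the `hyperlinks` mapping, and the `build_linked_inputs` helper are all hypothetical.

```python
# Toy corpus and link graph; both are hypothetical stand-ins for Wikipedia.
documents = {
    "Tidal Basin": "The Tidal Basin is a reservoir in Washington, D.C.",
    "Cherry blossom": "Cherry blossoms are flowers of trees in the genus Prunus.",
}
hyperlinks = {"Tidal Basin": ["Cherry blossom"]}  # source title -> linked titles


def build_linked_inputs(documents, hyperlinks, cls="[CLS]", sep="[SEP]"):
    """Pair each document with every document it links to in one input string."""
    inputs = []
    for source, targets in hyperlinks.items():
        for target in targets:
            if target not in documents:
                continue  # skip links that point outside the corpus
            inputs.append(
                f"{cls} {documents[source]} {sep} {documents[target]} {sep}"
            )
    return inputs


linked_inputs = build_linked_inputs(documents, hyperlinks)
print(linked_inputs[0])
```

In practice a tokenizer would insert the special tokens and truncate each segment to fit the context window; the string formatting here only shows which two texts end up side by side.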

LinkBERT can be used as a drop-in replacement for BERT. It achieves better performance than BERT on general language understanding tasks (e.g. text classification), and is particularly effective for knowledge-intensive tasks (e.g. question answering) and cross-document tasks (e.g. reading comprehension, document retrieval).

## Intended uses & limitations

The model can be fine-tuned on a downstream task, such as question answering, sequence classification, or token classification. You can also use the raw model for feature extraction (i.e. obtaining embeddings for input text).

### How to use

To use the model to get the features of a given text in PyTorch:

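```python
from transformers import AutoTokenizer, AutoModel

# Hub model ID is assumed here; replace it with the checkpoint you are using.
tokenizer = AutoTokenizer.from_pretrained("michiyasunaga/LinkBERT-large")
model = AutoModel.from_pretrained("michiyasunaga/LinkBERT-large")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state  # (batch, seq_len, hidden_size)
```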
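
The last hidden state holds one vector per token. If a single embedding per input text is needed, one common option (an assumption here, not something this model card prescribes) is to average the token vectors while ignoring padding. The sketch below uses small dummy tensors so that it runs without downloading the model; `mean_pool` is a hypothetical helper name.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, skipping padded positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts


# Dummy batch: 1 sequence, 3 tokens (the last one is padding), hidden size 2.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])
sentence_embedding = mean_pool(hidden, mask)
print(sentence_embedding)  # tensor([[2., 3.]])
```

With the real model, the same function would take `outputs.last_hidden_state` and `inputs["attention_mask"]` as arguments.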


For fine-tuning, you can use this repository or any other BERT fine-tuning codebase.

## Evaluation results

General benchmarks (MRQA and GLUE):

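| Model | HotpotQA (F1) | TriviaQA (F1) | SearchQA (F1) | NaturalQ (F1) | NewsQA (F1) | SQuAD (F1) | GLUE (Avg score) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BERT-base | 76.0 | 70.3 | 74.2 | 76.5 | 65.7 | 88.7 | 79.2 |
| LinkBERT-base | 78.2 | 73.9 | 76.8 | 78.3 | 69.3 | 90.1 | 79.6 |
| BERT-large | 78.1 | 73.7 | 78.3 | 79.0 | 70.9 | 91.1 | 80.7 |
| LinkBERT-large | 80.8 | 78.2 | 80.5 | 81.0 | 72.6 | 92.7 | 81.1 |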
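
As a quick reading aid, the per-benchmark difference between LinkBERT-large and BERT-large can be computed directly from the numbers reported above. This is plain arithmetic on those scores, not an additional result; the variable names are illustrative.

```python
benchmarks = ["HotpotQA", "TriviaQA", "SearchQA", "NaturalQ", "NewsQA", "SQuAD", "GLUE"]
bert_large = [78.1, 73.7, 78.3, 79.0, 70.9, 91.1, 80.7]
linkbert_large = [80.8, 78.2, 80.5, 81.0, 72.6, 92.7, 81.1]

# Absolute gain of LinkBERT-large over BERT-large on each benchmark.
gains = {
    name: round(link - base, 1)
    for name, base, link in zip(benchmarks, bert_large, linkbert_large)
}

# Mean gain over the six MRQA datasets (every column except GLUE).
mrqa_gains = [gains[name] for name in benchmarks if name != "GLUE"]
mean_mrqa_gain = sum(mrqa_gains) / len(mrqa_gains)

for name, gain in gains.items():
    print(f"{name}: +{gain}")
print(f"Mean MRQA gain: +{mean_mrqa_gain:.2f}")
```

The gains range from +0.4 on the GLUE average to +4.5 F1 on TriviaQA.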

## Citation

@InProceedings{yasunaga2022linkbert,
author =  {Michihiro Yasunaga and Jure Leskovec and Percy Liang},