BART (large-sized model)

Model description

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.

BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering).

The weights shared here are essentially those of facebook/bart-large, with noise added to the BOS token embedding to help with fine-tuning.

Intended uses & limitations

Several issues have been reported with fine-tuning BART for text generation. This repo implements the solution discussed in #15559: adding some noise to the pre-trained model's BOS embedding. This appears to solve the problem of a fine-tuned BART model generating BOS tokens endlessly.
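The BOS-embedding perturbation described above can be sketched roughly as follows. This is only an illustration, not the procedure used to produce these weights: the helper name `perturb_bos_embedding`, the choice of Gaussian noise, the `noise_std` value of 0.02, and the fixed seed are all assumptions, since the model card does not state how the noise was drawn. The demo runs on a small toy embedding so that nothing has to be downloaded.

```python
import torch
from torch import nn


def perturb_bos_embedding(
    embedding: nn.Embedding,
    bos_token_id: int,
    noise_std: float = 0.02,  # assumed scale; not specified in the model card
    seed: int = 0,            # assumed seed, only for reproducibility
) -> None:
    """Add Gaussian noise, in place, to a single row of an embedding matrix."""
    generator = torch.Generator().manual_seed(seed)
    with torch.no_grad():
        row = embedding.weight[bos_token_id]
        # Sample on CPU, then match the device and dtype of the embedding row.
        noise = torch.randn(row.shape, generator=generator) * noise_std
        embedding.weight[bos_token_id] += noise.to(device=row.device, dtype=row.dtype)


# Toy stand-in for the model's token embedding matrix (10 tokens, 4 dimensions).
toy = nn.Embedding(10, 4)
before = toy.weight.detach().clone()

perturb_bos_embedding(toy, bos_token_id=0)

# Indices of the rows whose values changed; only the BOS row should be listed.
changed_rows = (toy.weight.detach() != before).any(dim=1).nonzero().flatten().tolist()
print(changed_rows)
```

For a loaded BART model, the same helper could presumably be applied as `perturb_bos_embedding(model.get_input_embeddings(), model.config.bos_token_id)`, since BART shares one embedding matrix between the encoder and decoder.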

You can use the raw model for text infilling. However, the model is mostly meant to be fine-tuned on a supervised dataset. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

Here is how to use this model in PyTorch:

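from transformers import BartTokenizer, BartModel

# Load the tokenizer and the perturbed BART weights from the model hub
tokenizer = BartTokenizer.from_pretrained('vedu/bart-large-perturbed')
model = BartModel.from_pretrained('vedu/bart-large-perturbed')

# Tokenize a sentence into PyTorch tensors and run a forward pass
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

# Decoder hidden states, shape (batch_size, sequence_length, hidden_size)
last_hidden_states = outputs.last_hidden_state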

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-1910-13461,
  author    = {Mike Lewis and
               Yinhan Liu and
               Naman Goyal and
               Marjan Ghazvininejad and
               Abdelrahman Mohamed and
               Omer Levy and
               Veselin Stoyanov and
               Luke Zettlemoyer},
  title     = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language
               Generation, Translation, and Comprehension},
  journal   = {CoRR},
  volume    = {abs/1910.13461},
  year      = {2019},
  url       = {http://arxiv.org/abs/1910.13461},
  eprinttype = {arXiv},
  eprint    = {1910.13461},
  timestamp = {Thu, 31 Oct 2019 14:02:26 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1910-13461.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}