BART¶
DISCLAIMER: If you see something strange, file a GitHub issue and assign @patrickvonplaten.
Overview¶
The Bart model was proposed in BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
According to the abstract,
- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
- The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme, where spans of text are replaced with a single mask token.
- BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE.
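As a purely illustrative sketch of the noising scheme described in the abstract (not the authors' preprocessing code; the example strings are invented), a document might be corrupted roughly like this:

# Hypothetical illustration of BART's pretraining noise, not the official preprocessing code.
original = "The cat sat on the mat. It did not move. The dog barked."
# Sentence permutation: the original sentence order is randomly shuffled.
shuffled = "The dog barked. The cat sat on the mat. It did not move."
# Text infilling: a sampled span ("sat on the") is replaced by a single <mask> token,
# so the model must also learn how many tokens are missing.
noised = "The dog barked. The cat <mask> mat. It did not move."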
The authors' code can be found here.
Examples¶
- Examples and scripts for fine-tuning BART and other models for sequence-to-sequence tasks can be found in examples/seq2seq/.
- An example of how to train BartForConditionalGeneration with a Hugging Face datasets object can be found in this forum discussion; a minimal sketch follows this list.
- Distilled checkpoints are described in this paper.
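The following is only a minimal sketch of that training recipe, not the maintained script: the document/summary texts, maximum lengths, and single training step are placeholder assumptions, so refer to the linked discussion and examples/seq2seq/ for the full version.

from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Placeholder source/target pairs; with a Hugging Face datasets object these would be
# read from dataset columns (the texts and column layout here are invented).
documents = ["PG&E scheduled power blackouts in response to forecasts for high winds."]
summaries = ["PG&E scheduled blackouts."]

inputs = tok(documents, max_length=1024, truncation=True, padding=True, return_tensors="pt")
targets = tok(summaries, max_length=128, truncation=True, padding=True, return_tensors="pt")

# When labels are passed, the model computes the LM loss itself; decoder inputs are
# created internally when not passed explicitly (see Implementation Notes below).
outputs = model(input_ids=inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                labels=targets["input_ids"])
loss = outputs[0]  # the loss comes first, whether a tuple or a model output object is returned
loss.backward()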
Implementation Notes¶
- Bart doesn’t use token_type_ids for sequence classification. Use BartTokenizer or encode() to get the proper splitting.
- The forward pass of BartModel will create the decoder_input_ids if they are not passed. This is different from some other modeling APIs; a typical use case of this feature is mask filling (see the sketch after this list).
- Model predictions are intended to be identical to the original implementation when force_bos_token_to_be_generated=True. This only works, however, if the string you pass to fairseq.encode() starts with a space.
- generate() should be used for conditional generation tasks like summarization; see the example in that method's docstring.
- Models that load the facebook/bart-large-cnn weights will not have a mask_token_id and so cannot perform mask-filling tasks.
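As a rough sketch of the BartModel note above (assumed behavior based on that note, not a walkthrough of the library internals), a forward pass with only input_ids still produces decoder hidden states:

import torch
from transformers import BartModel, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartModel.from_pretrained("facebook/bart-large")
inputs = tok("UN Chief Says There Is No <mask> in Syria", return_tensors="pt")

# No decoder_input_ids are passed; the forward pass derives them from input_ids,
# which is what makes the single-call mask-filling usage below possible.
with torch.no_grad():
    outputs = model(input_ids=inputs["input_ids"])

decoder_hidden = outputs[0]  # last decoder hidden states, shape (1, sequence_length, 1024)
print(decoder_hidden.shape)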
Mask Filling¶
The facebook/bart-base and facebook/bart-large checkpoints can be used to fill multi-token masks.
from transformers import BartForConditionalGeneration, BartTokenizer

# force_bos_token_to_be_generated=True keeps generation consistent with the original fairseq implementation
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", force_bos_token_to_be_generated=True)
tok = BartTokenizer.from_pretrained("facebook/bart-large")
example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
batch = tok(example_english_phrase, return_tensors='pt')
generated_ids = model.generate(batch['input_ids'])
# The single <mask> token is filled in with a multi-token span
assert tok.batch_decode(generated_ids, skip_special_tokens=True) == ['UN Chief Says There Is No Plan to Stop Chemical Weapons in Syria']