---
language: en
license: apache-2.0
tags:
- summarization
datasets:
- big_patent
model-index:
- name: google/bigbird-pegasus-large-bigpatent
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: xsum
      type: xsum
      config: default
      split: test
    metrics:
    - type: rouge
      value: 8.6864
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNWJiZmRhNjJiZjBjMjE0OTY5YTViNDE2MzU3OTQxODViOGY2ODIwNzMxNTNiMzVmMmEyN2FmOWEyNTRlOTVmZCIsInZlcnNpb24iOjF9.vsq1y4S_o8i6B2i61cmBVGKqDtqdRDpl-tdC_cqTnTgUIDov-DEO4cu1mrPI-KHGoTkRpO9wwDBMenVVp44LDg
    - type: rouge
      value: 0.7795
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjFkMmQ5ZWVlM2FjZGZhOTY4OWE4ZmJkNDU3ZWNhZDQyMjFkZWY2ZGVlMGQ2M2I3MzFhYjY3NjNhNzgzYmE5MyIsInZlcnNpb24iOjF9.WpU_IQbf1dxiIhh3ZYfQg89rigJEkXoLIFwb236JdpCQokwHFEGfh56RHLqb8OijMJlHRsh0zfOsxNB4jINDAA
    - type: rouge
      value: 7.1464
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNWYzYjIyMWU3NDMzZTBmZDdiOGFmZjBiOGI0YjUyNzY5NDIwYjIzMTFiY2RjNTA0MDkxOWI5NDg3OTk5MDMwZCIsInZlcnNpb24iOjF9.IwPFBf-0bQnlHxynOUsrDLB6BeBg1BbnC1ey5PaqaODls_ibsN0SopyDH4gQ7cyZu2srqleQbTiZGla1EjmkAw
    - type: rouge
      value: 7.0344
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjVjNTE5YzgxNmU2N2VkY2MzZmNlNzYxNDQxNjZmNTdjNDc3NzRmY2FjYzc5MjA1MTc0MTAzNTA1YTEwNjAwZSIsInZlcnNpb24iOjF9.IhPL0Ywph8EhcUyqBlr26KBbkaO7a8MHXtigWCBijglCc6jkPKxg_JpDQccHC-F3r2oUgs7CnPAJtehP05dBBw
    - type: loss
      value: 6.242544651031494
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNWY4Y2ZlNTY5YmQ5N2NjMmM5ZmM2ZDFiMTY5YzViM2M2NjVhNjJmNWY0NmY1YWE1NmZiOWQ5OGExM2MwNTYxNCIsInZlcnNpb24iOjF9.rwHs57Mo7aXvbkZEjroclYdyzPgJU9mDEMhnSzWAUMyWtmiejgsn0gZ4KGXapKQT0xVZ7dR5BgR8zxj7DZFUDA
    - type: gen_len
      value: 96.8579
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjNlZjEwZjJhMWI3MDVmOGIwNjY4YTMyN2FjY2IzOGRjMDg3OGRkYmE1ZWY2ZGMwODI2YjBiNDIwZmQ0YjEyMSIsInZlcnNpb24iOjF9.TeWWnDzmNSiaK7HJcWImlBytQ9iWykYyD0X7mnXum6DcTHwuCr5bYrvcxTGIRl1BXqy9_4Jwhozl5Hq-v2neAA
---

# BigBirdPegasus model (large)

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. Moreover, BigBird comes with a theoretical understanding of which capabilities of a full transformer the sparse model can retain.

BigBird was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).

Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

BigBird relies on **block sparse attention** instead of full attention (i.e. BERT's attention) and can handle sequences of up to 4096 tokens at a much lower compute cost than BERT. It has achieved state-of-the-art results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
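To give a rough feel for why block sparse attention is cheaper, the following is a minimal, simplified sketch of a block-level attention pattern in plain Python. It is not the library's implementation: the helper names `block_sparse_pattern` and `attended_fraction` are hypothetical, and the choices of a 3-block sliding window and of treating the first and last blocks as global are assumptions made for illustration. Only `block_size=64` and `num_random_blocks=3` are taken from the defaults mentioned in this card.

```python
import random


def block_sparse_pattern(num_blocks, num_random_blocks=3, window=3, seed=0):
    """Return, for each query block, the set of key blocks it attends to.

    Simplified illustration only (assumed layout, not the library's code):
    global blocks + a sliding window + a few random blocks.
    """
    rng = random.Random(seed)
    # Assumption: the first and last blocks act as global blocks.
    global_blocks = {0, num_blocks - 1}
    half = window // 2
    pattern = []
    for q in range(num_blocks):
        if q in global_blocks:
            # Global query blocks attend to every key block.
            pattern.append(set(range(num_blocks)))
            continue
        # Every block attends to the global blocks.
        keys = set(global_blocks)
        # Sliding window: the block itself and its immediate neighbours.
        keys.update(k for k in range(q - half, q + half + 1) if 0 <= k < num_blocks)
        # Random blocks drawn from the blocks not already covered.
        candidates = [k for k in range(num_blocks) if k not in keys]
        keys.update(rng.sample(candidates, min(num_random_blocks, len(candidates))))
        pattern.append(keys)
    return pattern


def attended_fraction(pattern):
    """Fraction of (query block, key block) pairs that are actually computed."""
    num_blocks = len(pattern)
    return sum(len(keys) for keys in pattern) / (num_blocks * num_blocks)


seq_len, block_size = 4096, 64
pattern = block_sparse_pattern(seq_len // block_size)
print(f"{attended_fraction(pattern):.3f} of block pairs are computed")
```

In this sketch each non-global query block attends to a fixed, small number of key blocks regardless of sequence length, so the cost grows roughly linearly with the number of blocks rather than quadratically as in full attention.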

## How to use

Here is how to use this model to summarize a given text in PyTorch:

```python
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-bigpatent")

# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent")

# decoder attention type can't be changed & will be "original_full"
# you can change `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors="pt")
# generate the summary token ids, then decode them back to text
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction, skip_special_tokens=True)
```

## Training Procedure

This checkpoint was obtained by fine-tuning `BigBirdPegasusForConditionalGeneration` for **summarization** on the [big_patent](https://huggingface.co/datasets/big_patent) dataset.

## BibTeX entry and citation info

```tex
@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```