---
language: en
license: apache-2.0
datasets:
- scientific_papers
tags:
- summarization
model-index:
- name: google/bigbird-pegasus-large-arxiv
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: scientific_papers
      type: scientific_papers
      config: pubmed
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 36.0276
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 13.4166
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 21.9612
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 29.648
      verified: true
    - name: loss
      type: loss
      value: 2.774355173110962
      verified: true
    - name: meteor
      type: meteor
      value: 0.2824
      verified: true
    - name: gen_len
      type: gen_len
      value: 209.2537
      verified: true
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: cnn_dailymail
      type: cnn_dailymail
      config: 3.0.0
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 9.0885
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 1.0325
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 7.3182
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 8.1455
      verified: true
    - name: loss
      type: loss
      value: .nan
      verified: true
    - name: gen_len
      type: gen_len
      value: 210.4762
      verified: true
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: xsum
      type: xsum
      config: default
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 4.9787
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 0.3527
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 4.3679
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 4.1723
      verified: true
    - name: loss
      type: loss
      value: .nan
      verified: true
    - name: gen_len
      type: gen_len
      value: 230.4886
      verified: true
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: scientific_papers
      type: scientific_papers
      config: arxiv
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 43.4702
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 17.4297
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 26.2587
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 35.5587
      verified: true
    - name: loss
      type: loss
      value: 2.1113228797912598
      verified: true
    - name: gen_len
      type: gen_len
      value: 183.3702
      verified: true
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: samsum
      type: samsum
      config: samsum
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 3.621
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 0.1699
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 3.2016
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 3.3269
      verified: true
    - name: loss
      type: loss
      value: 7.664482116699219
      verified: true
    - name: gen_len
      type: gen_len
      value: 233.8107
      verified: true
---

# BigBirdPegasus model (large)

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. Moreover, BigBird comes with a theoretical understanding of which capabilities of a full transformer the sparse model retains.

BigBird was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).

Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

BigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT. It has achieved SOTA on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
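
To give a feel for why block sparse attention is cheaper, the sketch below builds a block-level attention pattern from the three ingredients described in the BigBird paper: global blocks, a sliding window, and random blocks. It is an illustration only, not the Transformers implementation: the helper name `block_sparse_mask`, the choice of the first and last block as global, and the way random blocks are sampled are all assumptions made for this example.

```python
import numpy as np


def block_sparse_mask(seq_len, block_size=64, num_random_blocks=3, seed=0):
    """Hypothetical helper: block-level attention pattern.

    Returns a (num_blocks, num_blocks) boolean matrix where entry [i, j]
    is True when query block i attends to key block j.
    """
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be a multiple of block_size")
    n = seq_len // block_size
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)

    # Global blocks (assumed here: first and last) attend everywhere
    # and are attended to by every block.
    mask[0, :] = True
    mask[-1, :] = True
    mask[:, 0] = True
    mask[:, -1] = True

    for i in range(n):
        # Sliding window: previous block, the block itself, next block.
        mask[i, max(0, i - 1):min(n, i + 2)] = True
        # Random blocks, drawn from the blocks not already attended to.
        candidates = np.flatnonzero(~mask[i])
        k = min(num_random_blocks, candidates.size)
        if k > 0:
            mask[i, rng.choice(candidates, size=k, replace=False)] = True
    return mask


mask = block_sparse_mask(4096)
attended = int(mask.sum())
print(f"{attended} of {mask.size} block pairs attended ({attended / mask.size:.1%})")
```

In this sketch each non-global block attends to a fixed number of key blocks regardless of sequence length, so the cost grows roughly linearly with the number of blocks rather than quadratically.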

## How to use

Here is how to use this model to generate a summary of a given text in PyTorch:

```python
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")

# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

# decoder attention type can't be changed & will be "original_full"
# you can change `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')
# generate the summary token ids, then decode them back to text
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)
```
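
Note that block sparse attention only applies to sufficiently long inputs. The sketch below estimates the threshold under the assumption that the encoder falls back to `original_full` when the sequence is no longer than `(5 + 2 * num_random_blocks) * block_size` tokens, which is our reading of the warning emitted by the Transformers BigBird code; please verify it against your installed version. The helper names are hypothetical and not part of the library.

```python
def min_block_sparse_length(block_size=64, num_random_blocks=3):
    """Hypothetical helper: length above which block sparse attention is assumed usable.

    Assumed breakdown: 2 global blocks + 3 sliding-window blocks
    + num_random_blocks random blocks + num_random_blocks buffer blocks.
    """
    return (5 + 2 * num_random_blocks) * block_size


def effective_attention_type(seq_len, block_size=64, num_random_blocks=3):
    """Hypothetical helper: attention type we expect the encoder to use."""
    if seq_len <= min_block_sparse_length(block_size, num_random_blocks):
        return "original_full"
    return "block_sparse"


# A short prompt is expected to run with full attention,
# while a long paper-sized input uses block sparse attention.
print(effective_attention_type(10))
print(effective_attention_type(4096))
```

Under this assumption, a short input such as the placeholder text above is processed with full attention, so the benefit of `block_sparse` only shows up on long documents.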

## Training Procedure

This checkpoint was obtained by fine-tuning `BigBirdPegasusForConditionalGeneration` for **summarization** on the **arxiv** subset of the [scientific_papers](https://huggingface.co/datasets/scientific_papers) dataset.

## BibTeX entry and citation info

```tex
@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```