vasudevgupta committed on
Commit
935b259
1 Parent(s): 284e312

Update README.md

  1. README.md +1 -63
README.md CHANGED
@@ -1,63 +1 @@
- ---
- language: en
- license: apache-2.0
- datasets:
- - bookcorpus
- - wikipedia
- - cc_news
- ---
- # BigBird base model
-
- BigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle.
-
- It is a pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).
-
- Disclaimer: The team releasing BigBird did not write a model card for this model so this model card has been written by the Hugging Face team.
-
- ## Model description
-
- BigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT. It has achieved SOTA on various tasks involving very long sequences such as long documents summarization, question-answering with long contexts.
-
- ## How to use
-
- Here is how to use this model to get the features of a given text in PyTorch:
-
- ```python
- from transformers import BigBirdModel
-
- # by default its in `block_sparse` mode with num_random_blocks=3, block_size=64
- model = BigBirdModel.from_pretrained("google/bigbird-roberta-base")
-
- # you can change `attention_type` to full attention like this:
- model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", attention_type="original_full")
-
- # you can change `block_size` & `num_random_blocks` like this:
- model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", block_size=16, num_random_blocks=2)
-
- text = "Replace me by any text you'd like."
- encoded_input = tokenizer(text, return_tensors='pt')
- output = model(**encoded_input)
- ```
-
- ## Training Data
-
- This model is pre-trained on four publicly available datasets: **Books**, **CC-News**, **Stories** and **Wikipedia**. It used same sentencepiece vocabulary as RoBERTa (which is in turn borrowed from GPT2).
-
- ## Training Procedure
-
- Document longer than 4096 were split into multiple documents and documents that were much smaller than 4096 were joined. Following the original BERT training, 15% of tokens were masked and model is trained to predict the mask.
-
- Model is warm started from RoBERTa’s checkpoint.
-
- ## BibTeX entry and citation info
-
- ```tex
- @misc{zaheer2021big,
- title={Big Bird: Transformers for Longer Sequences},
- author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
- year={2021},
- eprint={2007.14062},
- archivePrefix={arXiv},
- primaryClass={cs.LG}
- }
- ```
+ Moved here: https://huggingface.co/google/bigbird-roberta-base