monologg committed on
Commit
3767791
1 Parent(s): 68b4d7c

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -10,7 +10,7 @@ Pretrained BigBird Model for Korean (**kobigbird-bert-base**)
 
  BigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences.
 
- BigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT. It has achieved SOTA on various tasks involving very long sequences such as long documents summarization, question-answering with long contexts.
+ BigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT.
 
  Model is warm started from Korean BERT’s checkpoint.
 