---
language: ko
tags:
  - korean
mask_token: "[MASK]"
widget:
  - text: 대한민국의 수도는 [MASK] 입니다.
---

# KoBigBird

<img src="https://user-images.githubusercontent.com/28896432/140442206-e34b02d5-e279-47e5-9c2a-db1278b1c14d.png" width="200"/>

Pretrained BigBird Model for Korean (**kobigbird-bert-base**)

## About

BigBird is a sparse-attention-based transformer that extends Transformer-based models such as BERT to much longer sequences.

BigBird relies on **block sparse attention** instead of full attention (i.e. BERT's attention) and can handle sequences of up to 4,096 tokens at a much lower compute cost than BERT.

The model is warm-started from the Korean BERT checkpoint.

## How to use

*NOTE:* Use `BertTokenizer` instead of `BigBirdTokenizer` (`AutoTokenizer` loads `BertTokenizer` automatically).

```python
from transformers import AutoModel, AutoTokenizer

# by default it's in `block_sparse` mode with num_random_blocks=3, block_size=64
model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")

# you can change `attention_type` to full attention like this:
model = AutoModel.from_pretrained("monologg/kobigbird-bert-base", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = AutoModel.from_pretrained("monologg/kobigbird-bert-base", block_size=16, num_random_blocks=2)

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
text = "한국어 BigBird 모델을 공개합니다!"  # "Releasing the Korean BigBird model!"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
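
The About section notes that BigBird can process inputs of up to 4,096 tokens. Below is a minimal sketch of encoding a long document; the repeated sentence stands in for real long-form text, and the hidden size of 768 in the final comment is an assumption for a base-sized model, not something stated above.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")
tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")

# Stand-in for a long document (replace with your own text)
long_text = " ".join(["한국어 BigBird 모델을 공개합니다!"] * 300)

# Truncate/pad the input up to the 4,096-token limit
encoded_input = tokenizer(
    long_text,
    max_length=4096,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    output = model(**encoded_input)

# Assumed shape: (batch_size, sequence_length, 768) for a base-sized model
print(output.last_hidden_state.shape)
```

In the default `block_sparse` mode the long input is processed block by block, which is what keeps the compute cost manageable at this sequence length; switching to `attention_type="original_full"` on such inputs would be far more expensive.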