---

## What is DeBERTa?

- [DeBERTa](https://arxiv.org/abs/2006.03654) applies `Disentangled Attention` and an `Enhanced Mask Decoder` to learn word position information effectively. Unlike the absolute position embeddings used in BERT and RoBERTa, DeBERTa represents the relative position of each word with learnable vectors during training (a sketch of the attention decomposition follows this list). As a result, it showed better performance than BERT and RoBERTa.
- [DeBERTa-v3](https://arxiv.org/abs/2111.09543) replaces the MLM (Masked Language Modeling) objective of the previous version with an ELECTRA-style RTD (Replaced Token Detection) task and applies Gradient-Disentangled Embedding Sharing, improving training efficiency.
- To train the DeBERTa architecture on a large amount of Korean data, `mDeBERTa-v3-base-kor-further` takes `mDeBERTa-v3-base` released by Microsoft and **further pre-trains** it on about 40GB of Korean data.
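
The DeBERTa paper factors the attention score between tokens `i` and `j` into content and relative-position terms and drops the position-to-position term. A rough LaTeX sketch of that decomposition, with `H` the content vectors and `P` the relative-position embeddings:

```latex
% Disentangled attention score, sketched from the DeBERTa paper:
% content-to-content + content-to-position + position-to-content
A_{i,j} = H_i H_j^{\top} + H_i P_{j|i}^{\top} + P_{i|j} H_j^{\top}
```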

## How to Use

- Requirements

```
pip install transformers
pip install sentencepiece
```

- Huggingface Hub

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")  # DebertaV2Model
tokenizer = AutoTokenizer.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")  # DebertaV2Tokenizer (SentencePiece)
```
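
The snippet below is a minimal usage sketch rather than part of the original card: it encodes an arbitrary Korean sentence and inspects the encoder output. Only the checkpoint name is taken from this card; the example sentence and variable names are illustrative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")
tokenizer = AutoTokenizer.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")

# Encode an arbitrary Korean sentence (illustrative only).
inputs = tokenizer("한국어 데이터로 추가 사전학습한 mDeBERTa 모델입니다.", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size);
# the hidden size is 768 for this base-sized model.
print(outputs.last_hidden_state.shape)
```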

## Pre-trained Models

- The model architecture is identical to `mdeberta-v3-base` released by Microsoft.

| | Vocabulary(K) | Backbone Parameters(M) | Hidden Size | Layers | Note |
| --- | --- | --- | --- | --- | --- |
| mdeberta-v3-base-kor-further (same as mdeberta-v3-base) | 250 | 86 | 768 | 12 | 250K new SPM vocab |
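
As a quick sanity check of these numbers, the checkpoint's config can be inspected with `AutoConfig`. This snippet is illustrative rather than part of the original card; the expected values in the comment come from the table above.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")

# Expected for this base-sized model: hidden_size=768, num_hidden_layers=12,
# and a vocabulary of roughly 250K SentencePiece tokens.
print(config.hidden_size, config.num_hidden_layers, config.vocab_size)
```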

## Further Pretraining Details (MLM Task)

- `KPMG-mDeBERTa-v3-base-kor-further` was further pre-trained from `microsoft/mDeBERTa-v3-base` on about 40GB of Korean data with the MLM task.

| | Max length | Learning Rate | Batch Size | Train Steps | Warm-up Steps |
| --- | --- | --- | --- | --- | --- |
| mdeberta-v3-base-kor-further | 512 | 2e-5 | 8 | 5M | 50k |
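
For reference, the block below is a rough sketch of how such MLM further pre-training could be set up with the `transformers` Trainer; it is not the actual training script. The corpus file name is a placeholder, the masking probability of 0.15 is the library default, and the remaining hyperparameters mirror the table above.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
model = AutoModelForMaskedLM.from_pretrained("microsoft/mdeberta-v3-base")

# "korean_corpus.txt" is a placeholder for the ~40GB Korean corpus described in the Datasets section below.
raw = load_dataset("text", data_files={"train": "korean_corpus.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),  # max length 512
    batched=True,
    remove_columns=["text"],
)

# Dynamic token masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mdeberta-v3-base-kor-further",
    per_device_train_batch_size=8,  # batch size 8
    learning_rate=2e-5,             # learning rate 2e-5
    max_steps=5_000_000,            # 5M train steps
    warmup_steps=50_000,            # 50k warm-up steps
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```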

## Datasets

- About 40GB of Korean data was used for the further pre-training, including the Modu Corpus (newspaper, spoken, and written text), Korean Wikipedia, and National Petitions.
- Train: 10M lines, 5B tokens
- Valid: 2M lines, 1B tokens

## Fine-tuning on NLU Tasks - Base Model

| Model | Size | NSMC (acc) | Naver NER (F1) | PAWS (acc) | KorNLI (acc) | KorSTS (spearman) | Question Pair (acc) | KorQuAD (Dev) (EM/F1) | Korean-Hate-Speech (Dev) (F1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| XLM-Roberta-Base | 1.03G | 89.03 | 86.65 | 82.80 | 80.23 | 78.45 | 93.80 | 64.70 / 88.94 | 64.06 |
| mdeberta-base | 534M | 90.01 | 87.43 | 85.55 | 80.41 | **82.65** | 94.06 | 65.48 / 89.74 | 62.91 |
| mdeberta-base-kor-further (Ours) | 534M | **90.52** | **87.87** | **85.85** | **80.65** | 81.90 | **94.98** | **66.07 / 90.35** | **68.16** |
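
The block below is a hedged illustration of how one of these fine-tuning runs (sentiment classification on NSMC) might be set up; it is not the exact script behind the table. The `nsmc` dataset id, its `document`/`label` columns, and the training hyperparameters are assumptions for the sketch.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "lighthouse/mdeberta-v3-base-kor-further"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# NSMC: binary sentiment classification of movie reviews (dataset id and columns assumed).
nsmc = load_dataset("nsmc")
encoded = nsmc.map(
    lambda batch: tokenizer(batch["document"], truncation=True, max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="mdeberta-kor-further-nsmc",
    per_device_train_batch_size=32,
    learning_rate=5e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
    tokenizer=tokenizer,
)
trainer.train()
```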

## Citation

```
@misc{he2021debertav3,
  title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing},
  author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
  year={2021},
  eprint={2111.09543},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@inproceedings{he2021deberta,
  title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention},
  author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=XPZIaotutsD}
}
```

## Reference

- [DeBERTa](https://github.com/microsoft/DeBERTa)
- [Huggingface Transformers](https://github.com/huggingface/transformers)
- [Modu Corpus (모두의 말뭉치)](https://corpus.korean.go.kr/)