---
license: mit
---
# mDeBERTa-v3-base-kor-further

---

## What is DeBERTa?

- [DeBERTa](https://arxiv.org/abs/2006.03654) applies `Disentangled Attention` and an `Enhanced Mask Decoder` to learn positional information effectively. With this idea, and unlike the absolute position embeddings used in BERT and RoBERTa, DeBERTa represents each token's relative position as learnable vectors during training. As a result, it shows stronger performance than BERT and RoBERTa; a sketch of the attention score is given after this list.
- [DeBERTa-v3](https://arxiv.org/abs/2111.09543) improves training efficiency by replacing the MLM (Masked Language Model) objective of the previous version with an ELECTRA-style pre-training method based on the RTD (Replaced Token Detection) task, and by applying Gradient-Disentangled Embedding Sharing.
- To train the DeBERTa architecture on rich Korean data, `mDeBERTa-v3-base-kor-further` is a language model created by **further pre-training** Microsoft's `mDeBERTa-v3-base` on roughly 40GB of Korean data.
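As a rough sketch of the `Disentangled Attention` idea (notation from the DeBERTa paper: `H_i` is the content vector of token `i`, `P_{i|j}` its relative-position vector with respect to `j`; the position-to-position term is dropped, as in the paper), the attention score decomposes as:

```latex
% Disentangled attention score between tokens i and j (sketch, not the exact implementation):
A_{i,j} = \underbrace{H_i H_j^{\top}}_{\text{content-to-content}}
        + \underbrace{H_i P_{j|i}^{\top}}_{\text{content-to-position}}
        + \underbrace{P_{i|j} H_j^{\top}}_{\text{position-to-content}}
```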

## How to Use

- Requirements

```
pip install transformers
pip install sentencepiece
```

- Huggingface Hub

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")          # DebertaV2Model
tokenizer = AutoTokenizer.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")  # DebertaV2Tokenizer (SentencePiece)
```
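A minimal usage sketch follows; the Korean example sentence is illustrative only:

```python
# Minimal usage sketch: tokenize a sentence and inspect the encoder output.
import torch

inputs = tokenizer("안녕하세요. 한국어 mDeBERTa 모델입니다.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```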

## Pre-trained Models

- The model architecture is identical to Microsoft's `mdeberta-v3-base`.

| | Vocabulary (K) | Backbone Parameters (M) | Hidden Size | Layers | Note |
| --- | --- | --- | --- | --- | --- |
| mdeberta-v3-base-kor-further (same as mdeberta-v3-base) | 250 | 86 | 768 | 12 | 250K new SPM vocab |
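One way to double-check that the downloaded checkpoint matches this table is to inspect its config (a small sketch, assuming `transformers` is already installed):

```python
# Small sketch: compare the checkpoint config against the table above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("lighthouse/mdeberta-v3-base-kor-further")
print(config.hidden_size)        # expected: 768
print(config.num_hidden_layers)  # expected: 12
print(config.vocab_size)         # expected: ~250K SentencePiece vocabulary
```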

## Further Pretraining Details (MLM Task)

- `mDeBERTa-v3-base-kor-further` was further pre-trained from `microsoft/mDeBERTa-v3-base` on roughly 40GB of Korean data using the MLM task.

| | Max Length | Learning Rate | Batch Size | Train Steps | Warm-up Steps |
| --- | --- | --- | --- | --- | --- |
| mdeberta-v3-base-kor-further | 512 | 2e-5 | 8 | 5M | 50k |
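For illustration, a further pre-training run with these hyperparameters could be set up as below. This is a hedged sketch, not the actual training script of this repository; the corpus file `korean_corpus.txt` is a placeholder, and only the hyperparameters come from the table above.

```python
# Hedged sketch of MLM further pre-training with the Hugging Face Trainer.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
model = AutoModelForMaskedLM.from_pretrained("microsoft/mdeberta-v3-base")

# Placeholder corpus: one document per line of Korean text.
raw = load_dataset("text", data_files={"train": "korean_corpus.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Standard MLM collator: masks 15% of tokens dynamically at batch time.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mdeberta-v3-base-kor-further",
    per_device_train_batch_size=8,   # batch size from the table above
    learning_rate=2e-5,              # learning rate from the table above
    max_steps=5_000_000,             # 5M train steps
    warmup_steps=50_000,             # 50k warm-up steps
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```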

## Datasets

- About 40GB of Korean data, including the Modu Corpus (newspaper, spoken, and written text), Korean Wikipedia, and National Petitions, was used for the further pre-training.
- Train: 10M lines, 5B tokens
- Valid: 2M lines, 1B tokens
- cf. The original mDeBERTa-v3 was trained, like XLM-R, on the [CC-100 dataset](https://data.statmt.org/cc-100/), of which the Korean portion is 54GB.

## Fine-tuning on NLU Tasks - Base Model

| Model | Size | NSMC (acc) | Naver NER (F1) | PAWS (acc) | KorNLI (acc) | KorSTS (Spearman) | Question Pair (acc) | KorQuAD (Dev) (EM/F1) | Korean-Hate-Speech (Dev) (F1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| XLM-Roberta-Base | 1.03G | 89.03 | 86.65 | 82.80 | 80.23 | 78.45 | 93.80 | 64.70 / 88.94 | 64.06 |
| mdeberta-base | 534M | 90.01 | 87.43 | 85.55 | 80.41 | 82.65 | 94.06 | 65.48 / 89.74 | 62.91 |
| mdeberta-base-kor-further | 534M | 90.52 | 87.87 | 85.85 | 80.65 | 81.90 | 94.98 | 66.07 / 90.35 | 68.16 |
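As a hedged illustration of how this kind of fine-tuning can be reproduced with `transformers` (the Hub dataset id `e9t/nsmc` and the hyperparameters below are assumptions, not the exact setup behind the table):

```python
# Hedged sketch: fine-tuning on NSMC (binary movie-review sentiment).
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "lighthouse/mdeberta-v3-base-kor-further"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# NSMC columns: "document" (review text), "label" (0/1). Dataset id is an assumption.
nsmc = load_dataset("e9t/nsmc")
encoded = nsmc.map(
    lambda batch: tokenizer(batch["document"], truncation=True, max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="nsmc-mdeberta-kor-further",  # illustrative output path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
    data_collator=DataCollatorWithPadding(tokenizer),
).train()
```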

### Citation

```
@misc{he2021debertav3,
    title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing},
    author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
    year={2021},
    eprint={2111.09543},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

```
@inproceedings{he2021deberta,
    title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
    author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=XPZIaotutsD}
}
```

## Reference

- [DeBERTa](https://github.com/microsoft/DeBERTa)
- [Huggingface Transformers](https://github.com/huggingface/transformers)
- [Modu Corpus (모두의 말뭉치)](https://corpus.korean.go.kr/)
- [Korpora: Korean Corpora Archives](https://github.com/ko-nlp/Korpora)
- [sooftware/Korean PLM](https://github.com/sooftware/Korean-PLM)