---
license: mit
language:
- ko
pipeline_tag: fill-mask
---
# KF-DeBERTa
KakaoBank and FnGuide are releasing a language model specialized for the financial domain.

## Model description
* KF-DeBERTa is a language model trained jointly on general-domain and financial-domain corpora.
* The model architecture is based on [DeBERTa-v2](https://github.com/microsoft/DeBERTa#whats-new-in-v2).
  * DeBERTa-v3, which uses ELECTRA's RTD (replaced token detection) as its training objective, scored considerably lower on some tasks (KLUE-RE, WoS, Retrieval), so DeBERTa-v2 was chosen as the final architecture.
* The model performs well on both general-domain and financial-domain downstream tasks.
  * To validate financial-domain downstream performance thoroughly, we evaluated on a variety of datasets.
  * It outperformed existing language models in both the general and financial domains; on the KLUE Benchmark in particular, it outperformed RoBERTa-Large.

## Usage
```python3
from transformers import AutoModel, AutoTokenizer

# Load the pretrained encoder and its tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("kakaobank/kf-deberta-base")
tokenizer = AutoTokenizer.from_pretrained("kakaobank/kf-deberta-base")

# English: "KakaoBank and FnGuide release a finance-specialized language model."
text = "카카오뱅크와 에프엔가이드가 금융특화 언어모델을 공개합니다."
tokens = tokenizer.tokenize(text)
print(tokens)

# Encode the sentence as PyTorch tensors and run a forward pass.
inputs = tokenizer(text, return_tensors="pt")
model_output = model(**inputs)
print(model_output)
```
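
Because the model card metadata lists `pipeline_tag: fill-mask`, the checkpoint can presumably also be used for masked-token prediction. The following is a minimal sketch under that assumption: it assumes the checkpoint ships masked-LM head weights, and the helper names `mask_first` and `predict_masked` are illustrative, not part of the released code.

```python
def mask_first(text: str, target: str, mask_token: str) -> str:
    """Replace the first occurrence of `target` in `text` with the mask token."""
    if target not in text:
        raise ValueError(f"{target!r} does not occur in the text")
    return text.replace(target, mask_token, 1)


def predict_masked(text: str, target: str, top_k: int = 5):
    """Return (token, score) candidates for `target` masked out of `text`."""
    # Imported lazily so mask_first stays usable without transformers installed.
    from transformers import pipeline

    # Assumption: the checkpoint includes masked-LM head weights, as the
    # `pipeline_tag: fill-mask` metadata suggests.
    fill = pipeline("fill-mask", model="kakaobank/kf-deberta-base")
    masked = mask_first(text, target, fill.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in fill(masked, top_k=top_k)]
```

For example, `predict_masked("카카오뱅크와 에프엔가이드가 금융특화 언어모델을 공개합니다.", "금융특화")` would mask one word of the sentence used above and return the model's top candidates for that position.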

## Benchmark
* All tasks used only the basic hyperparameter search below.
  * batch size: {16, 32}
  * learning_rate: {1e-5, 3e-5, 5e-5}
  * weight_decay: {0, 0.01}
  * warmup_proportion: {0, 0.1}
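
The search space above is small enough to enumerate exhaustively. Here is a minimal sketch, assuming a full grid over the four listed value sets (the model card does not state whether every combination was actually run); the `SEARCH_SPACE` name and dictionary layout are illustrative.

```python
from itertools import product

# Values copied from the list above; the variable name is illustrative.
SEARCH_SPACE = {
    "batch_size": [16, 32],
    "learning_rate": [1e-5, 3e-5, 5e-5],
    "weight_decay": [0.0, 0.01],
    "warmup_proportion": [0.0, 0.1],
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))  # 2 * 3 * 2 * 2 = 24 combinations
print(configs[0])
```

Each yielded dict can be passed to whatever fine-tuning routine is in use, keeping the configuration with the best validation score.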

**KLUE Benchmark**

|        Model         |       YNAT       |        KLUE-ST         |   KLUE-NLI   |             KLUE-NER              |            KLUE-RE            |        KLUE-DP         |         KLUE-MRC          |          WoS           |       AVG        |
|:--------------------:|:----------------:|:----------------------:|:------------:|:---------------------------------:|:-----------------------------:|:----------------------:|:-------------------------:|:----------------------:|:----------------:|
|                      |        F1        |      Pearsonr/F1       |     ACC      |         F1-Entity/F1-Char         |         F1-micro/AUC          |        UAS/LAS         |         EM/ROUGE          |        JGA/F1-S        |                  | 
|     mBERT (Base)     |      82.64       |      82.97/75.93       |    72.90     |            75.56/88.81            |          58.39/56.41          |      88.53/86.04       |        49.96/55.57        |      35.27/88.60       |      71.26       |
|     XLM-R (Base)     |      84.52       |      88.88/81.20       |    78.23     |            80.48/92.14            |          57.62/57.05          |      93.12/87.23       |        26.76/53.36        |      41.54/89.81       |      72.28       |
|    XLM-R (Large)     |      87.30       |      93.08/87.17       |    86.40     |            82.18/93.20            |          58.75/63.53          |      92.87/87.82       |        35.23/66.55        |      42.44/89.88       |      76.17       |
|    KR-BERT (Base)    |      85.36       |      87.50/77.92       |    77.10     |            74.97/90.46            |          62.83/65.42          |      92.87/87.13       |        48.95/58.38        |      45.60/90.82       |      74.67       |
|   KoELECTRA (Base)   |      85.99       |      93.14/85.89       |    86.87     |            86.06/92.75            |          62.67/57.46          |      90.93/87.07       |        59.54/65.64        |      39.83/88.91       |      77.34       |
|   KLUE-BERT (Base)   |      86.95       |      91.01/83.44       |    79.87     |            83.71/91.17            |          65.58/68.11          |      93.07/87.25       |        62.42/68.15        |      46.72/91.59       |      78.50       |
| KLUE-RoBERTa (Small) |      85.95       |      91.70/85.42       |    81.00     |            83.55/91.20            |          61.26/60.89          |      93.47/87.50       |        58.25/63.56        |      46.65/91.50       |      77.28       |
| KLUE-RoBERTa (Base)  |      86.19       |      92.91/86.78       |    86.30     |            83.81/91.09            |          66.73/68.11          |      93.75/87.77       |        69.56/74.64        |      47.41/91.60       |      80.48       |
| KLUE-RoBERTa (Large) |      85.88       |      93.20/86.13       |  **89.50**   |            84.54/91.45            |        **71.06**/73.33        |      93.84/87.93       |    **75.26**/**80.30**    |      49.39/92.19       |      82.43       |
|  KF-DeBERTa (Base)   | **<u>87.51</u>** | **<u>93.24/87.73</u>** | <u>88.37</u> | **<u>89.17</u>**/**<u>93.30</u>** | <u>69.70</u>/**<u>75.07</u>** | **<u>94.05/87.97</u>** | <u>72.59</u>/<u>78.08</u> | **<u>50.21/92.59</u>** | **<u>82.83</u>** |

* Bold marks the highest score among all models; underline marks the highest score among base models.

**Financial domain benchmark**

|        Model        | FN-Sentiment (v1) | FN-Sentiment (v2) | FN-Adnews |  FN-NER   |  KorFPB   | KorFiQA-SA | KorHeadline | Avg (excl. FiQA-SA) |
|:-------------------:|:-----------------:|:-----------------:|:---------:|:---------:|:---------:|:----------:|:-----------:|:-----------------:|
|                     |        ACC        |        ACC        |    ACC    | F1-micro  |    ACC    |    MSE     |   Mean F1   |                   |
| KLUE-RoBERTa (Base) |       98.26       |       91.21       |   96.34   |   90.31   |   90.97   |   0.0589   |    81.11    |       94.03       |
|  KoELECTRA (Base)   |       98.26       |       90.56       |   96.98   |   89.81   |   92.36   |   0.0652   |    80.69    |       93.90       |
|  KF-DeBERTa (Base)  |     **99.36**     |     **92.29**     | **97.63** | **91.80** | **93.47** | **0.0553** |  **82.12**  |     **95.27**     |

* **FN-Sentiment**: financial-domain sentiment analysis
* **FN-Adnews**: financial-domain advertorial news classification
* **FN-NER**: financial-domain named entity recognition
* **KorFPB**: Korean translation of FinancialPhraseBank
  * Cite: ```Malo, Pekka, et al. "Good debt or bad debt: Detecting semantic orientations in economic texts." Journal of the Association for Information Science and Technology 65.4 (2014): 782-796.```
* **KorFiQA-SA**: Korean translation of FiQA-SA
  * Cite: ```Maia, Macedo & Handschuh, Siegfried & Freitas, Andre & Davis, Brian & McDermott, Ross & Zarrouk, Manel & Balahur, Alexandra. (2018). WWW'18 Open Challenge: Financial Opinion Mining and Question Answering. WWW '18: Companion Proceedings of the The Web Conference 2018. 1941-1942. 10.1145/3184558.3192301.```
* **KorHeadline**: Korean translation of Gold Commodity News and Dimensions
  * Cite: ```Sinha, A., & Khandait, T. (2021, April). Impact of News on the Commodity Market: Dataset and Results. In Future of Information and Communication Conference (pp. 589-601). Springer, Cham.```


**General domain benchmark**

|        Model        |   NSMC    |   PAWS    |  KorNLI   |  KorSTS   |     KorQuAD     | Avg (excl. KorQuAD) |
|:-------------------:|:---------:|:---------:|:---------:|:---------:|:---------------:|:----------------:|
|                     |    ACC    |    ACC    |    ACC    | spearman  |      EM/F1      |                  |
| KLUE-RoBERTa (Base) |   90.47   |   84.79   |   81.65   |   84.40   |   86.34/94.40   |      85.33       |
|  KoELECTRA (Base)   |   90.63   |   84.45   |   82.24   |   85.53   |   84.83/93.45   |      85.71       |
|  KF-DeBERTa (Base)  | **91.36** | **86.14** | **84.54** | **85.99** | **86.60/95.07** |    **87.01**     |
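
The last column of the table above can be rechecked from the other columns. Here is a small sketch, assuming the average is the unweighted mean of the NSMC, PAWS, KorNLI, and KorSTS scores (the model card does not spell out the formula, but this reproduces the reported values).

```python
# Scores copied from the table above (NSMC, PAWS, KorNLI, KorSTS); KorQuAD is excluded.
scores = {
    "KLUE-RoBERTa (Base)": [90.47, 84.79, 81.65, 84.40],
    "KoELECTRA (Base)": [90.63, 84.45, 82.24, 85.53],
    "KF-DeBERTa (Base)": [91.36, 86.14, 84.54, 85.99],
}

# Unweighted mean per model, rounded to two decimals like the table.
averages = {model: round(sum(vals) / len(vals), 2) for model, vals in scores.items()}
for model, avg in averages.items():
    print(f"{model}: {avg:.2f}")
# KLUE-RoBERTa (Base): 85.33
# KoELECTRA (Base): 85.71
# KF-DeBERTa (Base): 87.01
```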



## License
The KF-DeBERTa source code and model are released under the MIT License.  
The full license text is available in the [MIT file](LICENSE).  
We accept no liability for any damages arising from use of the model.

## Citation
```
@inproceedings{jeon-etal-2023-kfdeberta,
  title         = {KF-DeBERTa: Financial Domain-specific Pre-trained Language Model},
  author        = {Eunkwang Jeon and Jungdae Kim and Minsang Song and Joohyun Ryu},
  booktitle     = {Proceedings of the 35th Annual Conference on Human and Cognitive Language Technology},
  month         = {oct},
  year          = {2023},
  publisher     = {Korean Institute of Information Scientists and Engineers},
  url           = {http://www.hclt.kr/symp/?lnb=conference},
  pages         = {143--148},
}
```