---
language:
- ko
license: apache-2.0
library_name: transformers
tags:
- text2text-generation
datasets:
- aihub
metrics:
- bleu
- rouge

model-index:
- name: ko-TextNumbarT
  results:
  - task:
      type: text2text-generation
      name: text2text-generation
    metrics:
      - type: bleu
        value: 0.958234790096092
        name: eval_bleu
        verified: false
      - type: rouge1
        value: 0.9735361877162854
        name: eval_rouge1
        verified: false
      - type: rouge2
        value: 0.9493975212378124
        name: eval_rouge2
        verified: false
      - type: rougeL
        value: 0.9734558938864928
        name: eval_rougeL
        verified: false
      - type: rougeLsum
        value: 0.9734350757552404
        name: eval_rougeLsum
        verified: false
---

# ko-TextNumbarT (TNT Model🧨): Try Korean Reading To Number (a model that converts Korean text to numbers)

## Table of Contents
- [ko-TextNumbarT (TNT Model🧨): Try Korean Reading To Number (a model that converts Korean text to numbers)](#ko-textnumbart-tnt-model-try-korean-reading-to-number-a-model-that-converts-korean-text-to-numbers)
  - [Table of Contents](#table-of-contents)
  - [Model Details](#model-details)
  - [Uses](#uses)
  - [Evaluation](#evaluation)
  - [How to Get Started With the Model](#how-to-get-started-with-the-model)


## Model Details
- **Model Description:**
I built this model because I could not find an existing model or algorithm for this task. <br />
It is a BartForConditionalGeneration model fine-tuned to convert numbers written out in Korean text into digits. <br />

- The dataset comes from [Korea aihub](https://aihub.or.kr/aihubdata/data/list.do?currMenu=115&topMenu=100&srchDataRealmCode=REALM002&srchDataTy=DATA004). <br />
For private reasons, I cannot release all of the data used for fine-tuning. <br />

- Korea aihub data is available ONLY to Koreans, so anyone downloading it will be Korean; for that reason these notes were originally written in Korean only. <br />
More precisely, the model was trained as a translation from orthographic transcription to phonetic transcription (ETRI transcription standard). <br />

- Ten million (천만), for example, can be written as 1000만 or as 10000000, so results can differ depending on the training datasets. <br />

- **Results can differ markedly depending on the spacing between a numeral determiner and a numeral-dependent noun (쉰살, 쉰 살 -> 쉰살, 50살).** https://eretz2.tistory.com/34 <br />
Because I did not know how the model would be used, I did not fix one convention and bias the training toward it; I left this to the distribution of the training data. (Is 쉰 살 more common, or 쉰살?)
- **Developed by:**  Yoo SungHyun(https://github.com/YooSungHyun)
- **Language(s):** Korean
- **License:** apache-2.0
- **Parent Model:** See the [kobart-base-v2](https://huggingface.co/gogamza/kobart-base-v2) for more information about the pre-trained base model.
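
The 천만 example above means two outputs can denote the same value while differing as strings. The following is a minimal sketch of a value-level comparison, assuming outputs use only digits with the unit suffixes 만, 억, and 조. The helper `mixed_numeral_to_int` and the `UNITS` table are hypothetical and are not part of the model or its training code.

```python
import re

# Hypothetical unit table for this sketch; it covers only the large-number units.
UNITS = {"만": 10**4, "억": 10**8, "조": 10**12}


def mixed_numeral_to_int(text: str) -> int:
    """Convert strings such as '1000만' or '10000000' to an integer value."""
    cleaned = text.replace(",", "").replace(" ", "")
    # Accept zero or more "<digits><unit>" chunks, optionally followed by bare digits.
    if not cleaned or not re.fullmatch(r"(?:\d+(?:조|억|만))*\d*", cleaned):
        raise ValueError(f"unsupported numeral format: {text!r}")
    # Each chunk contributes digits * unit; a chunk without a unit counts as ones.
    return sum(
        int(digits) * UNITS.get(unit, 1)
        for digits, unit in re.findall(r"(\d+)(조|억|만)?", cleaned)
    )


# Both written forms of ten million map to the same value.
print(mixed_numeral_to_int("1000만") == mixed_numeral_to_int("10000000"))
```

A comparison like this only helps when checking outputs by value; it does not change which written form the model produces, since that depends on the training data.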
  
## Uses
For more detail, see the [KoGPT_num_converter](https://github.com/ddobokki/KoGPT_num_converter) repository, <br /> in particular `bart_inference.py` and `bart_train.py`.

## Evaluation
Evaluation uses `evaluate-metric/bleu` and `evaluate-metric/rouge` from the Hugging Face `evaluate` library. <br />
[Training W&B run](https://wandb.ai/bart_tadev/BartForConditionalGeneration/runs/14hyusvf?workspace=user-bart_tadev)
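
As a rough illustration of how these two metrics can be computed, here is a minimal sketch using the `evaluate` library. The whitespace tokenizer and the function names `whitespace_tokenize` and `score` are assumptions made for this example; the tokenization and settings behind the reported scores are not stated here and may differ. A whitespace tokenizer is passed explicitly because the default ROUGE tokenizer appears to keep only ASCII letters and digits, which would discard Korean text.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Split on whitespace; an assumed tokenization, not the card's confirmed setup."""
    return text.split()


def score(predictions: list[str], references: list[str]) -> dict:
    """Compute BLEU and ROUGE for parallel lists of predictions and references."""
    # Imported lazily so the helper above can be used without `evaluate` installed.
    import evaluate

    bleu = evaluate.load("bleu")
    rouge = evaluate.load("rouge")
    bleu_result = bleu.compute(
        predictions=predictions,
        references=[[ref] for ref in references],  # one reference per prediction
        tokenizer=whitespace_tokenize,
    )
    rouge_result = rouge.compute(
        predictions=predictions,
        references=references,
        tokenizer=whitespace_tokenize,
    )
    # Merge into one dict: "bleu" plus the ROUGE variants returned by the metric.
    return {"bleu": bleu_result["bleu"], **rouge_result}
```

Calling `score(predictions, references)` downloads the metric scripts on first use, so it needs network access and the `evaluate` and `rouge_score` packages.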

## How to Get Started With the Model
```python
from transformers.pipelines import Text2TextGenerationPipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Input sentence containing a spelled-out number ("여섯시", six o'clock)
texts = ["그러게 누가 여섯시까지 술을 마시래?"]

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-TextNumbarT")
model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-TextNumbarT")
seq2seqlm_pipeline = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)

# Beam-search generation settings (sampling disabled)
kwargs = {
    "min_length": 0,
    "max_length": 1206,
    "num_beams": 100,
    "do_sample": False,
    "num_beam_groups": 1,
}
pred = seq2seqlm_pipeline(texts, **kwargs)
print(pred)
# 그러게 누가 6시까지 술을 마시래?
```
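
To see what the model actually changed, the generated sentence can be compared token by token with the input. The sketch below uses only the standard library; `changed_tokens` is a hypothetical helper, and the `generated` string is a stand-in for `pred[0]["generated_text"]`, assuming the pipeline returns a list of dicts with that key.

```python
from difflib import SequenceMatcher


def changed_tokens(source: str, generated: str) -> list[tuple[str, str]]:
    """Return (source_span, generated_span) pairs where the two sentences differ."""
    src_tokens = source.split()
    gen_tokens = generated.split()
    matcher = SequenceMatcher(a=src_tokens, b=gen_tokens, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep replaced, inserted, and deleted spans
            changes.append((" ".join(src_tokens[i1:i2]), " ".join(gen_tokens[j1:j2])))
    return changes


source = "그러게 누가 여섯시까지 술을 마시래?"
generated = "그러게 누가 6시까지 술을 마시래?"  # stand-in for pred[0]["generated_text"]
print(changed_tokens(source, generated))
# [('여섯시까지', '6시까지')]
```

Because the comparison works on whitespace-separated tokens, a change in spacing (such as 쉰 살 versus 쉰살) shows up as a changed span as well.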