---
language:
- ko
license: apache-2.0
library_name: transformers
tags:
- text2text-generation
datasets:
- aihub
metrics:
- bleu
- rouge
model-index:
- name: ko-barTNumText
  results:
  - task:
      type: text2text-generation
      name: text2text-generation
    metrics:
    - type: bleu
      value: 0.9313276940897475
      name: eval_bleu
      verified: false
    - type: rouge1
      value: 0.9607081256861959
      name: eval_rouge1
      verified: false
    - type: rouge2
      value: 0.9394649136169404
      name: eval_rouge2
      verified: false
    - type: rougeL
      value: 0.9605735834651536
      name: eval_rougeL
      verified: false
    - type: rougeLsum
      value: 0.9605993760190767
      name: eval_rougeLsum
      verified: false
---
# ko-barTNumText(TNT Model🧨): Try Number To Korean Reading (a model that converts numbers to Hangul)
## Table of Contents
- [ko-barTNumText(TNT Model🧨): Try Number To Korean Reading (a model that converts numbers to Hangul)](#ko-bartnumtexttnt-model-try-number-to-korean-reading-a-model-that-converts-numbers-to-hangul)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [Uses](#uses)
- [Evaluation](#evaluation)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)
## Model Details
- **Model Description:**
I built this model because I could not find an existing model or algorithm for this task. <br />
It is a BartForConditionalGeneration model fine-tuned to convert numbers into their Korean (Hangul) reading. <br />
- The dataset comes from [Korea aihub](https://aihub.or.kr/aihubdata/data/list.do?currMenu=115&topMenu=100&srchDataRealmCode=REALM002&srchDataTy=DATA004). <br />
For private reasons, I cannot release the data used for fine-tuning. <br />
- Korea aihub data is available ONLY to Koreans. <br />
Since anyone who can download data from aihub is presumably Korean, the notes on the data were originally written in Korean only. <br />
Strictly speaking, the model is trained to translate orthographic transcription into phonetic transcription (following the ETRI transcription standard). <br />
- The same number can be written in several ways: ten million (천만) may appear as 1000만 or as 10000000, so results can vary depending on the training dataset. <br />
- **Results can differ markedly depending on the spacing between a numeral determiner and a numeral bound noun (쉰살, 쉰 살 -> 쉰살, 50살).** https://eretz2.tistory.com/34 <br />
Because I do not know how the model will be used, I did not fix one convention and bias the training toward it; the behavior is left to the distribution of the training data. (Which is more common, 쉰 살 or 쉰살?)
- **Developed by:** [Yoo SungHyun](https://github.com/YooSungHyun)
- **Language(s):** Korean
- **License:** apache-2.0
- **Parent Model:** See [kobart-base-v2](https://huggingface.co/gogamza/kobart-base-v2) for more information about the pre-trained base model.
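
The note about ten million can be made concrete with a small sketch. The helper `to_int` below is hypothetical and is not part of the model or its training code; it assumes a written number is a digit string with at most one trailing unit, and the `억` entry is an added assumption beyond the example in the description. It only shows that different surface forms can denote the same value, which is why the output depends on which form dominated the training data.

```python
# Hypothetical illustration only: not used by the model or its training scripts.
# Assumption: a written number is digits plus at most one trailing unit character.
UNITS = {"만": 10**4, "억": 10**8}


def to_int(written: str) -> int:
    """Parse a digit string with an optional trailing unit, e.g. '1000만'."""
    text = written.strip()
    for unit, factor in UNITS.items():
        if text.endswith(unit):
            # Strip the unit and scale the remaining digits.
            return int(text[: -len(unit)]) * factor
    return int(text)


# Two written forms of ten million taken from the description above.
variants = ["1000만", "10000000"]
values = {form: to_int(form) for form in variants}
print(values)
```

Both forms map to the same integer, yet a seq2seq model sees them as different token sequences.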
## Uses
For more detail, see [KoGPT_num_converter](https://github.com/ddobokki/KoGPT_num_converter), <br /> in particular `bart_inference.py` and `bart_train.py`.
## Evaluation
Evaluated with `evaluate-metric/bleu` and `evaluate-metric/rouge` from the Hugging Face `evaluate` library. <br />
[Training WandB run](https://wandb.ai/bart_tadev/BartForConditionalGeneration/runs/326xgytt?workspace=user-bart_tadev)
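
A minimal sketch of how such scores could be computed is shown below. It is not the author's evaluation script: the function name `score` and the whitespace tokenizer are assumptions, since the card does not say which tokenizer was used. A custom tokenizer is passed because, as far as I know, the default ROUGE tokenizer keeps only ASCII letters and digits, which would discard Hangul.

```python
from typing import Dict, List


def whitespace_tokenize(text: str) -> List[str]:
    """Split on whitespace so Hangul tokens are preserved (assumed tokenizer)."""
    return text.split()


def score(predictions: List[str], references: List[str]) -> Dict[str, float]:
    """Compute BLEU and ROUGE with the `evaluate` library (hypothetical helper)."""
    import evaluate  # imported lazily; loading the metrics needs network access

    bleu = evaluate.load("bleu")
    rouge = evaluate.load("rouge")
    # BLEU accepts a list of reference lists, one list per prediction.
    bleu_result = bleu.compute(
        predictions=predictions,
        references=[[ref] for ref in references],
        tokenizer=whitespace_tokenize,
    )
    rouge_result = rouge.compute(
        predictions=predictions,
        references=references,
        tokenizer=whitespace_tokenize,
    )
    # rouge_result holds rouge1, rouge2, rougeL and rougeLsum.
    return {"bleu": bleu_result["bleu"], **rouge_result}
```

Calling `score(model_outputs, gold_texts)` returns one dictionary with the five metric names listed in the metadata; the exact values depend on the tokenizer, so they need not match the reported numbers.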
## How to Get Started With the Model
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers.pipelines import Text2TextGenerationPipeline

# Input sentence containing a digit ("6시", six o'clock)
texts = ["그러게 누가 6시까지 술을 마시래?"]
tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-barTNumText")
model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-barTNumText")
seq2seqlm_pipeline = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
# Generation settings: deterministic beam search, no sampling
kwargs = {
    "min_length": 0,
    "max_length": 1206,
    "num_beams": 100,
    "do_sample": False,
    "num_beam_groups": 1,
}
pred = seq2seqlm_pipeline(texts, **kwargs)
print(pred)
# 그러게 누가 여섯 시까지 술을 마시래?
```