Commit 99e4cf2 by BM-K
1 Parent(s): bb0df50

Create README.md

  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ language:
+ - ko
+ ---
+ # NewsKoT5
+ This T5 model was trained on Korean news articles (29GB). However, it was trained with small batches and a limited number of training steps, so its performance may not be fully optimized.
+
+ ## Quick tour
+ ```python
+ from transformers import AutoTokenizer, T5ForConditionalGeneration
+
+ tokenizer = AutoTokenizer.from_pretrained("BM-K/NewsKoT5-small")
+ model = T5ForConditionalGeneration.from_pretrained("BM-K/NewsKoT5-small")
+
+ # Input with two masked spans, each replaced by a sentinel token.
+ # Roughly: "The Korean launch vehicle Nuri has successfully <extra_id_1> its 'debut' as a practical-grade <extra_id_0> launch vehicle"
+ input_ids = tokenizer("한국형발사체 누리호가 실용급 <extra_id_0> 발사체로서 ‘데뷔’를 성공적으로 <extra_id_1>", return_tensors="pt").input_ids
+ # Target: each sentinel followed by the text it replaced ("satellite", "completed")
+ labels = tokenizer("<extra_id_0> 위성 <extra_id_1> 마쳤다 <extra_id_2>", return_tensors="pt").input_ids
+
+ # Passing labels makes the forward pass also return the loss (outputs.loss)
+ outputs = model(input_ids=input_ids,
+                 labels=labels)
+ ```
+
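The input and label strings in the Quick tour follow T5's sentinel-token format: each masked span in the input is replaced by `<extra_id_N>`, and the target lists each sentinel followed by the words it replaced, ending with one extra sentinel. The following is a minimal sketch of how such a pair could be built, assuming whitespace-separated words and hand-picked spans. `make_span_corruption_pair` is a hypothetical helper for illustration; it is not part of `transformers` or of this repository, and the source does not describe how spans were actually sampled during training.

```python
from typing import List, Tuple


def make_span_corruption_pair(
    words: List[str], spans: List[Tuple[int, int]]
) -> Tuple[str, str]:
    """Build a T5-style (input, target) pair from a list of words.

    `spans` holds sorted, non-overlapping (start, end) word indices,
    end exclusive. Hypothetical helper, for illustration only.
    """
    source: List[str] = []
    target: List[str] = []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        source.extend(words[cursor:start])  # keep the unmasked words
        source.append(sentinel)             # the span becomes one sentinel
        target.append(sentinel)             # the target repeats the sentinel
        target.extend(words[start:end])     # followed by the masked words
        cursor = end
    source.extend(words[cursor:])           # trailing unmasked words
    target.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return " ".join(source), " ".join(target)


sentence = "한국형발사체 누리호가 실용급 위성 발사체로서 ‘데뷔’를 성공적으로 마쳤다"
words = sentence.split()
# Mask word 3 ("위성") and word 7 ("마쳤다"), as in the Quick tour example.
masked_input, target = make_span_corruption_pair(words, [(3, 4), (7, 8)])
print(masked_input)
print(target)
```

The two printed strings correspond to the text passed to the tokenizer for `input_ids` and `labels` in the Quick tour.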
+ ## News Summarization Performance (F1-score)
+ The model's tokenized output was first restored to plain text. ROUGE was then computed after tokenizing both the reference and the hypothesis with [mecab](https://konlpy.org/ko/v0.4.0/).
+
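The evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used to produce the reported numbers: the source does not say which ROUGE implementation was used, so the code below uses the standard definitions (clipped n-gram overlap for ROUGE-1/2, longest common subsequence for ROUGE-L, F1 of precision and recall). All function names are hypothetical, and whitespace splitting stands in for mecab so the example runs without extra dependencies.

```python
from collections import Counter
from typing import Callable, Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, hyp_total: int, ref_total: int) -> float:
    """F1 of precision (overlap/hyp_total) and recall (overlap/ref_total)."""
    if overlap == 0 or hyp_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / hyp_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(hyp: List[str], ref: List[str], n: int) -> float:
    hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
    overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
    return f1(overlap, sum(hyp_counts.values()), sum(ref_counts.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(hyp: List[str], ref: List[str]) -> float:
    return f1(lcs_length(hyp, ref), len(hyp), len(ref))


def rouge_scores(
    hypothesis: str,
    reference: str,
    tokenize: Callable[[str], List[str]] = str.split,  # stand-in for mecab
) -> Dict[str, float]:
    hyp, ref = tokenize(hypothesis), tokenize(reference)
    return {
        "rouge-1": rouge_n(hyp, ref, 1),
        "rouge-2": rouge_n(hyp, ref, 2),
        "rouge-l": rouge_l(hyp, ref),
    }


scores = rouge_scores("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

Scores here fall in the range 0 to 1; the values in the tables (for example 51.48) appear to be the same quantity expressed as a percentage. With KoNLPy and MeCab installed, passing `tokenize=Mecab().morphs` (from `konlpy.tag`) would be one way to plug in morpheme-level tokenization, though that wiring is an assumption about the setup rather than something the source specifies.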
+ - Dacon Korean Document Abstractive Summarization AI Competition (한국어 문서 생성요약 AI 경진대회) [Dataset](https://dacon.io/competitions/official/235673/overview/description)
+  - Training: 29,432
+  - Validation: 7,358
+  - Test: 9,182
+
+ | Model | #Param | ROUGE-1 | ROUGE-2 | ROUGE-L |
+ |-------|--------:|--------:|--------:|--------:|
+ | pko-t5-small | 95M | 51.48 | 33.18 | 44.96 |
+ | NewsT5-small | 61M | 52.15 | 33.59 | 45.41 |
+
+ - AI-Hub Document Summarization Text (문서요약 텍스트) [Dataset](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=97)
+  - Training: 245,626
+  - Validation: 20,296
+  - Test: 9,931
+
+ | Model | #Param | ROUGE-1 | ROUGE-2 | ROUGE-L |
+ |-------|--------:|--------:|--------:|--------:|
+ | pko-t5-small | 95M | 53.44 | 34.03 | 45.36 |
+ | NewsT5-small | 61M | 53.74 | 34.27 | 45.52 |
+
+ - [pko-t5-small](https://github.com/paust-team/pko-t5)