---
language: ko
license: mit
tags:
- summarization
- bart
---
# kobart-news
- This model is [kobart](https://huggingface.co/hyunwoongko/kobart) fine-tuned on the [문서요약 텍스트/신문기사](https://aihub.or.kr/aidata/8054) (Document Summarization Text / Newspaper Articles) dataset using [Ainize Teachable-NLP](https://ainize.ai/teachable-nlp).

## Usage
### Python Code
```python
from transformers import PreTrainedTokenizerFast, BartForConditionalGeneration
# Load the tokenizer and model
tokenizer = PreTrainedTokenizerFast.from_pretrained("ainize/kobart-news")
model = BartForConditionalGeneration.from_pretrained("ainize/kobart-news")
# Encode Input Text
input_text = '๊ตญ๋‚ด ์ „๋ฐ˜์ ์ธ ๊ฒฝ๊ธฐ์นจ์ฒด๋กœ ์ƒ๊ฐ€ ๊ฑด๋ฌผ์ฃผ์˜ ์ˆ˜์ต๋„ ์ „๊ตญ์ ์ธ ๊ฐ์†Œ์„ธ๋ฅผ ๋ณด์ด๊ณ  ์žˆ๋Š” ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚ฌ๋‹ค. ์ˆ˜์ตํ˜• ๋ถ€๋™์‚ฐ ์—ฐ๊ตฌ๊ฐœ๋ฐœ๊ธฐ์—… ์ƒ๊ฐ€์ •๋ณด์—ฐ๊ตฌ์†Œ๋Š” ํ•œ๊ตญ๊ฐ์ •์› ํ†ต๊ณ„๋ฅผ ๋ถ„์„ํ•œ ๊ฒฐ๊ณผ ์ „๊ตญ ์ค‘๋Œ€ํ˜• ์ƒ๊ฐ€ ์ˆœ์˜์—…์†Œ๋“(๋ถ€๋™์‚ฐ์—์„œ ๋ฐœ์ƒํ•˜๋Š” ์ž„๋Œ€์ˆ˜์ž…, ๊ธฐํƒ€์ˆ˜์ž…์—์„œ ์ œ๋ฐ˜ ๊ฒฝ๋น„๋ฅผ ๊ณต์ œํ•œ ์ˆœ์†Œ๋“)์ด 1๋ถ„๊ธฐ ใŽก๋‹น 3๋งŒ4200์›์—์„œ 3๋ถ„๊ธฐ 2๋งŒ5800์›์œผ๋กœ ๊ฐ์†Œํ–ˆ๋‹ค๊ณ  17์ผ ๋ฐํ˜”๋‹ค. ์ˆ˜๋„๊ถŒ, ์„ธ์ข…์‹œ, ์ง€๋ฐฉ๊ด‘์—ญ์‹œ์—์„œ ์ˆœ์˜์—…์†Œ๋“์ด ๊ฐ€์žฅ ๋งŽ์ด ๊ฐ์†Œํ•œ ์ง€์—ญ์€ 3๋ถ„๊ธฐ 1๋งŒ3100์›์„ ๊ธฐ๋กํ•œ ์šธ์‚ฐ์œผ๋กœ, 1๋ถ„๊ธฐ 1๋งŒ9100์› ๋Œ€๋น„ 31.4% ๊ฐ์†Œํ–ˆ๋‹ค. ์ด์–ด ๋Œ€๊ตฌ(-27.7%), ์„œ์šธ(-26.9%), ๊ด‘์ฃผ(-24.9%), ๋ถ€์‚ฐ(-23.5%), ์„ธ์ข…(-23.4%), ๋Œ€์ „(-21%), ๊ฒฝ๊ธฐ(-19.2%), ์ธ์ฒœ(-18.5%) ์ˆœ์œผ๋กœ ๊ฐ์†Œํ–ˆ๋‹ค. ์ง€๋ฐฉ ๋„์‹œ์˜ ๊ฒฝ์šฐ๋„ ๋น„์Šทํ–ˆ๋‹ค. ๊ฒฝ๋‚จ์˜ 3๋ถ„๊ธฐ ์ˆœ์˜์—…์†Œ๋“์€ 1๋งŒ2800์›์œผ๋กœ 1๋ถ„๊ธฐ 1๋งŒ7400์› ๋Œ€๋น„ 26.4% ๊ฐ์†Œํ–ˆ์œผ๋ฉฐ ์ œ์ฃผ(-25.1%), ๊ฒฝ๋ถ(-24.1%), ์ถฉ๋‚จ(-20.9%), ๊ฐ•์›(-20.9%), ์ „๋‚จ(-20.1%), ์ „๋ถ(-17%), ์ถฉ๋ถ(-15.3%) ๋“ฑ๋„ ๊ฐ์†Œ์„ธ๋ฅผ ๋ณด์˜€๋‹ค. ์กฐํ˜„ํƒ ์ƒ๊ฐ€์ •๋ณด์—ฐ๊ตฌ์†Œ ์—ฐ๊ตฌ์›์€ "์˜ฌํ•ด ๋‚ด์ˆ˜ ๊ฒฝ๊ธฐ์˜ ์นจ์ฒด๋œ ๋ถ„์œ„๊ธฐ๊ฐ€ ์œ ์ง€๋˜๋ฉฐ ์ƒ๊ฐ€, ์˜คํ”ผ์Šค ๋“ฑ์„ ๋น„๋กฏํ•œ ์ˆ˜์ตํ˜• ๋ถ€๋™์‚ฐ ์‹œ์žฅ์˜ ๋ถ„์œ„๊ธฐ๋„ ๊ฒฝ์ง๋œ ๋ชจ์Šต์„ ๋ณด์˜€๊ณ  ์˜คํ”ผ์Šคํ…”, ์ง€์‹์‚ฐ์—…์„ผํ„ฐ ๋“ฑ์˜ ์ˆ˜์ตํ˜• ๋ถ€๋™์‚ฐ ๊ณต๊ธ‰๋„ ์ฆ๊ฐ€ํ•ด ๊ณต์‹ค์˜ ์œ„ํ—˜๋„ ๋Š˜์—ˆ๋‹ค"๋ฉฐ "์‹ค์ œ ์˜ฌ 3๋ถ„๊ธฐ ์ „๊ตญ ์ค‘๋Œ€ํ˜• ์ƒ๊ฐ€ ๊ณต์‹ค๋ฅ ์€ 11.5%๋ฅผ ๊ธฐ๋กํ•˜๋ฉฐ 1๋ถ„๊ธฐ 11.3% ๋Œ€๋น„ 0.2% ํฌ์ธํŠธ ์ฆ๊ฐ€ํ–ˆ๋‹ค"๊ณ  ๋งํ–ˆ๋‹ค. ๊ทธ๋Š” "์ตœ๊ทผ ์†Œ์…œ์ปค๋จธ์Šค(SNS๋ฅผ ํ†ตํ•œ ์ „์ž์ƒ๊ฑฐ๋ž˜), ์Œ์‹ ๋ฐฐ๋‹ฌ ์ค‘๊ฐœ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜, ์ค‘๊ณ  ๋ฌผํ’ˆ ๊ฑฐ๋ž˜ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋“ฑ์˜ ์‚ฌ์šฉ ์ฆ๊ฐ€๋กœ ์˜คํ”„๋ผ์ธ ๋งค์žฅ์— ์˜ํ–ฅ์„ ๋ฏธ์ณค๋‹ค"๋ฉฐ "ํ–ฅํ›„ ์ง€์—ญ, ์ฝ˜ํ…์ธ ์— ๋”ฐ๋ฅธ ์ƒ๊ถŒ ์–‘๊ทนํ™” ํ˜„์ƒ์€ ์‹ฌํ™”๋  ๊ฒƒ์œผ๋กœ ๋ณด์ธ๋‹ค"๊ณ  ๋ง๋ถ™์˜€๋‹ค.'
input_ids = tokenizer.encode(input_text, return_tensors="pt")
# Generate Summary Text Ids
summary_text_ids = model.generate(
    input_ids=input_ids,
    bos_token_id=model.config.bos_token_id,
    eos_token_id=model.config.eos_token_id,
    length_penalty=2.0,
    max_length=142,
    min_length=56,
    num_beams=4,
)
# Decode the generated ids back into text
print(tokenizer.decode(summary_text_ids[0], skip_special_tokens=True))
```
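
To summarize more than one article, the encode, generate, and decode steps can be wrapped in a small helper. The following is a minimal sketch, not part of the model or of `transformers`: the names `summarize`, `GENERATION_KWARGS`, and `max_input_tokens` are hypothetical, and the default input budget of 1024 tokens is an assumption (check `model.config.max_position_embeddings` for the real limit). The generation settings are the same ones used in the example above.

```python
from typing import Any, Dict

# Generation settings copied from the usage example.
GENERATION_KWARGS: Dict[str, Any] = {
    "length_penalty": 2.0,
    "max_length": 142,
    "min_length": 56,
    "num_beams": 4,
}


def summarize(text: str, tokenizer: Any, model: Any,
              max_input_tokens: int = 1024, **overrides: Any) -> str:
    """Summarize one article (hypothetical helper, shown for illustration)."""
    # Truncate long inputs; 1024 is an assumed budget, not a documented limit.
    input_ids = tokenizer.encode(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    )
    # Start from the defaults and let the caller override individual settings.
    kwargs = {
        "bos_token_id": model.config.bos_token_id,
        "eos_token_id": model.config.eos_token_id,
        **GENERATION_KWARGS,
        **overrides,
    }
    summary_ids = model.generate(input_ids=input_ids, **kwargs)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

With the `tokenizer`, `model`, and `input_text` from the example above, `print(summarize(input_text, tokenizer, model))` should behave like the original snippet, and a call such as `summarize(input_text, tokenizer, model, num_beams=2)` overrides a single setting.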
### API and Demo
You can try this model through [ainize-api](https://ainize.ai/gkswjdzz/summarize-torchserve?branch=main) and [ainize-demo](https://main-summarize-torchserve-gkswjdzz.endpoint.ainize.ai/).