---
language:
- ko
tags:
- generated_from_keras_callback
model-index:
- name: t5-base-korean-text-summary
  results: []
---
# t5-base-korean-text-summary
This model is a fine-tuned version of [paust/pko-t5-base](https://huggingface.co/paust/pko-t5-base), trained on the AIHUB "summary and report generation data" (요약문 및 레포트 생성 데이터). It produces a short summary of long Korean text.
## Usage
```python
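from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import nltk

# Sentence tokenizer data for nltk.sent_tokenize (recent NLTK releases may need 'punkt_tab' instead).
nltk.download('punkt')

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
model_dir = "lcw99/t5-base-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

# Inputs longer than this many tokens are truncated.
max_input_length = 512
text = """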
주인공 강인구(하정우)는 ‘수리남에서 홍어가 많이 나는데 다 갖다버린다’는 친구
박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
국립수산과학원 측은 “실제로 남대서양에 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다”며
“수리남 연안에도 홍어가 많이 서식할 것”이라고 설명했다.
그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
일각에선 “돈을 벌기 위해 수리남산 홍어를 구하러 간 설정은 개연성이 떨어진다”는 지적도 한다.
드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
실제 조봉행 체포 작전에 협조했던 ‘협력자 K씨’도 홍어 사업이 아니라 수리남에 선박용 특수용접봉을 파는 사업을 하러 수리남에 갔었다.
"""
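# Prepend the "summarize: " task prefix and tokenize, truncating to max_input_length.
inputs = ["summarize: " + text]
inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")
# Beam search combined with sampling, so the summary can vary between runs.
output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
# Keep only the first sentence of the generated text.
summary = nltk.sent_tokenize(decoded_output.strip())[0]
print(summary)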
```
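Because the example sets `max_input_length = 512` with `truncation=True`, any text beyond 512 tokens is dropped before the model sees it. One possible workaround, sketched below, is to split a long document into sentence groups that each fit the budget and summarize them separately. The helper `pack_sentences` and the stand-in counter `word_count` are hypothetical names introduced for illustration and are not part of this model or of `transformers`. Summing per-sentence counts is only an approximation, so in practice the budget should leave some headroom for the `summarize: ` prefix and special tokens.

```python
from typing import Callable, Iterable, List


def pack_sentences(
    sentences: Iterable[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = 512,
) -> List[str]:
    """Greedily pack sentences into chunks that stay within max_tokens."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        # A single over-long sentence still becomes its own chunk; the
        # tokenizer's truncation would then cut it downstream.
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Stand-in counter for demonstration only: whitespace-separated words.
# (Assumption: real use would count tokens with the model's tokenizer.)
def word_count(s: str) -> int:
    return len(s.split())


demo = pack_sentences(["a b c", "d e", "f g h i", "j"], word_count, max_tokens=5)
print(demo)  # ['a b c d e', 'f g h i j']
```

With the tokenizer from the usage example, the counter could be something like `lambda s: len(tokenizer(s, add_special_tokens=False)["input_ids"])`, and the sentences could come from `nltk.sent_tokenize(text)`. Each resulting chunk would then be passed through the same `summarize: ` pipeline shown above.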
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: None
- training_precision: float16
### Training results
### Framework versions
- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.5.1
- Tokenizers 0.12.1