---
tags:
- generated_from_keras_callback
model-index:
- name: t5-large-korean-text-summary
  results: []
---
# t5-large-korean-text-summary
This model is a fine-tuned version of [paust/pko-t5-large](https://huggingface.co/paust/pko-t5-large), trained on the AIHUB "summary and report generation data" dataset (요약문 및 레포트 생성 데이터). It produces short summaries of long Korean texts.
## Usage
```python
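from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import nltk

# Sentence-splitter data, used below to keep only the first sentence of the output.
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases look for 'punkt_tab' instead of 'punkt'

model_dir = "lcw99/t5-large-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

# Inputs longer than this many tokens are truncated by the tokenizer.
max_input_length = 512 + 256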
text = """
주인공 강인구(하정우)는 ‘수리남에서 홍어가 많이 나는데 다 갖다버린다’는 친구 박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
국립수산과학원 측은 “실제로 남대서양에 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다”며 “수리남 연안에도 홍어가 많이 서식할 것”이라고 설명했다.
그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
일각에선 “돈을 벌기 위해 수리남산 홍어를 구하러 간 설정은 개연성이 떨어진다”는 지적도 한다.
드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
실제 조봉행 체포 작전에 협조했던 ‘협력자 K씨’도 홍어 사업이 아니라 수리남에 선박용 특수용접봉을 파는 사업을 하러 수리남에 갔었다.
"""
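
# Prepend the T5-style task prefix used for summarization.
inputs = ["summarize: " + text]
inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")

# Beam search with sampling (num_beams=8, do_sample=True), so the summary can vary between runs.
# min_length and max_length bound the generated summary length in tokens.
output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]

# Keep only the first sentence of the generated text.
summary = nltk.sent_tokenize(decoded_output.strip())[0]
print(summary)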
```
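The example above passes `truncation=True`, so any input beyond `max_input_length` tokens is silently dropped. One way to handle longer documents is to split the text into sentence-aligned chunks that each fit the token budget and summarize each chunk separately. The sketch below covers only the chunking step. `chunk_sentences` is a hypothetical helper (not part of this model or of `transformers`), and this chunking strategy is an assumption rather than something this model card describes. It approximates a chunk's length as the sum of per-sentence token counts, so the budget should leave some headroom for the `"summarize: "` prefix and special tokens.

```python
from typing import Callable, List


def chunk_sentences(
    sentences: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int,
) -> List[str]:
    """Greedily pack consecutive sentences into chunks of at most max_tokens.

    Hypothetical helper: chunk length is estimated as the sum of per-sentence
    counts. A single sentence longer than max_tokens becomes its own chunk
    (the tokenizer's truncation would still cut it at generation time).
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    # Flush whatever is left over.
    if current:
        chunks.append(" ".join(current))
    return chunks


# Toy demo: whitespace word count stands in for a real tokenizer.
demo = chunk_sentences(["a b c", "d e", "f g h i", "j"], lambda s: len(s.split()), 5)
print(demo)  # → ['a b c d e', 'f g h i j']
```

With the model loaded as in the usage example, `count_tokens` could be `lambda s: len(tokenizer(s, add_special_tokens=False).input_ids)` and `sentences` could come from `nltk.sent_tokenize(text)`; each chunk would then be prefixed with `"summarize: "` and passed through `model.generate` in the same way.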
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: None
- training_precision: float16
### Training results
### Framework versions
- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.5.1
- Tokenizers 0.12.1