---
language:
- ko
tags:
- generated_from_keras_callback
model-index:
- name: t5-large-korean-text-summary
  results: []
---

# t5-large-korean-text-summary

This model is a fine-tuned version of [paust/pko-t5-large](https://huggingface.co/paust/pko-t5-large), trained on the AIHUB "summary and report generation data" corpus. It produces a short summary of long Korean text.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import nltk

# Sentence tokenizer used below to pick the first sentence of the summary.
nltk.download('punkt')

model_dir = "lcw99/t5-large-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

max_input_length = 512 + 256

text = """
주인공 강인구(하정우)는 ‘수리남에서 홍어가 많이 나는데 다 갖다버린다’는 친구
박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
국립수산과학원 측은 “실제로 남대서양에 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다”며
“수리남 연안에도 홍어가 많이 서식할 것”이라고 설명했다.

그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
일각에선 ‘돈을 벌기 위해 수리남산 홍어를 구하러 간 설정의 개연성이 떨어진다’는 지적도 한다.
드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
실제 조봉행 체포 작전에 협조했던 ‘협력자 K씨’는 홍어 사업이 아니라 수리남에 선박용 특수용접봉을 파는 사업을 하러 수리남에 갔었다.
"""

# T5 expects a task prefix on the input text.
inputs = ["summarize: " + text]

inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")
output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]

# Keep only the first sentence of the generated summary.
predicted_title = nltk.sent_tokenize(decoded_output.strip())[0]

print(predicted_title)
```
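
As a lighter-weight alternative, the same checkpoint also works with the high-level `pipeline` API. This is a minimal sketch, not part of the original card; the placeholder input and the generation parameters simply mirror the example above.

```python
from transformers import pipeline

# Wraps the same tokenizer and model behind a single call.
summarizer = pipeline("summarization", model="lcw99/t5-large-korean-text-summary")

# Placeholder input; substitute any long Korean text.
article = "summarize: " + "여기에 요약할 한국어 장문을 넣습니다."
print(summarizer(article, num_beams=8, min_length=10, max_length=100)[0]["summary_text"])
```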

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: None
- training_precision: float16
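
The callback did not record the optimizer configuration. Going by the `generated_from_keras_callback` tag and the float16 precision above, the fine-tuning setup plausibly resembled the following sketch; `train_dataset`, the optimizer choice, and the learning rate are hypothetical, not values taken from this card.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Mixed precision, matching the reported training_precision of float16.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

tokenizer = AutoTokenizer.from_pretrained("paust/pko-t5-large")
model = TFAutoModelForSeq2SeqLM.from_pretrained("paust/pko-t5-large", from_pt=True)

# `train_dataset` stands in for a tf.data.Dataset of tokenized
# {input_ids, attention_mask, labels} batches built from the AIHUB corpus.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5))  # loss is computed internally
# model.fit(train_dataset, epochs=1)
```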

### Training results

### Framework versions

- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.5.1
- Tokenizers 0.12.1
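
To reproduce this environment, the versions above can be pinned at install time (a sketch; the package names are the standard PyPI ones):

```bash
pip install transformers==4.22.1 tensorflow==2.10.0 datasets==2.5.1 tokenizers==0.12.1
```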