metadata
language:
- ko
tags:
- generated_from_keras_callback
model-index:
- name: t5-large-korean-text-summary
  results: []
t5-large-korean-text-summary
This model is paust/pko-t5-large fine-tuned on the AIHUB "요약문 및 레포트 생성 데이터" (summary and report generation data) dataset. It produces short summaries of long Korean texts.
Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import nltk

# Sentence tokenizer data used by nltk.sent_tokenize below.
# Recent NLTK releases look for 'punkt_tab' instead of 'punkt', so fetch both.
nltk.download('punkt')
nltk.download('punkt_tab')

model_dir = "lcw99/t5-large-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

# Inputs longer than this many tokens are truncated before generation.
max_input_length = 512 + 256

# Sample input: a Korean news passage about the drama 수리남 and its premise of
# exporting Surinamese skate (홍어) to Korea.
text = """
주인공 강인구(하정우)는 “수리남에서 홍어가 많이 나는데 다 갖다버린다”는 친구
박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
국립수산과학원 측은 “실제로 남대서양에 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다”며
“수리남 연안에도 홍어가 많이 서식할 것”이라고 설명했다.
그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
일각에선 “돈을 벌기 위해 수리남산 홍어를 구하러 간 설정은 개연성이 떨어진다”는 지적도 한다.
드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
실제 조봉행 체포 작전에 협조했던 “협력자 K씨”도 홍어 사업이 아니라 수리남에 선박용 특수용접봉을 파는 사업을 하러 수리남에 갔었다.
"""

# Prefix the input with the "summarize: " task prefix, then tokenize it.
inputs = ["summarize: " + text]
inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")

# num_beams=8 together with do_sample=True runs beam-search sampling, so the summary
# can differ between runs; pass do_sample=False for deterministic beam search.
output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]

# Keep only the first sentence of the generated summary.
summary = nltk.sent_tokenize(decoded_output.strip())[0]
print(summary)
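Because the tokenizer call uses truncation=True with max_length=max_input_length (768 tokens), any text past that limit is dropped before the model sees it. For longer documents, one possible workaround, assumed here rather than taken from this model card, is to group sentences into chunks that each fit the token budget and summarize every chunk separately. The helper below is a minimal, dependency-free sketch of that greedy packing; pack_sentences and its count_tokens parameter are hypothetical names, not part of transformers or NLTK.

```python
from typing import Callable, List


def pack_sentences(
    sentences: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Hypothetical helper: greedily pack consecutive sentences into chunks
    whose summed token counts stay within max_tokens.

    A single sentence longer than max_tokens ends up alone in its own chunk
    (the tokenizer would still truncate it).
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    # Emit whatever is left over as the final chunk.
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    # Toy token counter: whitespace-separated words stand in for real tokens.
    word_count = lambda s: len(s.split())
    print(pack_sentences(["a b", "c d e", "f"], max_tokens=4, count_tokens=word_count))
```

With the model's own tokenizer, count_tokens could be lambda s: len(tokenizer(s, add_special_tokens=False)["input_ids"]) and the sentences could come from nltk.sent_tokenize(text). Per-sentence counts only approximate the length of the joined chunk, and the "summarize: " prefix and end-of-sequence token also consume tokens, so a budget somewhat below max_input_length leaves headroom.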
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- optimizer: None
- training_precision: float16
Training results
Framework versions
- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.5.1
- Tokenizers 0.12.1