t5-base-korean-summarization
This is a T5 model for Korean text summarization.
It was fine-tuned from the 'paust/pko-t5-base' model on 3 datasets, which are listed with the evaluation results below.
Usage (HuggingFace Transformers)
import nltk
# Sentence tokenizer data; newer NLTK releases may also need nltk.download('punkt_tab').
nltk.download('punkt')
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained('eenzeenee/t5-base-korean-summarization')
tokenizer = AutoTokenizer.from_pretrained('eenzeenee/t5-base-korean-summarization')
prefix = "summarize: "
sample = """
μλ
νμΈμ? μ°λ¦¬ (2νλ
)/(μ΄ νλ
) μΉκ΅¬λ€ μ°λ¦¬ μΉκ΅¬λ€ νκ΅μ κ°μ μ§μ§ (2νλ
)/(μ΄ νλ
) μ΄ λκ³ μΆμλλ° νκ΅μ λͺ» κ°κ³ μμ΄μ λ΅λ΅νμ£ ?
κ·Έλλ μ°λ¦¬ μΉκ΅¬λ€μ μμ κ³Ό 건κ°μ΄ μ΅μ°μ μ΄λκΉμ μ€λλΆν° μ μλμ΄λ λ§€μΌ λ§€μΌ κ΅μ΄ μ¬νμ λ λ보λλ‘ ν΄μ.
μ΄/ μκ°μ΄ λ²μ¨ μ΄λ κ² λλμ? λ¦μμ΄μ. λ¦μμ΄μ. 빨리 κ΅μ΄ μ¬νμ λ λμΌ λΌμ.
κ·Έλ°λ° μ΄/ κ΅μ΄μ¬νμ λ λκΈ° μ μ μ°λ¦¬κ° μ€λΉλ¬Όμ μ±κ²¨μΌ λκ² μ£ ? κ΅μ΄ μ¬νμ λ λ μ€λΉλ¬Ό, κ΅μμ μ΄λ»κ² λ°μ μ μλμ§ μ μλμ΄ μ€λͺ
μ ν΄μ€κ²μ.
(EBS)/(μ΄λΉμμ€) μ΄λ±μ κ²μν΄μ λ€μ΄κ°λ©΄μ 첫νλ©΄μ΄ μ΄λ κ² λμμ.
μ/ κ·Έλ¬λ©΄μ μ¬κΈ° (X)/(μμ€) λλ¬μ£Ό(κ³ μ)/(ꡬμ). μ κΈ° (λκ·ΈλΌλ―Έ)/(λ₯κ·ΈλΌλ―Έ) (EBS)/(μ΄λΉμμ€) (2μ£Ό)/(μ΄ μ£Ό) λΌμ΄λΈνΉκ°μ΄λΌκ³ λμ΄μμ£ ?
κ±°κΈ°λ₯Ό λ°λ‘ κ°κΈ°λ₯Ό λλ¦
λλ€. μ/ (λλ₯΄λ©΄μ)/(λλ₯΄λ©΄μ). μ΄λ»κ² λλ? b/ λ°μΌλ‘ λ΄λ €μ λ΄λ €μ λ΄λ €μ μ λ΄λ €μ.
μ°λ¦¬ λͺ νλ
μ΄μ£ ? μ/ (2νλ
)/(μ΄ νλ
) μ΄μ£ (2νλ
)/(μ΄ νλ
)μ λ¬΄μ¨ κ³Όλͺ©? κ΅μ΄.
μ΄λ²μ£Όλ (1μ£Ό)/(μΌ μ£Ό) μ°¨λκΉμ μ¬κΈ° κ΅μ. λ€μμ£Όλ μ¬κΈ°μ λ€μ΄μ λ°μΌλ©΄ λΌμ.
μ΄ κ΅μμ ν΄λ¦μ νλ©΄, μ§μ/. μ΄λ κ² κ΅μ¬κ° λμ΅λλ€ .μ΄ κ΅μμ (λ€μ΄)/(λ°μ΄)λ°μμ μ°λ¦¬ κ΅μ΄μ¬νμ λ λ μκ° μμ΄μ.
κ·ΈλΌ μ°λ¦¬ μ§μ§λ‘ κ΅μ΄ μ¬νμ νλ² λ λ보λλ‘ ν΄μ? κ΅μ΄μ¬ν μΆλ°. μ/ (1λ¨μ)/(μΌ λ¨μ) μ λͺ©μ΄ λκ°μ? νλ² μ°Ύμλ΄μ.
μλ₯Ό μ¦κ²¨μ μμ. κ·Έλ₯ μλ₯Ό μ½μ΄μ κ° μλμμ. μλ₯Ό μ¦κ²¨μΌ λΌμ μ¦κ²¨μΌ λΌ. μ΄λ»κ² μ¦κΈΈκΉ? μΌλ¨μ λ΄λ΄ μλ₯Ό μ¦κΈ°λ λ°©λ²μ λν΄μ 곡λΆλ₯Ό ν 건λ°μ.
κ·ΈλΌ μ€λμμ μ΄λ»κ² μ¦κΈΈκΉμ? μ€λ 곡λΆν λ΄μ©μμ μλ₯Ό μ¬λ¬ κ°μ§ λ°©λ²μΌλ‘ μ½κΈ°λ₯Ό 곡λΆν κ²λλ€.
μ΄λ»κ² μ¬λ¬κ°μ§ λ°©λ²μΌλ‘ μ½μκΉ μ°λ¦¬ 곡λΆν΄ 보λλ‘ ν΄μ. μ€λμ μ λμλΌ μ§μ/! μκ° λμμ΅λλ€ μμ μ λͺ©μ΄ λκ°μ? λ€ν° λ μ΄μμ λ€ν° λ .
λꡬλ λ€νλ λμμ΄λ λ€νλ μΈλλ μΉκ΅¬λ? λꡬλ λ€νλμ§ μ μλμ΄ μλ₯Ό μ½μ΄ μ€ ν
λκΉ νλ² μκ°μ ν΄λ³΄λλ‘ ν΄μ."""
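# Prepend the task prefix; inputs longer than 512 tokens are truncated.
inputs = [prefix + sample]
inputs = tokenizer(inputs, max_length=512, truncation=True, return_tensors="pt")
# Beam search with sampling (num_beams=3, do_sample=True), so the output can vary between runs.
output = model.generate(**inputs, num_beams=3, do_sample=True, min_length=10, max_length=64)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
# Keep only the first sentence of the generated summary.
result = nltk.sent_tokenize(decoded_output.strip())[0]
print('RESULT >>', result)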
RESULT >> κ΅μ΄ μ¬νμ λ λκΈ° μ μ κ΅μ΄ μ¬νμ λ λ μ€λΉλ¬Όκ³Ό κ΅μμ μ΄λ»κ² λ°μ μ μλμ§ μ μλμ΄ μ€λͺ
ν΄ μ€λ€.
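The single-document usage above can be extended to several documents at once. The following is a minimal sketch, not part of the original card: the helper name `summarize_batch` and its parameters are hypothetical, and it assumes the tokenizer and model are the objects loaded above. The only change from the original call is `padding=True`, which a batch of unequal-length inputs needs.

```python
# Hypothetical helper: summarize several documents in one generate() call.
# The default generation settings mirror the ones used in the usage example.
DEFAULT_GENERATE_KWARGS = {
    "num_beams": 3,
    "do_sample": True,
    "min_length": 10,
    "max_length": 64,
}


def summarize_batch(texts, tokenizer, model, prefix="summarize: ",
                    max_input_length=512, **generate_kwargs):
    """Return one decoded summary string per input text."""
    # Caller-supplied keyword arguments override the defaults.
    settings = {**DEFAULT_GENERATE_KWARGS, **generate_kwargs}
    prompts = [prefix + text.strip() for text in texts]
    # padding=True pads the shorter inputs so the batch forms one tensor.
    inputs = tokenizer(
        prompts,
        max_length=max_input_length,
        truncation=True,
        padding=True,
        return_tensors="pt",
    )
    output = model.generate(**inputs, **settings)
    decoded = tokenizer.batch_decode(output, skip_special_tokens=True)
    return [summary.strip() for summary in decoded]
```

Called as `summarize_batch([doc_a, doc_b], tokenizer, model)`, it returns a list with one summary per document; taking only the first sentence with `nltk.sent_tokenize`, as in the example above, remains an optional post-processing step.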
Evaluation Result
- Korean Paper Summarization Dataset(논문자료 요약)
ROUGE-2-R 0.09868624890432466 ROUGE-2-P 0.9666714545849712 ROUGE-2-F 0.17250881441169427
- Korean Book Summarization Dataset(도서자료 요약)
ROUGE-2-R 0.1575686156943213 ROUGE-2-P 0.9718318136896944 ROUGE-2-F 0.26548116834852586
- Korean Summary statement and Report Generation Dataset(요약문 및 레포트 생성 데이터)
ROUGE-2-R 0.0987891733555808 ROUGE-2-P 0.9276946867981899 ROUGE-2-F 0.17726493110448185
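For readers unfamiliar with the three ROUGE-2 columns, here is a minimal sketch of how recall, precision, and F-score are defined over bigrams. It is an illustration only: the card does not say which ROUGE implementation or tokenizer produced the numbers above, so the whitespace tokenization and the function name `rouge_2` are assumptions.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(reference, candidate):
    """ROUGE-2 recall, precision and F1 for one reference/candidate pair.

    Assumption: plain whitespace tokenization, which may differ from the
    tokenization used for the scores reported in this card.
    """
    ref = bigram_counts(reference.split())
    cand = bigram_counts(candidate.split())
    # Clipped overlap: each bigram counts at most as often as in both texts.
    overlap = sum((ref & cand).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}
```

Under this definition, a high precision with a low recall means that most bigrams in the generated summary also occur in the reference, while they cover only a small share of the reference's bigrams. Corpus-level scores are often averaged per sample, so a reported F-score need not equal the harmonic mean of the reported precision and recall.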
Training
The model was trained with the parameters:
- training arguments
Seq2SeqTrainingArguments(
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
auto_find_batch_size=False,
weight_decay=0.01,
learning_rate=4e-05,
lr_scheduler_type="linear",
num_train_epochs=3,
fp16=True)
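To make the linear scheduler setting concrete, the sketch below computes the learning rate at a given optimizer step for a linear decay from `learning_rate=4e-05` to zero. It is an approximation written in plain Python, not the library's code: the function name `linear_lr` is hypothetical, the total number of steps depends on dataset size (not stated in the card), and zero warmup is assumed because no warmup argument is listed above.

```python
def linear_lr(step, total_steps, base_lr=4e-05, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay.

    Assumptions: warmup_steps=0 (no warmup argument is listed in the card)
    and total_steps is supplied by the caller.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

For example, with a hypothetical run of 1,000 steps, the rate starts at 4e-05, is halved by step 500, and reaches zero at step 1,000.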
Model Architecture
T5ForConditionalGeneration(
(shared): Embedding(50358, 768)
(encoder): T5Stack(
(embed_tokens): Embedding(50358, 768)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
(relative_attention_bias): Embedding(32, 12)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1~11): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(final_layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(decoder): T5Stack(
(embed_tokens): Embedding(50358, 768)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
(relative_attention_bias): Embedding(32, 12)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1~11): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=768, out_features=768, bias=False)
(k): Linear(in_features=768, out_features=768, bias=False)
(v): Linear(in_features=768, out_features=768, bias=False)
(o): Linear(in_features=768, out_features=768, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseGatedActDense(
(wi_0): Linear(in_features=768, out_features=2048, bias=False)
(wi_1): Linear(in_features=768, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=768, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
(act): NewGELUActivation()
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(final_layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(lm_head): Linear(in_features=768, out_features=50358, bias=False)
)
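The layer shapes printed above are enough to estimate the parameter count by hand. The sketch below does that arithmetic; it assumes 12 blocks in each stack (block 0 plus blocks 1~11), one weight vector per `T5LayerNorm`, a single shared embedding matrix for the encoder and decoder, and an `lm_head` that is not tied to that embedding. The last point is an assumption, since the printout alone does not show weight tying.

```python
# Dimensions read from the printed architecture.
VOCAB, D_MODEL, D_FF = 50358, 768, 2048
N_BLOCKS = 12          # block 0 plus blocks 1~11 in each stack
REL_BIAS = 32 * 12     # relative_attention_bias, present only in block 0

attention = 4 * D_MODEL * D_MODEL   # q, k, v, o (no bias terms)
feed_forward = 3 * D_MODEL * D_FF   # wi_0, wi_1, wo (no bias terms)
layer_norm = D_MODEL                # T5LayerNorm holds a weight vector only

# Encoder block: self-attention + feed-forward, each with a layer norm.
encoder_block = attention + feed_forward + 2 * layer_norm
# Decoder block: self-attention + cross-attention + feed-forward.
decoder_block = 2 * attention + feed_forward + 3 * layer_norm

encoder = N_BLOCKS * encoder_block + REL_BIAS + layer_norm  # + final norm
decoder = N_BLOCKS * decoder_block + REL_BIAS + layer_norm  # + final norm

embedding = VOCAB * D_MODEL   # shared by encoder and decoder embed_tokens
lm_head = D_MODEL * VOCAB     # assumed untied from the shared embedding

total = encoder + decoder + embedding + lm_head
print(f"encoder: {encoder:,}")
print(f"decoder: {decoder:,}")
print(f"total:   {total:,}")
```

If the output head were tied to the shared embedding instead, the total would be lower by one `VOCAB * D_MODEL` matrix.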
Citation
- Raffel, Colin, et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." J. Mach. Learn. Res. 21.140 (2020): 1-67.