
mt5-base-thaisum

This repository contains an mT5-base model fine-tuned for Thai text summarization. The model uses the mT5 architecture and was fine-tuned on Thai text-summary pairs.

Example

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

tokenizer = AutoTokenizer.from_pretrained("preechanon/mt5-base-thaisum-text-summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("preechanon/mt5-base-thaisum-text-summarization")
# Placeholder input: the Thai string means "the text you want (summarized)".
text = "ข้อความที่ต้องการ"
inputs = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")

with torch.no_grad():
    preds = model.generate(
        inputs["input_ids"],
        num_beams=15,
        num_return_sequences=1,
        no_repeat_ngram_size=1,
        remove_invalid_values=True,
        max_length=140,
    )

summary = tokenizer.decode(preds[0], skip_special_tokens=True)
print(summary)
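
Beam search with num_beams=15 trades decoding speed for quality, no_repeat_ngram_size=1 forbids any repeated token in the output, and max_length=140 caps the summary length in tokens. If a GPU is available, move both the model and the input tensors to it with .to("cuda") before calling generate.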

Score

  • ROUGE-1: 0.488931
  • ROUGE-2: 0.309732
  • ROUGE-L: 0.425490
  • ROUGE-Lsum: 0.444359
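
The card does not state the evaluation split or the tokenization used. As a minimal sketch, ROUGE F-scores like these can be computed with the evaluate library; the prediction and reference lists below are placeholders, and Thai text is usually word-segmented before scoring since it is written without spaces:

import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["generated summary ..."],    # model outputs (placeholder)
    references=["reference summary ..."],     # gold summaries (placeholder)
)
print(scores)  # dict with keys rouge1, rouge2, rougeL, rougeLsum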

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto Hugging Face training arguments follows the list):

  • learning_rate: 5e-04
  • train_batch_size: 8
  • eval_batch_size: 1
  • seed: 42
  • optimizer: AdamW with betas=(0.9,0.999), epsilon=1e-08 and weight_decay=0.1
  • warmup_steps: 5000
  • lr_scheduler_type: linear
  • num_epochs: 6
  • gradient_accumulation_steps: 4
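
A minimal sketch, assuming the Hugging Face Trainer API was used; the output directory is hypothetical and the dataset/Trainer wiring is omitted, so this is not the authors' actual training script:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-thaisum",        # hypothetical output path
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=1,
    seed=42,
    weight_decay=0.1,                     # AdamW is the Trainer's default optimizer
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    warmup_steps=5000,
    lr_scheduler_type="linear",
    num_train_epochs=6,
    gradient_accumulation_steps=4,
)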

Framework versions

  • Transformers 4.36.1
  • Pytorch 2.1.2
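
To reproduce this environment, one option is to pin those versions at install time (assuming the standard PyPI package names):

pip install transformers==4.36.1 torch==2.1.2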

Resource Funding

We thank the NSTDA Supercomputer Center (ThaiSC) and the National e-Science Infrastructure Consortium for their support of computing facilities.
