LongT5 Malayalam Summarizer

A fine-tuned version of google/long-t5-tglobal-base for abstractive Malayalam text summarization. It accepts inputs of up to 4096 tokens, which makes it suitable for long-form Malayalam articles and documents.

Model Details

Property            Value
Base model          google/long-t5-tglobal-base
Architecture        LongT5 (transient-global attention)
Max input tokens    4096
Max output tokens   256
Language            Malayalam (ml)
Task                Abstractive summarization

Training

  • Dataset: Navneeth017/ml-summarizer-translated-chunks
    • Train: 127,183 examples
    • Validation: 1,285 examples
  • Epochs: 2 (31,796 steps)
  • Effective batch size: 8 (1 per device × 8 gradient accumulation)
  • Learning rate: 5e-5 with 500 warmup steps
  • Optimizer: Adafactor
  • Precision: BF16
  • Weight decay: 0.01
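
The hyperparameters above could be expressed roughly as follows. This is a minimal sketch that assumes the Hugging Face Trainer API was used; the card does not say so explicitly. The output_dir value is hypothetical, and settings the card does not state (evaluation and save cadence, generation options) are deliberately left out.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: values are taken from the Training list above;
# everything else is left at the library defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="longt5-malayalam-summarizer",  # hypothetical path
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8 on one device
    learning_rate=5e-5,
    warmup_steps=500,
    optim="adafactor",
    bf16=True,                       # requires hardware with BF16 support
    weight_decay=0.01,
)
```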

Training Metrics

Metric              Value
Training loss       3.185
Eval loss (final)   0.368
Best eval loss      0.3684 @ step 30,000
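
The step counts reported on this card (31,796 total, best eval loss at step 30,000) can be cross-checked against the dataset size and batch configuration. The sketch below assumes training ran on a single device and that the final partial batch of each epoch counts as one optimizer step; neither is stated on the card, but the numbers agree under those assumptions.

```python
import math

train_examples = 127_183
per_device_batch = 1
grad_accum_steps = 8
epochs = 2

# Assumption: one device, so effective batch = per-device batch x accumulation.
effective_batch = per_device_batch * grad_accum_steps

# Assumption: the last, partially filled batch of an epoch still triggers a step.
steps_per_epoch = math.ceil(train_examples / effective_batch)  # 15,898
total_steps = steps_per_epoch * epochs                         # 31,796

# Fractional epoch reached at the best checkpoint (step 30,000).
best_step = 30_000
best_epoch = round(best_step / steps_per_epoch, 2)             # 1.89

print(steps_per_epoch, total_steps, best_epoch)
```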

Checkpoints

The weights at the repo root are from the final step (31,796). The best checkpoint by eval loss is saved under checkpoint-30000/ (step 30,000, epoch 1.89, eval loss 0.3684). Because a Hub repo ID cannot contain a nested path, load it by passing the directory name as subfolder:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Navneeth017/longt5-malayalam-summarizer"

tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="checkpoint-30000")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, subfolder="checkpoint-30000")

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Navneeth017/longt5-malayalam-summarizer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "your malayalam document here..."

# Truncate to the model's 4096-token input limit.
inputs = tokenizer(text, return_tensors="pt", max_length=4096, truncation=True)
summary_ids = model.generate(
    **inputs,  # passes both input_ids and attention_mask
    max_new_tokens=256,
    num_beams=4,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)

Limitations

  • Optimized for Malayalam; performance on other languages is not guaranteed.
  • Abstractive outputs may occasionally hallucinate facts not present in the source.
  • Output quality depends on document length and domain similarity to the training data.
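
Documents longer than the 4096-token limit are truncated by the Usage snippet, so anything past the limit is ignored. One possible workaround is to split the token IDs into overlapping windows and summarize each window separately. The helper below is a minimal sketch of that idea, not a method described by this card; the function name chunk_token_ids and the overlap of 128 tokens are illustrative assumptions.

```python
from typing import List


def chunk_token_ids(
    token_ids: List[int], max_len: int = 4096, overlap: int = 128
) -> List[List[int]]:
    """Split token IDs into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that sentences cut at a
    boundary appear in full in at least one window.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    stride = max_len - overlap
    chunks: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
        start += stride
    return chunks
```

To use it with this model, one could tokenize the document with add_special_tokens=False, call the helper with max_len set slightly below 4096 to leave room for the end-of-sequence token, decode each window back to text, and run each through the generation code in the Usage section. How well the per-window summaries combine has not been evaluated here.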