LongT5 Malayalam Summarizer

A fine-tuned version of `google/long-t5-tglobal-base` for Malayalam text summarization. It accepts input documents of up to 4096 tokens, which makes it suitable for long-form Malayalam articles and documents.

Model Details

| Property | Value |
|---|---|
| Base model | `google/long-t5-tglobal-base` |
| Architecture | LongT5 (transient-global attention) |
| Max input tokens | 4096 |
| Max output tokens | 256 |
| Language | Malayalam (`ml`) |
| Task | Abstractive summarization |

Training

- Dataset: `Navneeth017/ml-summarizer-translated-chunks`
  - Train: 127,183 examples
  - Validation: 1,285 examples
- Epochs: 2 (31,796 steps)
- Effective batch size: 8 (1 per device × 8 gradient-accumulation steps)
- Learning rate: 5e-5, with 500 warmup steps
- Optimizer: Adafactor
- Precision: BF16
- Weight decay: 0.01
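The step counts above can be cross-checked with a little arithmetic. The sketch below assumes a single device (so the effective batch of 8 comes entirely from gradient accumulation) and that a final partial accumulation window in each epoch still counts as one optimizer step. It also assumes a linear warmup followed by linear decay to zero; that is the Hugging Face `Trainer` default (`lr_scheduler_type="linear"`), but the schedule is not stated in this card.

```python
import math

# Figures taken from the Training section of this card.
TRAIN_EXAMPLES = 127_183
EFFECTIVE_BATCH = 8      # 1 per device x 8 gradient-accumulation steps
EPOCHS = 2
PEAK_LR = 5e-5
WARMUP_STEPS = 500

# Assumption: a trailing partial accumulation window still yields one optimizer step.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup to PEAK_LR, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


print(steps_per_epoch)             # 15898
print(total_steps)                 # 31796
print(round(epoch_at(30_000), 2))  # 1.89
```

Under these assumptions, step 30,000 falls at epoch ≈ 1.89, which agrees with the figure given for `checkpoint-30000/`.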

Training Metrics

| Metric | Value |
|---|---|
| Training loss | 3.185 |
| Eval loss (final) | 0.368 |
| Best eval loss | 0.3684 (step 30,000) |

Checkpoints

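The weights at the root of this repo come from the final training step (31,796). The best checkpoint by eval loss (step 30,000, epoch 1.89, eval loss 0.3684) is stored in the `checkpoint-30000/` subfolder and can be loaded by passing `subfolder`:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_id = "Navneeth017/longt5-malayalam-summarizer"
# Training does not change the tokenizer, so the copy at the repo root works for any checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id)
# A Hub repo ID cannot be extended with a path; select the checkpoint via `subfolder` instead.
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, subfolder="checkpoint-30000")
```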
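Checkpoints written by the Hugging Face `Trainer` normally include a `trainer_state.json` file whose `log_history` list records an `eval_loss` and `step` for each evaluation. Assuming such a file is present in this repo (the card does not say), the "best checkpoint by eval loss" is simply the evaluation entry with the lowest `eval_loss`. The helper `best_eval_step` below is a hypothetical sketch, and the log values are made up for illustration; they are not this model's training history.

```python
import json
from typing import List, Optional, Tuple


def best_eval_step(log_history: List[dict]) -> Optional[Tuple[int, float]]:
    """Return (step, eval_loss) for the evaluation with the lowest loss, or None."""
    evals = [entry for entry in log_history if "eval_loss" in entry]
    if not evals:
        return None
    best = min(evals, key=lambda entry: entry["eval_loss"])
    return best["step"], best["eval_loss"]


# Made-up entries for illustration only: training-loss logs are skipped,
# evaluation entries are compared by eval_loss.
toy_state = json.loads("""
{"log_history": [
  {"step": 100, "loss": 1.20},
  {"step": 100, "eval_loss": 0.90},
  {"step": 200, "eval_loss": 0.70},
  {"step": 300, "eval_loss": 0.75}
]}
""")
print(best_eval_step(toy_state["log_history"]))  # (200, 0.7)
```

With a real checkpoint, the same function would be applied to the `log_history` read from that checkpoint's `trainer_state.json` via `json.load`.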

Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Navneeth017/longt5-malayalam-summarizer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "your malayalam document here..."

# Inputs longer than 4096 tokens are truncated to fit the model's input limit.
inputs = tokenizer(text, return_tensors="pt", max_length=4096, truncation=True)
summary_ids = model.generate(
    **inputs,  # passes both input_ids and attention_mask
    max_new_tokens=256,
    num_beams=4,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
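The call above silently drops everything past 4096 tokens. One possible workaround, which this card does not report testing, is to split the token sequence into overlapping windows and summarize each window separately. The helper `chunk_token_ids` below is a hypothetical sketch that works on plain token-ID lists; the 4096-token window and 256-token overlap are illustrative defaults, not settings from this model's training.

```python
from typing import List


def chunk_token_ids(ids: List[int], max_len: int = 4096, overlap: int = 256) -> List[List[int]]:
    """Split a token-ID list into windows of at most `max_len` tokens.

    Consecutive windows share `overlap` tokens, so text near a window
    boundary appears in two windows with some surrounding context.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if not ids:
        return []
    stride = max_len - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):  # this window reaches the end
            break
        start += stride
    return chunks


# 10,000 stand-in token IDs -> three windows, the last one shorter.
windows = chunk_token_ids(list(range(10_000)))
print([len(w) for w in windows])  # [4096, 4096, 2320]
```

To use it with the model, one option is to tokenize with `add_special_tokens=False`, call the helper with `max_len=4095`, append `tokenizer.eos_token_id` to each window, and pass each window to `model.generate`. How the per-window summaries should then be combined is not covered by this card.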

Limitations

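- The model is optimized for Malayalam; performance on other languages is not guaranteed.
- Like other abstractive summarizers, it may occasionally hallucinate facts that are not present in the source.
- Output quality depends on document length and on how closely a document's domain matches the training data. Text beyond 4096 tokens is truncated and does not influence the summary.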

Weights

| Property | Value |
|---|---|
| Format | Safetensors |
| Model size | 0.3B params |
| Tensor type | F32 |