LongT5 Malayalam Summarizer
A fine-tuned version of google/long-t5-tglobal-base for Malayalam text summarization. It accepts input documents of up to 4096 tokens, which makes it suitable for long-form Malayalam articles and documents.
Model Details
| Property | Value |
|---|---|
| Base model | google/long-t5-tglobal-base |
| Architecture | LongT5 (transient-global attention) |
| Max input tokens | 4096 |
| Max output tokens | 256 |
| Language | Malayalam (ml) |
| Task | Abstractive summarization |
Training
- Dataset: Navneeth017/ml-summarizer-translated-chunks
- Train: 127,183 examples
- Validation: 1,285 examples
- Epochs: 2 (31,796 steps)
- Effective batch size: 8 (per-device batch size 1 × 8 gradient accumulation steps)
- Learning rate: 5e-5 with 500 warmup steps
- Optimizer: Adafactor
- Precision: BF16
- Weight decay: 0.01
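The step count listed above can be sanity-checked from the dataset size and batch settings. The following is a minimal sketch, assuming a single training device (the device count is not stated in this card) and assuming that a final, partially filled accumulation window still counts as one optimizer step; `epoch_at_step` is a hypothetical helper introduced only for illustration.

```python
import math

TRAIN_EXAMPLES = 127_183   # from the Training section
PER_DEVICE_BATCH = 1       # from the Training section
GRAD_ACCUM_STEPS = 8       # from the Training section
NUM_DEVICES = 1            # assumption: device count is not stated in the card
EPOCHS = 2                 # from the Training section

# Examples consumed per optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES

# Optimizer steps per epoch, rounding the last partial window up.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps (hypothetical helper)."""
    return step / steps_per_epoch


print(effective_batch)                    # 8
print(steps_per_epoch)                    # 15898
print(total_steps)                        # 31796
print(round(epoch_at_step(30_000), 2))    # 1.89
```

Under these assumptions the arithmetic reproduces both the 31,796 total steps and the epoch 1.89 position reported for the step-30,000 checkpoint.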
Training Metrics
| Metric | Value |
|---|---|
| Training loss | 3.185 |
| Eval loss (final) | 0.368 |
| Best eval loss | 0.3684 @ step 30,000 |
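An eval loss can be easier to interpret as a perplexity. The sketch below assumes the reported eval loss is a mean per-token cross-entropy in natural-log units, which this card does not state explicitly; `perplexity` is a hypothetical helper, not part of any library.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


best_eval_loss = 0.3684  # best eval loss from the table above
print(f"{perplexity(best_eval_loss):.2f}")
```

If the assumption holds, the best eval loss corresponds to a perplexity of roughly 1.45.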
Checkpoints
The weights in this repo root are from the final step (31,796). The best checkpoint by eval loss is saved at checkpoint-30000/ (step 30,000, epoch 1.89, eval loss 0.3684) and can be loaded directly:
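```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Navneeth017/longt5-malayalam-summarizer"
# Select the checkpoint directory with `subfolder`; a Hub repo ID cannot contain a path.
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="checkpoint-30000")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, subfolder="checkpoint-30000")
```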
Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Navneeth017/longt5-malayalam-summarizer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "your malayalam document here..."

# Tokenize the document; anything past 4096 tokens is truncated.
inputs = tokenizer(text, return_tensors="pt", max_length=4096, truncation=True)

# Pass input_ids and attention_mask together and decode with 4-beam search.
summary_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    num_beams=4,
    early_stopping=True,
)

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
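Because `truncation=True` drops everything past the 4096-token limit, longer documents may need to be split before summarization. The following is a minimal sketch and not part of the original model card: `split_token_ids` is a hypothetical helper, and the 128-token overlap is an arbitrary assumption rather than a tuned value. It works on a plain list of token IDs, such as the `input_ids` the tokenizer returns when called without `return_tensors`.

```python
from typing import List


def split_token_ids(
    token_ids: List[int],
    max_len: int = 4096,   # model input limit from the Model Details table
    overlap: int = 128,    # assumption: arbitrary overlap, not from the card
) -> List[List[int]]:
    """Split a token ID sequence into windows of at most `max_len` tokens.

    Consecutive windows share `overlap` tokens so that text near a window
    boundary appears in both neighbouring windows.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if not token_ids:
        return []

    stride = max_len - overlap
    windows: List[List[int]] = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return windows


# Example with dummy IDs standing in for a 10,000-token document.
windows = split_token_ids(list(range(10_000)))
print([len(w) for w in windows])  # [4096, 4096, 2064]
```

One possible way to use this is to decode each window back to text, summarize the pieces separately with the usage code above, and then combine the partial summaries. How well that works for this model has not been evaluated here.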
Limitations
- Optimized for Malayalam; performance on other languages is not guaranteed.
- Abstractive outputs may occasionally hallucinate facts not present in the source.
- Output quality depends on document length and domain similarity to the training data.