Sinhala E2R mT5

A fine-tuned mT5-base model for Sinhala Easy-to-Read (E2R) text simplification. It serves as Stage 2 of a two-stage pipeline:

Input text
  ↓ Stage 1 — bert-base-multilingual-cased (Complex Word ID via MLM masking)
  ↓ Stage 2 — DineshaPriyadarshani/sinhala-e2r-mt5  (structural simplification)
  ↓ E2R output
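
The Stage 2 step above can be sketched with the Hugging Face transformers library. This is a minimal sketch, not the author's inference script: the helper names load_stage2 and simplify are hypothetical, the generation settings (beam count, token limits) are illustrative defaults, and it assumes the model accepts the raw sentence without a task prefix. Stage 1 (complex word identification) is not shown.

```python
from typing import Any

# Repo name as shown on the model page.
MODEL_REPO = "DineshaPriyadarshani/sinhala-e2r-mt5"


def load_stage2(repo: str = MODEL_REPO):
    """Load tokenizer and model (needs transformers, sentencepiece, torch)."""
    # Imported lazily so the module can be imported without these packages.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo)
    model.eval()
    return tokenizer, model


def simplify(text: str, tokenizer: Any, model: Any,
             max_new_tokens: int = 128, num_beams: int = 4) -> str:
    """Run one sentence through the seq2seq model and decode the result."""
    # Assumption: no task prefix is prepended to the input sentence.
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=256)
    output_ids = model.generate(**inputs,
                                max_new_tokens=max_new_tokens,
                                num_beams=num_beams)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads the weights on first call):
#   tokenizer, model = load_stage2()
#   print(simplify("<a complex Sinhala sentence>", tokenizer, model))
```

If the model was trained with a task prefix or different length limits, those values would need to be adjusted to match the training setup.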

Training

Dataset : 800 Sinhala complex→simple sentence pairs
Best val_loss : 0.9057
E2R compliance : ~72.7% → ~74.1%

Safetensors

Model size : 0.6B params
Tensor type : F32
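
As a rough guide to what these figures mean for loading the model, the weight footprint can be estimated from the parameter count and tensor type. This is a back-of-the-envelope sketch: 0.6B is the rounded figure shown above, and the estimate covers weights only, not activations or generation buffers.

```python
# Rounded parameter count from the model page, and F32 = 4 bytes per value.
PARAMS = 600_000_000
BYTES_PER_PARAM = 4


def weight_gigabytes(params: int, bytes_per_param: int) -> float:
    """Approximate size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


size_gb = weight_gigabytes(PARAMS, BYTES_PER_PARAM)
print(f"Approximate weight size: {size_gb:.1f} GB")
```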

Model tree for DineshaPriyadarshani/sinhala-e2r-mt5

Base model : google/mt5-base (this model is a fine-tune of it)