NE-Trans v0 — 600M

MWire Labs | Kren Stack

Baseline validation checkpoint for NE-Trans, a multilingual machine translation (MT) system for Northeast Indian languages. Built on NLLB-200-distilled-600M and fine-tuned with LoRA.

Languages

Bodo (brx_Deva) — existing NLLB token
Kokborok (trp_Latn) — new token added
Khasi (kha_Latn) — new token added

Training

Data: WMT 2025 official training data only (no augmentation)
Bodo: 12,129 pairs | Kokborok: 2,152 pairs | Khasi: 24,699 pairs
Both directions (en→X and X→en) jointly trained
LoRA r=16, alpha=32, target: q_proj/v_proj
10 epochs, batch size 32, lr 5e-4
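
To give a rough sense of scale for the setup above, here is a minimal sketch that estimates the number of LoRA adapter weights and optimizer steps. The architecture numbers (hidden size 1024, 12 encoder and 12 decoder layers) are assumptions about NLLB-200-distilled-600M and are not stated in this card. The step count further assumes that every pair is used once per direction and that there is no gradient accumulation. The helper names are illustrative only.

```python
import math

# Values stated in this card.
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = ("q_proj", "v_proj")
PAIRS = {"bodo": 12_129, "kokborok": 2_152, "khasi": 24_699}
EPOCHS = 10
BATCH_SIZE = 32

# Assumptions, not stated in this card: the commonly cited shape of
# NLLB-200-distilled-600M (hidden size 1024, 12 encoder + 12 decoder layers).
D_MODEL = 1024
ENCODER_LAYERS = 12
DECODER_LAYERS = 12


def lora_trainable_params(d_model, enc_layers, dec_layers, r, n_targets):
    """Count LoRA adapter weights on square d_model x d_model projections."""
    # One attention block per encoder layer (self-attention), two per
    # decoder layer (self-attention and cross-attention).
    attention_blocks = enc_layers + 2 * dec_layers
    adapted_matrices = attention_blocks * n_targets
    # Each adapted matrix gains A (r x d_model) and B (d_model x r).
    return adapted_matrices * r * (d_model + d_model)


def steps_per_epoch(pairs, batch_size, directions=2):
    """Optimizer steps per epoch if every pair is used once per direction."""
    examples = sum(pairs.values()) * directions
    return math.ceil(examples / batch_size)


adapter_params = lora_trainable_params(
    D_MODEL, ENCODER_LAYERS, DECODER_LAYERS, LORA_R, len(TARGET_MODULES)
)
scaling = LORA_ALPHA / LORA_R  # LoRA scales the update B @ A by alpha / r
total_steps = steps_per_epoch(PAIRS, BATCH_SIZE) * EPOCHS

print(f"LoRA adapter parameters: {adapter_params:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
print(f"Steps per epoch: {steps_per_epoch(PAIRS, BATCH_SIZE):,}")
print(f"Total steps over {EPOCHS} epochs: {total_steps:,}")
```

Under these assumptions the adapters add about 2.36M weights, roughly 0.4% of a 600M-parameter base. The count covers the adapters only and excludes any embedding rows for the new language tokens, since the card does not say whether those were trained.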

Validation BLEU (WMT 2025 validation split, no augmentation)

Direction	BLEU	2025 Best
en→bodo	21.91	19.71
bodo→en	32.19	21.68
en→kokborok	5.14	6.90
kokborok→en	10.06	2.99
en→khasi	24.88	10.81
khasi→en	24.99	14.26
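
The gaps between the two score columns can be read off with a few lines of arithmetic. The sketch below uses only the numbers in the table. The card does not say whether the "2025 Best" scores were measured on the same split, so the differences are best treated as indicative.

```python
# (direction, validation BLEU, "2025 Best") copied from the table above.
RESULTS = [
    ("en→bodo", 21.91, 19.71),
    ("bodo→en", 32.19, 21.68),
    ("en→kokborok", 5.14, 6.90),
    ("kokborok→en", 10.06, 2.99),
    ("en→khasi", 24.88, 10.81),
    ("khasi→en", 24.99, 14.26),
]


def bleu_deltas(rows):
    """Return {direction: BLEU minus 2025 best}, rounded to two decimals."""
    return {direction: round(bleu - best, 2) for direction, bleu, best in rows}


deltas = bleu_deltas(RESULTS)
below_best = [direction for direction, delta in deltas.items() if delta < 0]

for direction, delta in deltas.items():
    print(f"{direction:<12} {delta:+.2f}")
print("Below 2025 best:", below_best)
```

Five of the six directions come out ahead of the listed 2025 best; en→kokborok is the only one that does not.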

Notes

This is a baseline validation run — no back-translation, no QE filtering, no reranker
en→kokborok underperforms the 2025 best (5.14 vs. 6.90 BLEU) due to extremely limited data (2,152 pairs) and a randomly initialized trp_Latn token
Full NE-Trans system (3.3B, augmented data, CPT reranker) to follow for WMT 2026 submission
Part of the Kren Stack: NE-LID, NE-BERT, NE-ASR, NE-OCR, NE-CLIP, NE-Trans
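
The note on en→kokborok points to data scarcity. A small sketch of how uneven the training mix is, using only the pair counts from the Training section (the helper name is illustrative):

```python
# Parallel pair counts from the Training section of this card.
PAIRS = {"bodo": 12_129, "kokborok": 2_152, "khasi": 24_699}


def data_shares(pairs):
    """Return each language's fraction of the total parallel pairs."""
    total = sum(pairs.values())
    return {lang: count / total for lang, count in pairs.items()}


shares = data_shares(PAIRS)
khasi_to_kokborok = PAIRS["khasi"] / PAIRS["kokborok"]

# Print languages from smallest to largest share.
for lang, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{lang:<9} {PAIRS[lang]:>6,} pairs  {share:6.1%}")
print(f"Khasi has {khasi_to_kokborok:.1f}x as many pairs as Kokborok")
```

Kokborok accounts for about 5.5% of the pairs, and Khasi has roughly 11.5 times as many.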

Citation

If you use this model, please cite MWire Labs and the WMT 2026 NE-Trans system paper (forthcoming).

Model size: 0.6B params | Tensor type: BF16 (Safetensors)