CA byte-LM translator: x2en
A vocabulary-free, decoder-only translator that operates directly on UTF-8 bytes, with a causal
Neural Cellular Automaton front-end. It is fine-tuned for the x2en direction from the pretrained base
sujayrittikar/ca-byte-lm-indic6, using prompted continuation in a prefix-LM setup with the loss
restricted to the target span. WMT-2026 Indic MT research track. Covers Assamese, Khasi, Manipuri,
Meitei-Mayek, Mizo and Nyishi → English, and is the only system for Khasi, Nyishi and Meitei-Mayek.
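As a rough illustration of what "vocabulary-free" and "target-span loss" mean in practice, the sketch below turns a sentence pair into UTF-8 byte IDs and a mask that marks only the target bytes as training targets. The prompt template and the names `encode_bytes` and `build_example` are hypothetical; the actual format used by this model is defined in ca_byte_lm.py and may differ.

```python
from typing import List, Tuple


def encode_bytes(text: str) -> List[int]:
    """Map text to UTF-8 byte IDs in 0..255; no learned vocabulary is involved."""
    return list(text.encode("utf-8"))


def build_example(source: str, target: str,
                  src_lang: str, tgt_lang: str) -> Tuple[List[int], List[int]]:
    """Return (input_ids, loss_mask) for one prompted-continuation example.

    ASSUMPTION: the prompt template below is illustrative only and is not
    taken from the model's code.
    """
    prompt = f"{src_lang}: {source}\n{tgt_lang}: "
    prompt_ids = encode_bytes(prompt)
    target_ids = encode_bytes(target)

    input_ids = prompt_ids + target_ids
    # loss_mask[i] == 1 means the byte at position i counts as a training
    # target; prompt bytes are context only and contribute no loss.
    loss_mask = [0] * len(prompt_ids) + [1] * len(target_ids)
    return input_ids, loss_mask


ids, mask = build_example(
    "Ka sorkar ka la pynbna ia ka jingiaseng thymmai.",
    "The government announced a new meeting.",  # placeholder target, not a reference translation
    "Khasi",
    "English",
)
print(len(ids), sum(mask))
```

Because every ID is a raw byte, the same encoder handles all six source languages and scripts without a tokenizer file; the trade-off is longer sequences, especially for scripts whose characters take three bytes each in UTF-8.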
Dev chrF++ (best checkpoint @ step 60000)
| language | chrF++ |
|---|---|
| Assamese | 21.98 |
| Manipuri | 27.66 |
| Mizo | 36.73 |
| Khasi | 26.43 |
| Nyishi | 76.03 |
Dev numbers are measured on the 2025 test set, which was folded into training, so they are indicative rather than held-out.
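For readers unfamiliar with the metric, chrF++ is an F-score (recall weighted more heavily, β = 2) over character n-grams up to order 6 plus word n-grams up to order 2. The sketch below is a simplified, sentence-level approximation written for illustration; it is not the scorer used to produce the table, and its values will not match a reference implementation such as sacreBLEU exactly. The function names are hypothetical.

```python
from collections import Counter


def ngrams(items, n):
    """Count all contiguous n-grams in a string or a list of words."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


def f_beta(hyp_counts, ref_counts, beta=2.0):
    """F-beta from clipped n-gram overlap; returns 0.0 when nothing matches."""
    overlap = sum((hyp_counts & ref_counts).values())
    hyp_total = sum(hyp_counts.values())
    ref_total = sum(ref_counts.values())
    if overlap == 0 or hyp_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / hyp_total
    recall = overlap / ref_total
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)


def simple_chrf_pp(hyp, ref, char_order=6, word_order=2, beta=2.0):
    """Simplified chrF++-style score on a 0..100 scale (an approximation)."""
    # Character n-grams are taken over the text with whitespace removed.
    hyp_chars, ref_chars = "".join(hyp.split()), "".join(ref.split())
    hyp_words, ref_words = hyp.split(), ref.split()

    scores = []
    for n in range(1, char_order + 1):
        h, r = ngrams(hyp_chars, n), ngrams(ref_chars, n)
        if h or r:  # skip orders that neither side is long enough to have
            scores.append(f_beta(h, r, beta))
    for n in range(1, word_order + 1):
        h, r = ngrams(hyp_words, n), ngrams(ref_words, n)
        if h or r:
            scores.append(f_beta(h, r, beta))
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


print(simple_chrf_pp("the government announced a meeting",
                     "the government announced a new meeting"))
```

Since the score is built mostly from character n-grams, it gives partial credit for near-miss spellings and inflections, which is one reason it is commonly reported for morphologically rich and low-resource languages.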
Usage
from ca_byte_lm import from_hub, translate
model, cfg, meta = from_hub("sujayrittikar/ca-byte-mt-x2en", device="cuda")
print(translate(model, cfg, meta, "Ka sorkar ka la pynbna ia ka jingiaseng thymmai.", "Khasi", "English", device="cuda"))
Weights: ca_byte_lm.pt; architecture: ca_byte_lm.py; config: config.json.