ControlMT v2.3 — int8 dynamic (CPU)

CPU-optimized int8 dynamic-quantized variant of anandkaman/controlmt-v2.3.

The modeling code applies torch.quantization.quantize_dynamic to every nn.Linear at load time, so you don't need to write any quantization code. Load it the standard HF way, with trust_remote_code=True.
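
For intuition about what dynamic int8 quantization does to a linear layer, here is a minimal pure-Python sketch. It assumes a simplified symmetric, per-tensor scheme for both weights and activations; PyTorch's real kernels differ in detail, and the helper names and sample numbers are made up for illustration.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale on all-zero input
    scale = max_abs / qmax
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dynamic_int8_dot(weights_q, w_scale, activations):
    """Quantize activations at call time, accumulate in integers, rescale once."""
    acts_q, a_scale = quantize_symmetric(activations)
    acc = sum(w * a for w, a in zip(weights_q, acts_q))
    return acc * w_scale * a_scale


# Illustrative values only; not taken from the model.
weights = [0.42, -1.27, 0.05, 0.88]
activations = [1.5, -0.3, 2.0, 0.7]

weights_q, w_scale = quantize_symmetric(weights)   # done once, at load time
approx = dynamic_int8_dot(weights_q, w_scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
print(f"fp32: {exact:.4f}  int8: {approx:.4f}  abs error: {abs(exact - approx):.4f}")
```

The weights are quantized once up front, while the activation scale is recomputed on every call, which is what makes the scheme "dynamic". The small rounding error it introduces is the reason output can occasionally differ from fp32.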

Performance (CPU inference on an RTX 5060 Ti box, 6-pair test, beam=2)

Variant	Latency / pair	RAM	vs CPU bf16
int8 (this)	0.28 s	~140 MB	1.8× faster
CPU bf16	0.51 s	280 MB	(baseline)
CPU fp32	1.44 s	560 MB	2.8× slower
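
The "vs CPU bf16" column is each latency expressed relative to the bf16 row. A small sketch of that arithmetic, using only the latencies from the table (the function name is hypothetical):

```python
# Latency per pair in seconds, copied from the table above.
latency_s = {"int8 (this)": 0.28, "CPU bf16": 0.51, "CPU fp32": 1.44}
baseline = latency_s["CPU bf16"]


def relative_to_baseline(latency, baseline):
    """Describe a latency as a speedup or slowdown factor against the baseline."""
    if latency == baseline:
        return "(baseline)"
    if latency < baseline:
        return f"{baseline / latency:.1f}× faster"
    return f"{latency / baseline:.1f}× slower"


for variant, latency in latency_s.items():
    print(f"{variant}: {relative_to_baseline(latency, baseline)}")
```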

Quality: identical output on our test set vs fp32. In production, re-validate on your own representative sentences — int8 dynamic occasionally drops 0.5–1 BLEU on long-tail outputs.
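
One way to re-validate is to translate the same sentences with the fp32 model and with this int8 variant, then diff the outputs. The sketch below assumes you have already collected both lists of translations; compare_outputs is a hypothetical helper, and the sample strings are placeholders rather than real model output.

```python
def compare_outputs(reference_outputs, candidate_outputs):
    """Return the exact-match rate and the list of (index, reference, candidate) mismatches."""
    if len(reference_outputs) != len(candidate_outputs):
        raise ValueError("both lists must contain one output per input sentence")
    mismatches = [
        (i, ref, cand)
        for i, (ref, cand) in enumerate(zip(reference_outputs, candidate_outputs))
        if ref.strip() != cand.strip()
    ]
    total = len(reference_outputs)
    match_rate = 1.0 if total == 0 else (total - len(mismatches)) / total
    return match_rate, mismatches


# Placeholder data standing in for fp32 and int8 translations of the same inputs.
fp32_outputs = ["I speak Kannada.", "Where is the station?"]
int8_outputs = ["I speak Kannada.", "Where is the train station?"]

rate, diffs = compare_outputs(fp32_outputs, int8_outputs)
print(f"exact-match rate: {rate:.0%}")
for index, ref, cand in diffs:
    print(f"  [{index}] fp32: {ref!r} | int8: {cand!r}")
```

Exact match is a strict check and says nothing about whether a differing output is worse. To quantify a BLEU drop you would also need reference translations and a BLEU tool such as sacreBLEU.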

Quick start

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("anandkaman/controlmt-v2.3-int8", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("anandkaman/controlmt-v2.3-int8", trust_remote_code=True)

# Already quantized — just translate
print(model.translate("ನಾನು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇನೆ.",
                       tokenizer=tokenizer, direction="kn2en"))
# → "I speak Kannada."

That's it. No quantize_dynamic call needed; the modeling code does it for you on from_pretrained.
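
To translate several sentences and get a rough latency-per-sentence figure on your own hardware, a small wrapper around the call above may help. This is a sketch: translate_all is a hypothetical helper, and it assumes model.translate accepts the same tokenizer and direction arguments as in the quick start.

```python
import time


def translate_all(model, tokenizer, sentences, direction="kn2en"):
    """Translate each sentence in turn; return the outputs and mean seconds per sentence."""
    outputs = []
    start = time.perf_counter()
    for sentence in sentences:
        # Same call shape as the quick start: one sentence per call.
        outputs.append(model.translate(sentence, tokenizer=tokenizer, direction=direction))
    elapsed = time.perf_counter() - start
    seconds_per_sentence = elapsed / len(sentences) if sentences else 0.0
    return outputs, seconds_per_sentence
```

With the model and tokenizer loaded as in the quick start, translate_all(model, tokenizer, sentences) returns the translations in input order together with the average wall-clock time per sentence. The first call may include warm-up overhead, so discard it for careful measurements.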

Or use the SDK, a one-liner once the package is installed:

pip install controlmt

from controlmt import ControlMT
model = ControlMT.from_hf(model_id="anandkaman/controlmt-v2.3-int8", quant="int8")

Hardware

✅ CPU (x86 + ARM, ≥1 GB RAM)
❌ GPU — quantization is CPU-only; calling .to('cuda') reverts the int8 ops to fp32 Linear and you lose the speed/memory win. Use the main repo with dtype=torch.float16 for GPU.
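
If the same code has to run on both CPU and GPU machines, one option is to pick the repo from the target device instead of moving this model to CUDA. A minimal sketch, where choose_repo is a hypothetical helper and the repo ids are the ones named in this card:

```python
MAIN_REPO = "anandkaman/controlmt-v2.3"
INT8_REPO = "anandkaman/controlmt-v2.3-int8"


def choose_repo(device: str) -> str:
    """Return the int8 repo for CPU and the main repo for any other device."""
    # The int8 variant is CPU-only, so every non-CPU device falls back to the main repo.
    return INT8_REPO if device == "cpu" else MAIN_REPO


print(choose_repo("cpu"))
print(choose_repo("cuda"))
```

With PyTorch installed, the device string can come from "cuda" if torch.cuda.is_available() else "cpu".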

Other variants in the family

Repo	Best for
controlmt-v2.3	General use — fp32 / bf16 / fp16 chosen at load
controlmt-v2.3-int8 (you are here)	CPU-only, memory-constrained, fastest CPU
controlmt-demo	Live web demo

License

Apache 2.0. Same as the base model.

Model size: 0.2B params (Safetensors, tensor type F32)
