ControlMT v2.3 — int8 dynamic (CPU)

CPU-optimized int8 dynamic-quantized variant of anandkaman/controlmt-v2.3.

Auto-applies torch.quantization.quantize_dynamic to every nn.Linear at load time. You don't need to write any quantization code — just load it the standard HF way.
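
For intuition, the arithmetic behind dynamic int8 quantization of a single linear layer can be sketched in plain Python. This is a simplified illustration (symmetric quantization, one scale per tensor), not PyTorch's actual kernel, and the helper names `quantize_symmetric`, `quantize_weights`, and `dynamic_int8_linear` are made up for this example.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one scale for the whole list."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_weights(weight_rows):
    """Quantize a weight matrix once, ahead of inference."""
    n_cols = len(weight_rows[0])
    flat = [w for row in weight_rows for w in row]
    q_flat, w_scale = quantize_symmetric(flat)
    q_rows = [q_flat[i:i + n_cols] for i in range(0, len(q_flat), n_cols)]
    return q_rows, w_scale


def dynamic_int8_linear(x, q_rows, w_scale, bias):
    """Compute y = W x + b with pre-quantized weights.

    The activation scale is derived from the input on every call,
    which is the "dynamic" part of dynamic quantization.
    """
    q_x, x_scale = quantize_symmetric(x)
    out = []
    for q_row, b in zip(q_rows, bias):
        acc = sum(qw * qx for qw, qx in zip(q_row, q_x))  # integer accumulate
        out.append(acc * w_scale * x_scale + b)           # dequantize, add bias
    return out


weights = [[0.5, -1.0, 0.25], [1.5, 0.75, -0.5]]
bias = [0.1, -0.2]
x = [1.0, 2.0, -1.0]

q_rows, w_scale = quantize_weights(weights)
approx = dynamic_int8_linear(x, q_rows, w_scale, bias)
exact = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print("max abs error vs float:", max_error)
```

The small rounding error visible here is the kind of drift that can occasionally change a decoded token.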

Performance (RTX 5060 Ti box, 6-pair test, beam=2)

Variant       Latency / pair   RAM       vs CPU bf16
int8 (this)   0.28 s           ~140 MB   1.8× faster
CPU bf16      0.51 s           280 MB    (baseline)
CPU fp32      1.44 s           560 MB    2.8× slower
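
The ratios in the last column follow directly from the per-pair latencies, and the RAM figures track bytes per parameter (1 : 2 : 4 for int8 : bf16 : fp32). The sketch below reproduces that arithmetic and adds a small timing helper for measuring per-pair latency on your own machine. `seconds_per_pair` and its `translate_fn` argument are hypothetical names for this example, not part of the ControlMT API.

```python
import time

# Latency (seconds per pair) and RAM (MB) copied from the table above.
latency = {"int8": 0.28, "bf16": 0.51, "fp32": 1.44}
ram_mb = {"int8": 140, "bf16": 280, "fp32": 560}

speedup_int8 = latency["bf16"] / latency["int8"]    # int8 relative to bf16
slowdown_fp32 = latency["fp32"] / latency["bf16"]   # fp32 relative to bf16
print(f"int8 is {speedup_int8:.1f}x faster than bf16")
print(f"fp32 is {slowdown_fp32:.1f}x slower than bf16")
print(f"RAM ratio int8:bf16:fp32 = 1:{ram_mb['bf16'] // ram_mb['int8']}:"
      f"{ram_mb['fp32'] // ram_mb['int8']}")


def seconds_per_pair(translate_fn, sentences, warmup=1):
    """Average wall-clock seconds per sentence for any translation callable."""
    for s in sentences[:warmup]:      # warm-up calls are not timed
        translate_fn(s)
    start = time.perf_counter()
    for s in sentences:
        translate_fn(s)
    return (time.perf_counter() - start) / len(sentences)
```

To time the real model, pass a function that wraps the `model.translate(...)` call from the Quick start as `translate_fn`.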

Quality: output is identical to fp32 on our test set. Before deploying to production, re-validate on your own representative sentences; int8 dynamic quantization occasionally costs 0.5–1 BLEU on long-tail outputs.
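
One way to re-validate is to run the same sentences through an fp32 reference and the int8 model, then inspect any disagreements. This is a minimal sketch, assuming each model is wrapped in a function that maps a source string to a translation. `compare_translations` and the two stand-in functions are illustrative only and exist so the example runs without downloading a model.

```python
def compare_translations(reference_fn, candidate_fn, sentences):
    """Return (exact_match_rate, mismatches) for two translation callables."""
    mismatches = []
    for src in sentences:
        ref = reference_fn(src)
        cand = candidate_fn(src)
        if ref.strip() != cand.strip():
            mismatches.append((src, ref, cand))
    rate = 1.0 - len(mismatches) / len(sentences) if sentences else 1.0
    return rate, mismatches


# Stand-ins for real models (hypothetical): the "int8" one diverges on long inputs.
def fake_fp32(src):
    return src.upper()


def fake_int8(src):
    return src.upper() if len(src) < 12 else src.upper() + "!"


rate, diffs = compare_translations(
    fake_fp32, fake_int8, ["short one", "a much longer sentence"]
)
print(f"exact match rate: {rate:.2f}")
for src, ref, cand in diffs:
    print(f"{src!r}: fp32={ref!r} int8={cand!r}")
```

Exact match is a strict criterion. If some outputs differ, scoring both systems against reference translations with a corpus-level metric such as BLEU shows whether the difference matters.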

Quick start

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("anandkaman/controlmt-v2.3-int8", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("anandkaman/controlmt-v2.3-int8", trust_remote_code=True)

# Already quantized — just translate
print(model.translate("ನಾನು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇನೆ.",
                       tokenizer=tokenizer, direction="kn2en"))
# → "I speak Kannada."

That's it. No quantize_dynamic call needed; the modeling code does it for you on from_pretrained.

Or use the SDK (a one-liner once the package is installed):

pip install controlmt

# then, in Python
from controlmt import ControlMT
model = ControlMT.from_hf(model_id="anandkaman/controlmt-v2.3-int8", quant="int8")

Hardware

  • ✅ CPU (x86 + ARM, ≥1 GB RAM)
  • ❌ GPU — quantization is CPU-only; calling .to('cuda') reverts the int8 ops to fp32 Linear and you lose the speed/memory win. Use the main repo with dtype=torch.float16 for GPU.
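
The hardware rule above can be written as a small selection helper. This is only a sketch: `pick_variant` is a hypothetical function, and the returned dictionary is an illustrative structure rather than anything the ControlMT code expects. The repo names are the ones listed in this card.

```python
def pick_variant(has_cuda: bool) -> dict:
    """Choose a ControlMT repo and dtype based on whether a GPU is available."""
    if has_cuda:
        # GPU: use the main repo in half precision; int8 dynamic is CPU-only.
        return {"repo": "anandkaman/controlmt-v2.3",
                "dtype": "float16",
                "device": "cuda"}
    # CPU: use this int8 repo and leave the dtype at its default.
    return {"repo": "anandkaman/controlmt-v2.3-int8",
            "dtype": None,
            "device": "cpu"}


print(pick_variant(has_cuda=False))
```

With PyTorch installed, `torch.cuda.is_available()` can supply the `has_cuda` argument.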

Other variants in the family

Repo                                 Best for
controlmt-v2.3                       General use: fp32 / bf16 / fp16 chosen at load
controlmt-v2.3-int8 (you are here)   CPU-only, memory-constrained, fastest CPU
controlmt-demo                       Live web demo

License

Apache 2.0. Same as the base model.
