hy-mt2-1.8b-8bit-mlx

Quantized version of tencent/Hy-MT2-1.8B for Apple Silicon using MLX.

Hy-MT2-1.8B is Tencent's multilingual translation model covering 40+ languages.

Quantization: Affine integer quantization
Precision: 8-bit (~8.5 bits/weight avg)
Group size: 64
Disk size: 1824 MB
Quantized by: sahilchachra
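
The ~8.5 bits/weight figure is consistent with group-wise affine quantization, where each group of 64 weights carries its own scale and bias on top of the 8-bit values. A minimal sketch of that arithmetic follows. It assumes a 16-bit scale and a 16-bit bias per group, which this card does not state, and the function name is hypothetical.

```python
def effective_bits_per_weight(
    bits: int,
    group_size: int,
    scale_bits: int = 16,  # assumption: one 16-bit scale per group
    bias_bits: int = 16,   # assumption: one 16-bit bias per group
) -> float:
    """Average storage cost per weight, including per-group metadata."""
    overhead = (scale_bits + bias_bits) / group_size
    return bits + overhead


# 8-bit weights, group size 64: 8 + 32/64 bits per weight on average.
print(effective_bits_per_weight(bits=8, group_size=64))
```

Under the same assumptions, a larger group size lowers the overhead but shares one scale and bias across more weights.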

About this variant

This variant uses 8-bit affine quantization with a group size of 64. It is the variant closest to FP16 translation quality and is recommended when memory allows and translation accuracy is the priority.

Benchmark results

Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

Metric	This model	FP16 baseline
Prefill (tok/s)	1345.5	1269.81
Decode (tok/s)	134.67	77.12
Peak memory (GB)	2.175	3.72
Disk size (MB)	1824	3897
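
To read the table as relative changes, the sketch below divides each figure for this model by the FP16 baseline. The numbers are copied from the table above; the dictionary keys are illustrative names, not part of any benchmark tool.

```python
# Figures copied from the performance table above.
this_model = {
    "prefill_tok_s": 1345.5,
    "decode_tok_s": 134.67,
    "peak_memory_gb": 2.175,
    "disk_size_mb": 1824,
}
fp16_baseline = {
    "prefill_tok_s": 1269.81,
    "decode_tok_s": 77.12,
    "peak_memory_gb": 3.72,
    "disk_size_mb": 3897,
}

# Ratio > 1 means this model's figure is larger than the baseline's.
ratios = {name: this_model[name] / fp16_baseline[name] for name in this_model}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x of FP16")
```

For the throughput rows a ratio above 1 is an improvement; for memory and disk size a ratio below 1 is.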

Translation quality (FLORES-200 devtest)

Reported as chrF++ (higher is better). The sample size (n) is noted for each language pair.

Direction	This model	FP16 baseline	n
eng_Latn→fra_Latn	65.33	63.81	20
eng_Latn→deu_Latn	57.38	57.66	20
eng_Latn→zho_Hans	28.84	29.09	20
eng_Latn→jpn_Jpan	34.42	34.19	20
eng_Latn→spa_Latn	56.44	56.50	20
fra_Latn→eng_Latn	65.73	64.58	20
zho_Hans→eng_Latn	55.19	55.17	20
jpn_Jpan→eng_Latn	54.93	55.29	20

Avg chrF++: 57.26 vs FP16 56.95
Avg BLEU: 31.33 vs FP16 30.71
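
The per-direction gap between this model and FP16 is easier to judge as a signed difference. The sketch below computes it from the eight rows listed in the table above and does not try to reproduce the reported averages. Variable names are illustrative.

```python
# (this model, FP16 baseline) chrF++ scores copied from the table above.
scores = {
    "eng_Latn→fra_Latn": (65.33, 63.81),
    "eng_Latn→deu_Latn": (57.38, 57.66),
    "eng_Latn→zho_Hans": (28.84, 29.09),
    "eng_Latn→jpn_Jpan": (34.42, 34.19),
    "eng_Latn→spa_Latn": (56.44, 56.50),
    "fra_Latn→eng_Latn": (65.73, 64.58),
    "zho_Hans→eng_Latn": (55.19, 55.17),
    "jpn_Jpan→eng_Latn": (54.93, 55.29),
}

# Positive delta: the 8-bit model scored higher than FP16 on that direction.
deltas = {pair: round(q8 - fp16, 2) for pair, (q8, fp16) in scores.items()}
largest_gap = max(deltas, key=lambda pair: abs(deltas[pair]))

for pair, delta in deltas.items():
    print(f"{pair}: {delta:+.2f}")
print("largest absolute gap:", largest_gap)
```

With 20 sentences per direction, gaps of this size in either direction are best read as noise rather than as a real quality difference.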

Context scaling (decode tok/s)

Context length	Decode tok/s
~128 tokens	84803.3
~256 tokens	131.5
~512 tokens	131.5
~1024 tokens	124362.5

Usage

Install

pip install mlx-lm

Translate

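from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer.
model, tokenizer = load("sahilchachra/hy-mt2-1.8b-8bit-mlx")

prompt = (
    "Translate the following text from English to French.\n"
    "English: The early bird catches the worm.\n"
    "French:"
)
# generate() returns the completion as a string. verbose=True would also
# print it while generating, so it is left off to avoid duplicate output.
translation = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(translation)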
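
For other language pairs, the same three-line prompt can be built by a small helper. This is a sketch that mirrors the example prompt above. The function name is hypothetical, and this card does not state whether the layout matches the prompt format the base model was trained on.

```python
def build_translation_prompt(text: str, source_lang: str, target_lang: str) -> str:
    """Build a prompt in the same layout as the English-to-French example."""
    return (
        f"Translate the following text from {source_lang} to {target_lang}.\n"
        f"{source_lang}: {text}\n"
        f"{target_lang}:"
    )


prompt = build_translation_prompt(
    "The early bird catches the worm.", "English", "German"
)
print(prompt)
```

The returned string can be passed as the `prompt` argument of `generate` or `stream_generate` in the examples here.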

Stream

from mlx_lm import load, stream_generate

model, tokenizer = load("sahilchachra/hy-mt2-1.8b-8bit-mlx")
for chunk in stream_generate(model, tokenizer, prompt="Translate \"Hello world\" to Japanese:", max_tokens=64):
    print(chunk.text, end="", flush=True)

All variants in this collection

Model	Method
sahilchachra/hy-mt2-1.8b-4bit-mlx	Affine int4 (group 64)
sahilchachra/hy-mt2-1.8b-8bit-mlx	Affine int8 (group 64) ← this model
sahilchachra/hy-mt2-1.8b-mxfp4-mlx	Block float MX FP4
sahilchachra/hy-mt2-1.8b-mxfp8-mlx	Block float MX FP8

Notes

Requires Apple Silicon (M1 or later) with MLX
Benchmarks run on Apple M5 Pro, 24 GB unified memory
FLORES-200 sample sizes are small (n = 20 per direction); treat chrF++/BLEU figures as indicative, not definitive
License: see tencent/Hy-MT2-1.8B for the original model's license terms

Original model

See tencent/Hy-MT2-1.8B for full model details, supported languages, and intended use.
