hy-mt2-1.8b-8bit-mlx

Quantized version of tencent/Hy-MT2-1.8B for Apple Silicon using MLX.

Hy-MT2-1.8B is Tencent's multilingual translation model covering 40+ languages.

Quantization: Affine integer quantization
Precision: 8-bit (~8.5 bits/weight avg)
Group size: 64
Disk size: 1824 MB
Quantized by: sahilchachra

About this variant

Affine quantization at 8-bit with group size 64. Closest to FP16 translation quality. Recommended when memory allows and translation accuracy is the priority.
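As a rough illustration of what affine quantization with group size 64 means, the sketch below quantizes one group of weights to 8-bit integers plus a per-group scale and bias, and derives the ~8.5 bits/weight figure. It is a plain-Python approximation, not MLX's actual kernel. The function names are hypothetical, and the 16-bit widths for scale and bias are an assumption chosen to be consistent with the stated average.

```python
def quantize_group(weights, bits=8):
    """Affine-quantize one group so that w ~= scale * q + bias, q in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo  # the group minimum serves as the bias


def dequantize_group(q, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [scale * v + bias for v in q]


def effective_bits_per_weight(bits=8, group_size=64, scale_bits=16, bias_bits=16):
    """Average storage per weight, counting per-group scale and bias (assumed 16-bit each)."""
    return bits + (scale_bits + bias_bits) / group_size


# One group of 64 illustrative weights in [-0.5, 0.5)
group = [(i * 37 % 64) / 64 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

print(f"max reconstruction error: {max_error:.6f} (scale = {scale:.6f})")
print(f"bits per weight: {effective_bits_per_weight()}")  # 8 + 32/64 = 8.5
```

The rounding error per weight is bounded by half the group's scale, which is why smaller groups or more bits track the original weights more closely at the cost of extra storage.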

Benchmark results

Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

Metric              This model    FP16 baseline
Prefill (tok/s)     1345.5        1269.81
Decode (tok/s)      134.67        77.12
Peak memory (GB)    2.175         3.72
Disk size (MB)      1824          3897
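To read the performance figures as relative changes, the short sketch below divides each value for this model by the FP16 baseline. The numbers are copied from the table above; the helper name `ratio_to_baseline` is made up for illustration.

```python
# (this model, FP16 baseline), copied from the performance table
metrics = {
    "Prefill (tok/s)": (1345.5, 1269.81),
    "Decode (tok/s)": (134.67, 77.12),
    "Peak memory (GB)": (2.175, 3.72),
    "Disk size (MB)": (1824, 3897),
}


def ratio_to_baseline(metrics):
    """Return {metric: this_model / fp16_baseline}."""
    return {name: quantized / baseline for name, (quantized, baseline) in metrics.items()}


ratios = ratio_to_baseline(metrics)
for name, value in ratios.items():
    print(f"{name:<18} {value:.2f}x of FP16")
```

On these figures, decoding runs at roughly 1.75x the FP16 speed, while peak memory and disk size drop to roughly 58% and 47% of the baseline.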

Translation quality (FLORES-200 devtest)

Scores are chrF++ (higher is better). The sample size n is listed per direction.

Direction            This model    FP16 baseline    n
eng_Latn→fra_Latn    65.33         63.81            20
eng_Latn→deu_Latn    57.38         57.66            20
eng_Latn→zho_Hans    28.84         29.09            20
eng_Latn→jpn_Jpan    34.42         34.19            20
eng_Latn→spa_Latn    56.44         56.5             20
fra_Latn→eng_Latn    65.73         64.58            20
zho_Hans→eng_Latn    55.19         55.17            20
jpn_Jpan→eng_Latn    54.93         55.29            20

Avg chrF++: 57.26 vs FP16 56.95
Avg BLEU: 31.33 vs FP16 30.71
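For a per-direction view of the quality table, the sketch below computes the chrF++ difference between this model and the FP16 baseline for each row. The scores are copied from the table above; nothing here re-runs the evaluation.

```python
# (this model, FP16 baseline) chrF++ per direction, copied from the table
scores = {
    "eng_Latn→fra_Latn": (65.33, 63.81),
    "eng_Latn→deu_Latn": (57.38, 57.66),
    "eng_Latn→zho_Hans": (28.84, 29.09),
    "eng_Latn→jpn_Jpan": (34.42, 34.19),
    "eng_Latn→spa_Latn": (56.44, 56.5),
    "fra_Latn→eng_Latn": (65.73, 64.58),
    "zho_Hans→eng_Latn": (55.19, 55.17),
    "jpn_Jpan→eng_Latn": (54.93, 55.29),
}

# Positive delta: the 8-bit model scored higher than FP16 on that direction
deltas = {direction: round(q - fp16, 2) for direction, (q, fp16) in scores.items()}
largest = max(deltas, key=lambda d: abs(deltas[d]))

for direction, delta in deltas.items():
    print(f"{direction}  {delta:+.2f}")
print(f"largest gap: {largest} ({deltas[largest]:+.2f})")
```

The largest gap is +1.52 on eng_Latn→fra_Latn, followed by +1.15 on fra_Latn→eng_Latn; every other direction is within 0.4 points of FP16 in either direction. With 20 sentences per direction, differences of this size are plausibly sampling noise rather than a real quality gain.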

Context scaling (decode tok/s)

Context length Decode tok/s
~128 tokens 84803.3
~256 tokens 131.5
~512 tokens 131.5
~1024 tokens 124362.5

Usage

Install

pip install mlx-lm

Translate

from mlx_lm import load, generate

model, tokenizer = load("sahilchachra/hy-mt2-1.8b-8bit-mlx")

prompt = (
    "Translate the following text from English to French.\n"
    "English: The early bird catches the worm.\n"
    "French:"
)
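# generate() returns the completion; verbose=True would also echo it while
# decoding, so it is left off here to avoid printing the text twice.
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)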
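To reuse the prompt pattern above for other language pairs, a small helper can fill in the language names and the source text. This is a minimal sketch: `build_prompt` is a hypothetical name, and whether this plain-text pattern is the best prompt for every pair is an assumption; the original model card may specify a different template.

```python
def build_prompt(text, source_lang="English", target_lang="French"):
    """Fill the plain-text translation prompt pattern from the example above."""
    return (
        f"Translate the following text from {source_lang} to {target_lang}.\n"
        f"{source_lang}: {text}\n"
        f"{target_lang}:"
    )


prompt = build_prompt("The early bird catches the worm.", "English", "German")
print(prompt)
```

The returned string can be passed as `prompt=` to `generate` exactly as in the example above.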

Stream

from mlx_lm import load, stream_generate

model, tokenizer = load("sahilchachra/hy-mt2-1.8b-8bit-mlx")
prompt = 'Translate "Hello world" to Japanese:'
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=64):
    print(chunk.text, end="", flush=True)
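When the translation is needed as a string rather than printed, the streamed pieces can be collected instead. The sketch below joins the `.text` of each chunk and stops at the first newline. The helper name `collect_translation` is hypothetical, and cutting at a newline is an assumption about how a plain prompt like the one above tends to be completed, not documented model behavior. Stand-in objects replace the real stream so the example runs without the model.

```python
from types import SimpleNamespace


def collect_translation(chunks, stop="\n"):
    """Join streamed text pieces, cutting at the first `stop` string."""
    pieces = []
    for chunk in chunks:
        pieces.append(chunk.text)
        joined = "".join(pieces).lstrip()  # ignore leading whitespace/newlines
        if stop in joined:
            return joined.split(stop, 1)[0].strip()
    return "".join(pieces).strip()


# Stand-in for stream_generate(...) output: objects exposing a .text attribute
fake_stream = [SimpleNamespace(text=t) for t in [" こんにちは", "世界", "\nEnglish:", " more"]]
print(collect_translation(fake_stream))
```

With the real model, the call would be `collect_translation(stream_generate(model, tokenizer, prompt=prompt, max_tokens=64))`.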

All variants in this collection

Model                                  Method
sahilchachra/hy-mt2-1.8b-4bit-mlx      Affine int4 (group 64)
sahilchachra/hy-mt2-1.8b-8bit-mlx      Affine int8 (group 64) ← this model
sahilchachra/hy-mt2-1.8b-mxfp4-mlx     Block float MX FP4
sahilchachra/hy-mt2-1.8b-mxfp8-mlx     Block float MX FP8

Notes

  • Requires Apple Silicon (M1 or later) with MLX
  • Benchmarks run on Apple M5 Pro, 24 GB unified memory
  • FLORES-200 sample sizes are small (n = 20 per direction); treat chrF++/BLEU figures as indicative, not definitive
  • License: see tencent/Hy-MT2-1.8B for the original model's license terms

Original model

See tencent/Hy-MT2-1.8B for full model details, supported languages, and intended use.
