hy-mt2-1.8b-8bit-mlx
Quantized version of tencent/Hy-MT2-1.8B for Apple Silicon using MLX.
Hy-MT2-1.8B is Tencent's multilingual translation model covering 40+ languages.
Quantization: Affine integer quantization
Precision: 8-bit (~8.5 bits/weight avg)
Group size: 64
Disk size: 1824 MB
Quantized by: sahilchachra
About this variant
Affine quantization at 8-bit with group size 64. Closest to FP16 translation quality. Recommended when memory allows and translation accuracy is the priority.
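The "~8.5 bits/weight" figure is consistent with 8-bit values plus per-group metadata. The sketch below assumes each group of 64 weights stores one 16-bit scale and one 16-bit bias (32 extra bits per group); that storage layout is an assumption, not something this card states. The `affine_quantize` and `affine_dequantize` helpers are hypothetical, pure-Python illustrations of group-wise affine quantization and are not MLX's actual kernels.

```python
def effective_bits_per_weight(bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Average stored bits per weight when every group carries `meta_bits` of metadata.

    meta_bits=32 assumes one 16-bit scale and one 16-bit bias per group.
    """
    return bits + meta_bits / group_size


def affine_quantize(group, bits=8):
    """Map one group of floats to unsigned integers using a shared scale and bias."""
    levels = (1 << bits) - 1
    lo, hi = min(group), max(group)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    q = [round((w - lo) / scale) for w in group]
    return q, scale, lo  # the bias is the group minimum


def affine_dequantize(q, scale, bias):
    """Reconstruct approximate floats from the quantized integers."""
    return [scale * v + bias for v in q]


print(effective_bits_per_weight(bits=8, group_size=64))  # 8 + 32/64

# Round-trip a toy group of 64 weights and report the worst-case error.
group = [i / 10 - 3 for i in range(64)]
q, scale, bias = affine_quantize(group)
restored = affine_dequantize(q, scale, bias)
print(max(abs(a - b) for a, b in zip(group, restored)))
```

In this toy scheme the reconstruction error of each weight is bounded by half the group's scale, which is why smaller groups or more bits reduce the error at the cost of more storage.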
Benchmark results
Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.
Performance
| Metric | This model | FP16 baseline |
|---|---|---|
| Prefill (tok/s) | 1345.5 | 1269.81 |
| Decode (tok/s) | 134.67 | 77.12 |
| Peak memory (GB) | 2.175 | 3.72 |
| Disk size (MB) | 1824 | 3897 |
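To make the comparison easier to read, the figures in the table above can be expressed as ratios against the FP16 baseline. This is a small arithmetic sketch using only the numbers already reported; the variable names are illustrative.

```python
# Figures copied from the Performance table: (this model, FP16 baseline).
rows = {
    "Prefill (tok/s)": (1345.5, 1269.81),
    "Decode (tok/s)": (134.67, 77.12),
    "Peak memory (GB)": (2.175, 3.72),
    "Disk size (MB)": (1824, 3897),
}

# Ratio of the quantized model to the FP16 baseline for each metric.
ratios = {name: quant / fp16 for name, (quant, fp16) in rows.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x of FP16")
```

On these numbers, decode throughput is roughly 1.75x the baseline, while peak memory and disk size fall to about 58% and 47% of FP16 respectively.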
Translation quality (FLORES-200 devtest)
Reported as chrF++ (higher is better). Sample size (n) is noted per pair.
| Direction | This model | FP16 baseline | n |
|---|---|---|---|
| eng_Latn→fra_Latn | 65.33 | 63.81 | 20 |
| eng_Latn→deu_Latn | 57.38 | 57.66 | 20 |
| eng_Latn→zho_Hans | 28.84 | 29.09 | 20 |
| eng_Latn→jpn_Jpan | 34.42 | 34.19 | 20 |
| eng_Latn→spa_Latn | 56.44 | 56.5 | 20 |
| fra_Latn→eng_Latn | 65.73 | 64.58 | 20 |
| zho_Hans→eng_Latn | 55.19 | 55.17 | 20 |
| jpn_Jpan→eng_Latn | 54.93 | 55.29 | 20 |
Avg chrF++: 57.26 vs FP16 56.95
Avg BLEU: 31.33 vs FP16 30.71
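The per-direction differences are easier to judge when computed explicitly. The sketch below uses only the eight chrF++ rows listed in the table; it does not attempt to reproduce the reported averages, which may have been computed over a different set or in a different way.

```python
# chrF++ per direction, copied from the table: (this model, FP16 baseline).
scores = {
    ("eng_Latn", "fra_Latn"): (65.33, 63.81),
    ("eng_Latn", "deu_Latn"): (57.38, 57.66),
    ("eng_Latn", "zho_Hans"): (28.84, 29.09),
    ("eng_Latn", "jpn_Jpan"): (34.42, 34.19),
    ("eng_Latn", "spa_Latn"): (56.44, 56.5),
    ("fra_Latn", "eng_Latn"): (65.73, 64.58),
    ("zho_Hans", "eng_Latn"): (55.19, 55.17),
    ("jpn_Jpan", "eng_Latn"): (54.93, 55.29),
}

# Positive delta means the 8-bit model scored higher than FP16.
deltas = {pair: round(q - fp16, 2) for pair, (q, fp16) in scores.items()}
wins = sum(d > 0 for d in deltas.values())
losses = sum(d < 0 for d in deltas.values())

for (src, tgt), d in deltas.items():
    print(f"{src} -> {tgt}: {d:+.2f}")
print(f"8-bit higher in {wins} directions, lower in {losses}")
```

With the quantized model ahead in four directions and behind in four, and only 20 sentences per direction, these gaps are best read as noise rather than as evidence that quantization improves quality.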
Context scaling (decode tok/s)
| Context length | Decode tok/s |
|---|---|
| ~128 tokens | 84803.3 |
| ~256 tokens | 131.5 |
| ~512 tokens | 131.5 |
| ~1024 tokens | 124362.5 |
Usage
Install
```bash
pip install mlx-lm
```
Translate
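```python
from mlx_lm import load, generate

# Downloads the weights from the Hub on first use, then loads them with MLX.
model, tokenizer = load("sahilchachra/hy-mt2-1.8b-8bit-mlx")

prompt = (
    "Translate the following text from English to French.\n"
    "English: The early bird catches the worm.\n"
    "French:"
)

# generate() returns the text; verbose=False avoids printing it twice
# (verbose=True already writes the output to stdout).
print(generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=False))
```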
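For other language pairs, the prompt shape used above can be wrapped in a small helper. `build_prompt` is a hypothetical convenience function, not part of `mlx_lm`, and it simply mirrors the example prompt; whether the model responds better to this plain format or to its chat template is not stated here, so check the base model card before relying on it.

```python
def build_prompt(text: str, source: str = "English", target: str = "French") -> str:
    """Format a translation request in the same shape as the example prompt."""
    return (
        f"Translate the following text from {source} to {target}.\n"
        f"{source}: {text}\n"
        f"{target}:"
    )


print(build_prompt("Good morning.", source="English", target="German"))
```

The returned string can be passed as the `prompt` argument to `generate` in place of the hand-written one.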
Stream
```python
from mlx_lm import load, stream_generate

model, tokenizer = load("sahilchachra/hy-mt2-1.8b-8bit-mlx")

# Print each chunk as soon as it is generated.
for chunk in stream_generate(
    model,
    tokenizer,
    prompt='Translate "Hello world" to Japanese:',
    max_tokens=64,
):
    print(chunk.text, end="", flush=True)
print()  # finish with a newline
```
All variants in this collection
| Model | Method |
|---|---|
| sahilchachra/hy-mt2-1.8b-4bit-mlx | Affine int4 (group 64) |
| sahilchachra/hy-mt2-1.8b-8bit-mlx | Affine int8 (group 64) ← this model |
| sahilchachra/hy-mt2-1.8b-mxfp4-mlx | Block float MX FP4 |
| sahilchachra/hy-mt2-1.8b-mxfp8-mlx | Block float MX FP8 |
Notes
- Requires Apple Silicon (M1 or later) with MLX
- Benchmarks run on Apple M5 Pro, 24 GB unified memory
- FLORES-200 sample sizes are small (n = 20 per direction); treat chrF++/BLEU figures as indicative, not definitive
- License: see tencent/Hy-MT2-1.8B for the original model's license terms
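The Apple Silicon requirement can be checked before installing anything. The sketch below is one possible check using only the standard library; `is_apple_silicon` is a hypothetical helper, and it assumes a native arm64 Python build (an x86_64 Python running under Rosetta would report `x86_64` and fail this check even on Apple Silicon hardware).

```python
import platform


def is_apple_silicon(system: str = "", machine: str = "") -> bool:
    """Return True for macOS on arm64; arguments override detection for testing."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


if not is_apple_silicon():
    print("This model targets MLX on Apple Silicon; this machine does not look like one.")
```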
Original model
See tencent/Hy-MT2-1.8B for full model details, supported languages, and intended use.