Laguna-XS-2.1 (MLX, 6bit)

Converted from poolside/Laguna-XS-2.1 to MLX format, quantized to 6 bits (group size 64, 6.501 bpw effective).
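
The 6.501 bpw figure can be roughly accounted for with a short calculation. This is a sketch, not the converter's own accounting: it assumes each group of 64 weights stores one 16-bit scale and one 16-bit bias alongside the quantized values, and the function name `effective_bpw` is illustrative. The remaining ~0.001 bpw presumably comes from tensors that are not quantized, which the sketch does not model.

```python
def effective_bpw(bits: int, group_size: int = 64, overhead_bits: int = 32) -> float:
    """Approximate bits per weight for group-wise quantization.

    Assumption (not stated in this card): every group carries
    `overhead_bits` of metadata, here a 16-bit scale plus a 16-bit bias.
    """
    return bits + overhead_bits / group_size


# Compare against the bpw values reported for each variant.
for bits in (8, 6, 5, 4, 3):
    print(f"{bits}bit -> {effective_bpw(bits):.3f} bpw")
```

Under these assumptions the 6-bit variant comes out at 6.5 bpw, within rounding distance of the reported 6.501.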

Notes

  • Works with mlx-vlm, and with oMLX when the model is forced into VLM mode. mlx-lm doesn't support the laguna architecture yet; there's an open PR: mlx-lm#1223.
  • I occasionally got an empty </think> tag at the start of a response. It is uncommon and doesn't affect the rest of the output.
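
If the stray tag gets in the way downstream, it can be removed in post-processing. The helper below is a minimal sketch: `strip_leading_think` is a hypothetical name, not part of mlx-vlm or oMLX, and it only handles a lone closing tag at the very start of the text.

```python
import re

# A lone </think> at the very start of the text, with surrounding whitespace.
_LEADING_THINK = re.compile(r"^\s*</think>\s*")


def strip_leading_think(text: str) -> str:
    """Drop a stray </think> from the start of a response, if present."""
    return _LEADING_THINK.sub("", text, count=1)


print(strip_leading_think("</think>\n\nHello!"))
```

Text that does not begin with the tag, including responses with a full reasoning block, is returned unchanged.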

Performance

Measured with oMLX's benchmark harness on a MacBook Pro M5 Max (128 GB, 40-core GPU), single request, 128 generated tokens:

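| Prompt | Gen (tok/s) | Prefill (tok/s) | TTFT (ms) | Peak memory (GB) |
| --- | --- | --- | --- | --- |
| 1k | 102.9 | 3552 | 289 | 26.0 |
| 4k | 101.3 | 3862 | 1061 | 26.5 |
| 8k | 97.3 | 3497 | 2343 | 26.6 |
| 16k | 91.5 | 2958 | 5539 | 26.9 |
| 32k | 80.9 | 2369 | 13836 | 27.6 |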
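
The TTFT column is close to prompt length divided by prefill throughput, which gives a quick way to estimate latency for other request shapes. The sketch below assumes that "1k" means 1024 tokens (and so on), and that total time is roughly TTFT plus generated tokens divided by generation speed; the function names are illustrative.

```python
# Rows from the table above: prompt label -> (gen tok/s, prefill tok/s, TTFT ms).
ROWS = {
    "1k": (102.9, 3552, 289),
    "4k": (101.3, 3862, 1061),
    "8k": (97.3, 3497, 2343),
    "16k": (91.5, 2958, 5539),
    "32k": (80.9, 2369, 13836),
}


def estimated_ttft_ms(prompt_tokens: int, prefill_tps: float) -> float:
    """Time to first token if prefill dominates: tokens / throughput."""
    return 1000.0 * prompt_tokens / prefill_tps


def estimated_total_s(ttft_ms: float, gen_tps: float, new_tokens: int = 128) -> float:
    """Rough end-to-end time: TTFT plus decode time for `new_tokens`."""
    return ttft_ms / 1000.0 + new_tokens / gen_tps


for label, (gen_tps, prefill_tps, ttft_ms) in ROWS.items():
    prompt_tokens = int(label.rstrip("k")) * 1024  # assumption: "1k" = 1024 tokens
    est = estimated_ttft_ms(prompt_tokens, prefill_tps)
    total = estimated_total_s(ttft_ms, gen_tps)
    print(f"{label:>3}: TTFT measured {ttft_ms} ms, estimated {est:.0f} ms, "
          f"about {total:.1f} s for 128 tokens")
```

The estimate treats the first generated token as part of decode as well as TTFT, so it slightly overstates total time.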

Variants

| Variant | bpw | Disk | Gen tok/s (1k → 32k) |
| --- | --- | --- | --- |
| bf16 | 16 | 62 GB | 70.6 → 58.7 |
| 8bit | 8.500 | 33 GB | 95.4 → 76.7 |
| 6bit (this repo) | 6.501 | 25 GB | 102.9 → 80.9 |
| 5bit | 5.502 | 21 GB | 115.9 → 87.7 |
| 4bit | 4.503 | 18 GB | 126.0 → 91.3 |
| 3bit | 3.503 | 14 GB | 137.2 → 98.8 |
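
To compare variants, it can help to express generation speed relative to bf16. The snippet below is a small sketch using only the numbers from the table above; `speedup_vs_bf16` is an illustrative helper, not part of any library.

```python
# Variant -> (bpw, gen tok/s at 1k prompt, gen tok/s at 32k prompt).
VARIANTS = {
    "bf16": (16.0, 70.6, 58.7),
    "8bit": (8.500, 95.4, 76.7),
    "6bit": (6.501, 102.9, 80.9),
    "5bit": (5.502, 115.9, 87.7),
    "4bit": (4.503, 126.0, 91.3),
    "3bit": (3.503, 137.2, 98.8),
}


def speedup_vs_bf16(variant: str, column: int) -> float:
    """Generation speed relative to bf16; column 1 = 1k prompt, 2 = 32k prompt."""
    return VARIANTS[variant][column] / VARIANTS["bf16"][column]


for name in VARIANTS:
    print(f"{name}: {speedup_vs_bf16(name, 1):.2f}x at 1k, "
          f"{speedup_vs_bf16(name, 2):.2f}x at 32k")
```

For the 6-bit variant this works out to roughly 1.46x at 1k and 1.38x at 32k, noticeably less than the 16 / 6.501 reduction in bits per weight.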

Usage

uvx --from mlx-vlm mlx_vlm.generate --model mlx-community/Laguna-XS-2.1-6bit --prompt "..." --max-tokens 300

License

OpenMDW-1.1, inherited from the base model.
