An MLX variant of IBM's Granite 4.0 H Tiny, a Mixture of Experts model which consists of 7 billion total parameters and 1 billion active. The raw model weights were converted into a 3 bit MLX format. The model averages 40tok/s at 3GB of RAM usage on a MacBook Pro (base M1, 8GB of unified RAM, 256GB of internal storage), making it ideal for lower-end devices.

This model may be unstable and sometimes spit out nonsense, be aware of that!

Downloads last month
222
Safetensors
Model size
0.9B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

3-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for piskle/IBM-Granite-4.0-h-tiny-MLX-3bit

Quantized
(34)
this model