KodaLite-1.3B - MLX 8-bit

8-bit MLX quantization of YoAbriel/KodaLite-1.3B, optimized for Apple Silicon.

Size: ~1.4 GB | Precision: 8-bit (8.5 bpw)
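As a rough sanity check, the size and bits-per-weight figures above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes a quantization group size of 64 with one 16-bit scale and one 16-bit bias stored per group (the card does not state these settings), and it treats all 1.27B parameters as quantized, which slightly simplifies the real layout. The helper names are hypothetical.

```python
# Back-of-the-envelope estimate of bits per weight and on-disk size.
# Assumptions (not stated in the card): group size 64, and 32 bits of
# per-group overhead (a 16-bit scale plus a 16-bit bias).

def bits_per_weight(bits: int, group_size: int = 64, overhead_bits: int = 32) -> float:
    """Effective bits per weight once per-group metadata is amortized."""
    return bits + overhead_bits / group_size

def quantized_size_gb(n_params: float, bpw: float) -> float:
    """Approximate size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bpw / 8 / 1e9

bpw = bits_per_weight(8)
size = quantized_size_gb(1.27e9, bpw)
print(f"{bpw:.1f} bpw, ~{size:.2f} GB")  # 8.5 bpw, ~1.35 GB
```

Under these assumptions the estimate lands close to the ~1.4 GB quoted above; the small gap is expected, since some tensors may be kept at higher precision.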

Usage

pip install mlx-lm

from mlx_lm import load, generate

model, tok = load("YoAbriel/KodaLite-1.3B-mlx-8bit")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(generate(model, tok, prompt=prompt, max_tokens=80))

Related

Note: 4-bit quantization of this model produces degraded output because the SFT signal is weak at 1.27B parameters with only 1.64B training tokens. The 8-bit quantization works fine.
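To give a feel for how much more rounding noise 4-bit introduces than 8-bit, here is a minimal, self-contained sketch of group-wise affine quantization applied to synthetic weights. It is not the MLX implementation and it does not test the SFT explanation given above; the group size of 64, the Gaussian weights, and the function names are all assumptions made for illustration.

```python
import random

def quantize_roundtrip(weights, bits, group_size=64):
    """Quantize each group to `bits`-bit integers using a per-group
    scale and offset, then map the integers back to floats."""
    levels = (1 << bits) - 1  # highest integer code, e.g. 15 or 255
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Fall back to 1.0 for a constant group to avoid dividing by zero.
        scale = (hi - lo) / levels or 1.0
        for w in group:
            q = round((w - lo) / scale)   # integer code in [0, levels]
            out.append(lo + q * scale)    # dequantized value
    return out

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Synthetic stand-in for a weight tensor (assumption: small Gaussian values).
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]

err4 = mean_abs_error(weights, quantize_roundtrip(weights, 4))
err8 = mean_abs_error(weights, quantize_roundtrip(weights, 8))
print(f"4-bit error: {err4:.2e}, 8-bit error: {err8:.2e}")
```

With 15 steps per group instead of 255, the 4-bit reconstruction error should come out roughly an order of magnitude larger, which leaves less margin for a model whose fine-tuned behavior is already fragile.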

License

Apache 2.0
