supra-50m-instruct-fp16-mlx

MLX quantization of SupraLabs/Supra-50M-Instruct for Apple Silicon.

Variant: BFloat16 (lossless reference)
Disk size: 201 MB
Quantized by: sahilchachra

Benchmark results

Evaluated on Apple M4 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

This model FP16 baseline
Decode tok/s (avg, long traces) 1270.13 1270.13
Peak memory (GB) 0.223 0.223
Disk size (MB) 201 201

Quality

Benchmark This model FP16 baseline n
IFEval (instruction following) 15.9% 15.9% 44
Alpaca-cleaned (instruct F1 vs reference) 36.2 36.2 50

Context scaling (decode tok/s)

Context length Decode tok/s
~128 tokens 1294.4
~256 tokens 1274.3
~512 tokens 1277.6
~1024 tokens 1234.2

Usage

pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("sahilchachra/supra-50m-instruct-fp16-mlx")
response = generate(model, tokenizer, prompt="Your prompt here", max_tokens=256, verbose=True)

All variants in this collection

Model Variant
sahilchachra/supra-50m-instruct-8bit-mlx Affine int8
sahilchachra/supra-50m-instruct-optiq-5bpw-mlx OptiQ mixed-precision (target 5.0 bpw)

Notes

  • Requires Apple Silicon (M1 or later) with MLX
  • Benchmarks run on Apple M4 Pro, 24 GB unified memory
  • License: see SupraLabs/Supra-50M-Instruct for the original model's license

Original model

See SupraLabs/Supra-50M-Instruct for full model details and intended use.

Downloads last month
138
Safetensors
Model size
51.8M params
Tensor type
BF16
·
MLX
Hardware compatibility
Log In to add your hardware

Quantized

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sahilchachra/supra-50m-instruct-fp16-mlx

Quantized
(6)
this model

Collection including sahilchachra/supra-50m-instruct-fp16-mlx