---
license: apache-2.0
pipeline_tag: text-generation
library_name: mlx
tags:
  - vllm
  - mlx
base_model: openai/gpt-oss-120b
---
See gpt-oss-120b 6.5bit MLX in action: demonstration video

The q6.5-bit quant typically achieves 1.128 perplexity in our testing, equivalent to q8:
| Quantization | Perplexity |
|---|---|
| q2 | 41.293 |
| q3 | 1.900 |
| q4 | 1.168 |
| q6 | 1.128 |
| q8 | 1.128 |
## Usage Notes
- Built with a modified version of MLX 0.26
- Memory usage: ~95 GB
- Expect ~60 tokens/s
- For more details, see the demonstration video or visit OpenAI gpt-oss-120b.
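
The notes above can be put to use with the standard `mlx-lm` Python API. A minimal sketch follows; the repo/path string is a placeholder (the exact identifier for this quant is not stated here), and running it requires Apple silicon with roughly 95 GB of unified memory:

```python
# Minimal generation sketch using mlx-lm's load/generate API.
# The model identifier below is a placeholder, not the actual repo id.
from mlx_lm import load, generate

# Loading this quant needs ~95 GB of unified memory (see Usage Notes).
model, tokenizer = load("path/to/gpt-oss-120b-6.5bit-mlx")  # placeholder

prompt = "Explain mixture-of-experts models in one paragraph."
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```

On suitable hardware, generation should proceed at roughly the ~60 tokens/s noted above.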
