How to use
git clone https://github.com/ml-explore/mlx-lm (mlx-lm==0.31.3)

mlx_lm.convert --hf-path cyberagent/CAT-Thinking-8B --mlx-path ./CAT-Thinking-8B-MLX-6bit -q --q-bits 6 --trust-remote-code

mlx_lm.generate --model ./CAT-Thinking-8B-MLX-6bit --verbose True --prompt "about you" --max-tokens 1000

==========
Prompt: 10 tokens, 90.942 tokens-per-sec
Generation: 949 tokens, 84.891 tokens-per-sec
Peak memory: 6.865 GB

Downloads last month
22
Safetensors
Model size
8B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

6-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mlx-community/CAT-Thinking-8B-MLX-6bit

Finetuned
Qwen/Qwen3-8B
Quantized
(5)
this model