How to use
git clone https://github.com/ml-explore/mlx-lm (mlx-lm==0.31.3)

mlx_lm.convert --hf-path cyberagent/CAT-Thinking-8B --mlx-path ./CAT-Thinking-8B-MLX-8bit -q --q-bits 8 --trust-remote-code

mlx_lm.generate --model ./CAT-Thinking-8B-MLX-8bit --verbose True --prompt "about you" --max-tokens 1000

==========
Prompt: 10 tokens, 84.526 tokens-per-sec
Generation: 569 tokens, 72.305 tokens-per-sec
Peak memory: 8.853 GB

Downloads last month
46
Safetensors
Model size
8B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mlx-community/CAT-Thinking-8B-MLX-8bit

Finetuned
Qwen/Qwen3-8B
Quantized
(5)
this model