Qwen3-Swallow-32B-RL-v0.2-MLX-8bit

This model is an 8-bit MLX-format conversion of tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2 for running on Apple Silicon.

Model Details

Attribute | Value
Original Model | tokyotech-llm/Qwen3-Swallow-32B-RL-v0.2
Architecture | Dense Transformer
Parameters | 32B
Quantization | 8-bit
Model Size | ~32 GB
Format | MLX (Apple Silicon optimized)
Converted with | mlx-lm v0.30.8
License | Apache 2.0
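
As a rough sanity check on the size figure above, the memory needed for the weights can be estimated from the parameter count and the quantization settings. The sketch below is an illustration, not the converter's actual accounting: it assumes grouped quantization with one 16-bit scale and one 16-bit bias per group of 64 weights, and a parameter count of about 32.8 billion (the figure published for the Qwen3-32B base). Neither the group size nor the exact count is stated in this card, and the estimate ignores the small number of tensors that stay unquantized.

```python
def effective_bits_per_weight(bits=8, group_size=64, scale_bits=16, bias_bits=16):
    """Bits stored per weight, including per-group scale and bias overhead.

    group_size, scale_bits and bias_bits are assumptions for illustration.
    """
    return bits + (scale_bits + bias_bits) / group_size


def estimate_size_bytes(num_params, bits=8, group_size=64):
    """Approximate size of the quantized weights in bytes."""
    return num_params * effective_bits_per_weight(bits, group_size) / 8


# Assumed parameter count (not taken from this card).
NUM_PARAMS = 32.8e9

size_bytes = estimate_size_bytes(NUM_PARAMS)
print(f"~{size_bytes / 1e9:.1f} GB ({size_bytes / 2**30:.1f} GiB)")
```

Under these assumptions the estimate lands in the same range as the listed model size. Actual memory use during inference will be higher, because the KV cache and activations are stored on top of the weights.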

About Qwen3-Swallow

Qwen3-Swallow is a bilingual Japanese-English large language model developed by the Swallow Project at the Institute of Science Tokyo (formerly Tokyo Institute of Technology) and AIST. Built upon Qwen3 through Continual Pre-Training (CPT), Supervised Fine-Tuning (SFT), and Reinforcement Learning (RL), it achieves strong performance on both Japanese and English tasks while maintaining capabilities in mathematics and coding.

For more details, see the original model card.

Usage

Quick Start (Python)

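from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer.
model, tokenizer = load("tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-8bit")

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

# Generate up to 512 new tokens; verbose=True prints the text as it is produced.
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)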

Interactive Chat

mlx_lm.chat --model tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-8bit

OpenAI-Compatible Server

mlx_lm.server --model tocchitocchi/Qwen3-Swallow-32B-RL-v0.2-MLX-8bit --port 8080

Then connect with any OpenAI-compatible client at http://localhost:8080/v1.
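
For a client without extra dependencies, a minimal sketch using only the Python standard library might look like the following. It assumes the server was started with the command above, that it exposes the usual OpenAI-style /chat/completions route under the base URL, and that it returns the standard choices[0].message.content response shape. The helper names build_chat_request and chat are hypothetical, and the "model" field is left out on the assumption that the server falls back to the model given on its command line.

```python
import json
import urllib.request

# Base URL from the server command above.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt, max_tokens=256, base_url=BASE_URL):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt to the running server and return the reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response body.
    return body["choices"][0]["message"]["content"]
```

With the server running, calling chat("hello") should return the model's reply as a string. The same base URL can also be passed to an existing OpenAI client library.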

Acknowledgments
