Qwen3-32B-Fable-Distill (v0.2)

A Qwen3-32B model fine-tuned via SFT on curated reasoning traces distilled from frontier models.

What is New in v0.2

  • Proper reasoning separation - blocks preserved as distinct reasoning traces (v0.1 had reasoning flattened into generation)
  • Assistant-only loss - training loss computed only on assistant tokens
  • 4,207 training examples - CoT-less examples dropped, Claude channel converted to Qwen3 format
  • 789 training steps, LoRA rank 64, Qwen3-32B 4-bit base

Training Details

Parameter Value
Base model unsloth/qwen3-32b-bnb-4bit
Method SFT via TRL
LoRA rank 64
Training steps 789
Dataset size 4,207 examples
Loss masking Assistant-only
Precision BF16 (merged weights)

Framework Versions

  • PEFT 0.19.1
  • TRL 0.24.0
  • Transformers 5.5.0
  • PyTorch 2.10.0
  • Datasets 4.3.0
  • Tokenizers 0.22.2

Quick Start

LoRA adapter (recommended)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("unsloth/qwen3-32b-bnb-4bit")
model = PeftModel.from_pretrained(base, "Ebumping/Qwen3-32B-Fable-Distill")
tokenizer = AutoTokenizer.from_pretrained("Ebumping/Qwen3-32B-Fable-Distill")

Merged BF16 weights

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("Ebumping/Qwen3-32B-Fable-Distill")
tokenizer = AutoTokenizer.from_pretrained("Ebumping/Qwen3-32B-Fable-Distill")

GGUF (llama.cpp / Ollama)

llama-server -hf Ebumping/Qwen3-32B-Fable-Distill:Q4_K_M
ollama run hf.co/Ebumping/Qwen3-32B-Fable-Distill:Q4_K_M

vLLM

vllm serve "Ebumping/Qwen3-32B-Fable-Distill"

VRAM Requirements

Format Size Min VRAM
BF16 merged ~64 GB 80 GB+
Q8_0 GGUF ~33 GB 40 GB+
Q5_K_M GGUF ~23 GB 28 GB+
Q4_K_M GGUF ~20 GB 24 GB
Q3_K_M GGUF ~16 GB 20 GB+

Version History

  • v0.2 (current) - Reasoning properly separated with traces, assistant-only loss, 4,207 examples
  • v0.1 - Reasoning flattened into generation

Citation

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouedec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {url{https://github.com/huggingface/trl}}
}
Downloads last month
275
Safetensors
Model size
33B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support