gerbil-qwen3-coder-30b-bf16

Qwen3-Coder-30B-A3B-Instruct fine-tuned for Gerbil Scheme generation.

Training pipeline and tooling: https://github.com/ober/gerbil-lora

Training pipeline

Three-stage LoRA fine-tune (r=32, α=64, fused-MoE expert targets):

  1. CPT — Continued pre-training on Gerbil source corpus (lr 2e-5, 2 epochs)
  2. SFT — Supervised fine-tune on instruction/response pairs (lr 1e-4, 2 epochs)
  3. DPO — Direct preference optimization on wrong→right pairs (lr 5e-6, 3 epochs)

DPO eval (vs base Qwen3-Coder-30B-A3B-Instruct)

Metric Base Trained Δ
Holdout task score 31 39 +8
Anti-idioms hit 1 0 -1
Code blocks wrapped 9 14 +5
tok_lean_sum (P(chosen) > P(rejected)) -4.17 +4.03 +8.19
wins chosen / rejected (n=66) 47 / 19 52 / 13 +5 / -6

Weights

BF16 merged weights, ~57 GB across 13 safetensors shards.

Downloads last month
45
Safetensors
Model size
31B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jaimef21/gerbil-qwen3-coder-30b-bf16

Adapter
(44)
this model