# gerbil-qwen3-coder-30b-bf16
Qwen3-Coder-30B-A3B-Instruct fine-tuned for Gerbil Scheme generation.
Training pipeline and tooling: https://github.com/ober/gerbil-lora
## Training pipeline
Three-stage LoRA fine-tune (rank r=32, α=64; adapters target the fused MoE expert weights):
- CPT: continued pre-training on a Gerbil source corpus (lr 2e-5, 2 epochs)
- SFT: supervised fine-tuning on instruction/response pairs (lr 1e-4, 2 epochs)
- DPO: direct preference optimization on wrong→right pairs, each pairing a rejected wrong answer with a chosen correct one (lr 5e-6, 3 epochs)
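For reference, the standard DPO objective (Rafailov et al., 2023) pushes up the policy's log-probability ratio against a frozen reference model on the chosen (right) answer relative to the rejected (wrong) one. A minimal pure-Python sketch of the per-pair loss follows, assuming the pipeline uses the usual sigmoid form of DPO; the function names are hypothetical, and `beta = 0.1` is an assumed value because the card does not state the β used in training.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)); never calls exp on a large positive number."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one (chosen, rejected) pair.

    Each argument is the summed token log-probability of a full response under
    either the trainable policy or the frozen reference model.
    beta = 0.1 is an assumption; the model card does not give the value used.
    """
    chosen_ratio = policy_chosen - ref_chosen        # log(pi / pi_ref) on the right answer
    rejected_ratio = policy_rejected - ref_rejected  # log(pi / pi_ref) on the wrong answer
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss is log 2 (about 0.693); shifting probability toward the chosen response lowers it, and shifting it toward the rejected response raises it.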
## DPO eval (vs. base Qwen3-Coder-30B-A3B-Instruct)
| Metric | Base | Trained | Δ (Trained − Base) |
|---|---|---|---|
| Holdout task score | 31 | 39 | +8 |
| Anti-idioms hit (lower is better) | 1 | 0 | -1 |
| Code blocks wrapped | 9 | 14 | +5 |
| tok_lean_sum (P(chosen) > P(rejected)) | -4.17 | +4.03 | +8.19 |
| wins chosen / rejected (n=66) | 47 / 19 | 52 / 13 | +5 / -6 |
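The last row appears to count, over the 66 held-out preference pairs, how often each model assigns higher probability to the chosen response than to the rejected one. A sketch of such a count, assuming each response is scored by its summed token log-probability under the model being evaluated; the card does not say whether scores are length-normalized or how ties are handled (here they are reported separately, an assumption), and the exact definition of `tok_lean_sum` is not reproduced.

```python
from typing import Iterable, Tuple


def count_preference_wins(pairs: Iterable[Tuple[float, float]]) -> Tuple[int, int, int]:
    """Count (chosen wins, rejected wins, ties) over preference pairs.

    Each pair is (logp_chosen, logp_rejected): summed token log-probabilities
    of the two responses. Comparing sums of log-probs is equivalent to
    comparing sequence probabilities, so a chosen win means P(chosen) > P(rejected).
    Reporting exact ties separately is an assumption, not taken from the card.
    """
    chosen_wins = rejected_wins = ties = 0
    for logp_chosen, logp_rejected in pairs:
        if logp_chosen > logp_rejected:
            chosen_wins += 1
        elif logp_chosen < logp_rejected:
            rejected_wins += 1
        else:
            ties += 1
    return chosen_wins, rejected_wins, ties
```

For example, the illustrative (not measured) input `[(-10.0, -12.5), (-8.0, -7.5), (-5.0, -5.0)]` yields `(1, 1, 1)`.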
## Weights
BF16 weights with the LoRA adapters merged into the base model, ~57 GB across 13 safetensors shards.
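Merging folds each trained LoRA adapter back into the weight it modifies, so the checkpoint loads as full model weights without a separate adapter. For a single linear layer the standard LoRA merge is W' = W + (α/r)·B·A; with this card's r=32 and α=64 the low-rank update is scaled by 2. The sketch below checks on small toy matrices that a merged layer reproduces base-plus-adapter output. The helper names (`matmul`, `apply_linear`, `merge_lora`) are hypothetical, and the sketch models neither the fused MoE expert layout nor the final BF16 cast.

```python
from typing import List

Matrix = List[List[float]]


def matmul(left: Matrix, right: Matrix) -> Matrix:
    """Plain-Python matrix product left @ right."""
    inner = len(right)
    return [[sum(left[i][k] * right[k][j] for k in range(inner))
             for j in range(len(right[0]))]
            for i in range(len(left))]


def apply_linear(w: Matrix, x: List[float]) -> List[float]:
    """Compute W @ x for a d_out x d_in matrix and a length-d_in vector."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in w]


def merge_lora(w: Matrix, a: Matrix, b: Matrix,
               r: int = 32, alpha: float = 64.0) -> Matrix:
    """Fold a LoRA adapter into its base weight: W' = W + (alpha / r) * B @ A.

    Shapes: w is d_out x d_in, a is r x d_in, b is d_out x r.
    Defaults mirror this card's r=32 and alpha=64, i.e. a scale of 2.0.
    """
    scale = alpha / r
    delta = matmul(b, a)  # d_out x d_in low-rank update
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]
```

Because (W + sBA)x = Wx + s·B(Ax), the merged layer gives the same output as running the adapter alongside the base weight, but the low-rank product is computed once instead of on every forward pass.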
Model tree for jaimef21/gerbil-qwen3-coder-30b-bf16: base model Qwen/Qwen3-Coder-30B-A3B-Instruct.