Qwen3.6 Multi-Adapter Fine-Tuning Suite
6 specialized LoRA adapters for Qwen/Qwen3.5-4B (validation) and Qwen/Qwen3.6-27B (production). Validated on 4B first, then scaled to 27B.
Workflow: Validate each adapter on 4B (fits in 24GB VRAM) → if stable, scale to 27B production. Always curate datasets locally before burning GPU hours.
6 Specialized Adapters
| # | Adapter | Domain | Method | Dataset | LoRA | LR | Seq Len | Status |
|---|---|---|---|---|---|---|---|---|
| 1 | code-sft | Agentic Coding | SFT | keypa/qwen36-adapter-code-sft (100K rows) | r=256 | 2e-4 | 16K | ✅ Validating with 4B |
| 2 | math-grpo | Math & Formal Reasoning | GRPO | keypa/qwen36-adapter-math-grpo | r=32 | 1e-5 | 4K | ⏳ Blocked (vLLM) |
| 3 | vision-sft | Vision / UI / Frontend | SFT | keypa/qwen36-adapter-vision-sft (5K rows) | r=16 | 2e-5 | 4K | ⏳ Pending |
| 4 | longcontext-sft | Long Context / RAG | SFT | keypa/qwen36-adapter-longcontext-sft (50 rows, placeholder) | r=16 | 5e-5 | 32K | 🔜 Future |
| 5 | mlintern-sft | ML Infrastructure Specialist | SFT | Not yet created (your own harness trajectories) | r=32 | 2e-5 | 8K | 📋 Planned |
| 6 | autoresearch-sft+dpo | Autoresearch (Karpathy-style) | SFT+DPO | Not yet created (Claude Code trajectories) | r=32 | 2e-5 | 8K | 📋 Planned |
4B validation note: The validation pipeline uses reduced settings (r=32, 2K seq len) to fit in 24GB VRAM. The LoRA/seq len values above are the production targets for 27B training.
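The relationship between the production targets in the table and the reduced validation settings can be sketched as a small config helper. This is an illustrative sketch, not the contents of train_validate_4b.py: the `AdapterSpec` class, the `for_validation` helper, and the choice to cap (rather than force) rank and sequence length are assumptions, and 16K / 2K are read here as 16384 / 2048 tokens.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AdapterSpec:  # hypothetical container, not taken from the repo
    name: str
    rank: int
    learning_rate: float
    seq_len: int


# Production targets copied from two rows of the adapter table.
PRODUCTION = {
    "code": AdapterSpec("code-sft", rank=256, learning_rate=2e-4, seq_len=16384),
    "vision": AdapterSpec("vision-sft", rank=16, learning_rate=2e-5, seq_len=4096),
}


def for_validation(spec, max_rank=32, max_seq_len=2048):
    """Cap rank and sequence length for a 4B run on 24GB VRAM (assumed policy)."""
    return replace(
        spec,
        rank=min(spec.rank, max_rank),
        seq_len=min(spec.seq_len, max_seq_len),
    )


code_val = for_validation(PRODUCTION["code"])
vision_val = for_validation(PRODUCTION["vision"])
print(code_val.rank, code_val.seq_len)
print(vision_val.rank, vision_val.seq_len)
```

Keeping the production values as the single source of truth and deriving the validation values from them avoids the two sets drifting apart.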
Base models: Qwen/Qwen3.6-27B (54GB bf16) for production, Qwen/Qwen3.5-4B (8GB bf16) for validation
Framework: TRL ≥ 0.15, PEFT, Transformers ≥ 4.57.1, numpy<2
Hardware: A100/H100 80GB (27B) / any 24GB VRAM GPU (4B validation)
Validation Workflow
Phase 1: 4B Validation (fits in 24GB VRAM, ~$0.50 on Vast.ai)
# Test ALL adapters on 4B before touching 27B
python train_validate_4b.py --all --max_samples 500 --epochs 1
# Or test one at a time
python train_validate_4b.py --adapter code --max_samples 500 --epochs 1
python train_validate_4b.py --adapter math --max_samples 500 --epochs 1
python train_validate_4b.py --adapter vision --max_samples 500 --epochs 1
Validation passes when:
- Loss decreases over epochs (no NaN)
- No OOM errors
- mean_token_accuracy ≥ 0.70 on code
- Model checkpoint saves correctly
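The pass criteria above can be expressed as a single check over the metrics a run logs. This is a minimal sketch: the function name, its arguments, and the way OOM and checkpoint status are passed in as booleans are assumptions, not how train_validate_4b.py reports results.

```python
import math


def validation_passes(losses, token_accuracy=None, min_accuracy=0.70,
                      oom=False, checkpoint_saved=True):
    """Return True if a validation run meets all listed criteria.

    losses: logged training losses in step order.
    token_accuracy: mean token accuracy, or None for adapters where
    the accuracy threshold does not apply (it is only stated for code).
    """
    if oom or not checkpoint_saved:
        return False
    if not losses or any(math.isnan(x) for x in losses):
        return False
    if losses[-1] >= losses[0]:  # loss must end lower than it started
        return False
    if token_accuracy is not None and token_accuracy < min_accuracy:
        return False
    return True


# Code SFT numbers from the validation status table (start 1.11, end 0.80).
print(validation_passes([1.11, 0.80], token_accuracy=0.75))
```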
Validation status (method validation on Qwen3.5-4B, 500 samples, 1 epoch — not the 27B model):
| Adapter | Loss Start | Loss End | Token Accuracy | Status |
|---|---|---|---|---|
| Code SFT | 1.11 | 0.80 | 0.75 | ✅ Passed |
| Math GRPO | — | — | — | ⏳ Blocked (vllm_ascend) |
| Vision SFT | — | — | — | ⏳ Pending |
| Long Context | — | — | — | 🔜 Future |
| ML Intern | — | — | — | 📋 Dataset not created |
| Autoresearch | — | — | — | 📋 Dataset not created |
Phase 2: 27B Production Training
# Only after 4B validation passes
python train_27b_sft.py --adapter code --max_samples 5000 --epochs 3
python train_27b_grpo.py --adapter math --max_samples 5000 --epochs 3
python train_27b_sft.py --adapter vision --max_samples 5000 --epochs 3
Datasets
Existing (verified on HuggingFace)
| Dataset | Rows | Format | Used by |
|---|---|---|---|
| keypa/qwen36-adapter-code-sft | 100K | Parquet | Code SFT |
| keypa/qwen36-adapter-math-grpo | — | Parquet | Math GRPO |
| keypa/qwen36-adapter-vision-sft | 5K | Parquet | Vision SFT |
| keypa/qwen36-adapter-longcontext-sft | 50 | Parquet | Long Context (placeholder) |
Note: keypa/qwen36-adapter-longcontext-sft is a 50-row placeholder. The full dataset needs to be created from tau/scrolls + allenai/qasper.
Planned (not yet created)
| Dataset | Source | Used by | Status |
|---|---|---|---|
| ML Intern harness trajectories | Your own infrastructure | ML Intern SFT | 📋 To be created |
| Claude Code trajectories | Autoresearch project | Autoresearch SFT+DPO | 📋 To be created |
Key Design Decisions
bf16, not FP8 or QLoRA 4-bit
- FP8: Inference only. Training needs bf16 for gradient stability.
- QLoRA 4-bit: Qwen3.6 uses hybrid attention layers. 4-bit degrades them disproportionately.
- Rule: Train bf16, deploy FP8.
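The weight sizes quoted for the base models (54GB and 8GB in bf16) follow from parameter count times bytes per parameter. The sketch below reproduces that arithmetic, assuming round parameter counts of 27B and 4B; it covers weights only and ignores gradients, optimizer state, activations, and KV cache, so it is a lower bound on real VRAM use.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS_27B = 27_000_000_000  # assumed round count
PARAMS_4B = 4_000_000_000    # assumed round count

print("27B bf16:", weight_memory_gb(PARAMS_27B, 2), "GB")  # training precision
print("27B fp8: ", weight_memory_gb(PARAMS_27B, 1), "GB")  # deployment precision
print("4B bf16: ", weight_memory_gb(PARAMS_4B, 2), "GB")
```

Halving the bytes per parameter at deployment is what makes the "train bf16, deploy FP8" rule attractive for serving.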
Dataset Size Strategy
Based on LIMA: 1K-5K high-quality examples > 100K mediocre ones.
LoRA rank choices
| Rank | Use Case |
|---|---|
| r=256 | Code SFT — massive knowledge injection (100K rows), "LoRA Without Regret" shows r=256 + target_modules="all-linear" ≈ full fine-tuning at 67% compute |
| r=32 | Math GRPO, ML Intern SFT, Autoresearch SFT+DPO — reasoning style + moderate knowledge |
| r=16 | Vision SFT, Long Context SFT — behavior shaping only |
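To make the rank choices concrete: LoRA adds two low-rank matrices per adapted linear layer, so the number of trainable parameters grows linearly with r. The sketch below uses a hypothetical 4096 x 4096 layer purely for illustration; it is not the actual Qwen layer shape.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


D = 4096  # hypothetical hidden size, NOT the real Qwen dimension
full = D * D  # parameters in the frozen full-rank weight

for r in (16, 32, 256):
    added = lora_params(D, D, r)
    print(f"r={r}: {added} added params, {added / full:.2%} of the full layer")
```

Under this assumption, going from r=16 to r=256 multiplies adapter capacity (and adapter memory) by 16, which is why the high rank is reserved for the adapter with the largest dataset.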
Train independently from base model
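Each adapter is fine-tuned directly from the base Qwen/Qwen3.6-27B, so there is no sequential chain in which training a later adapter could cause catastrophic forgetting of an earlier one. Merge adapters post-training if you want combined capability.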
Autoresearch Adapter (planned)
Based on Karpathy's Autoresearch project. The loop:
- Generate trajectories with Claude Code (or your finetuned model in the loop)
- Filter by task success + trajectory quality
- SFT on successful trajectories
- DPO to further prefer good trajectories
Replace Claude Code with your own finetuned 27B once it's ready.
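The filter and pairing steps of the loop can be sketched as follows. Everything here is hypothetical scaffolding: the `Trajectory` fields, the 0.7 quality threshold, and the pairing rule (best passing vs. worst failing trajectory per task) are assumptions, since the dataset has not been created yet. The `prompt` / `chosen` / `rejected` keys follow the preference format that TRL's DPO training expects.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:  # hypothetical record layout
    task_id: str
    prompt: str
    text: str
    success: bool
    quality: float  # assumed score in [0, 1]


def is_good(t, min_quality):
    return t.success and t.quality >= min_quality


def build_sft_set(trajs, min_quality=0.7):
    """Keep only successful, high-quality trajectories for SFT."""
    return [t for t in trajs if is_good(t, min_quality)]


def build_dpo_pairs(trajs, min_quality=0.7):
    """Pair the best good trajectory with the worst bad one for each task."""
    by_task = {}
    for t in trajs:
        by_task.setdefault(t.task_id, []).append(t)
    pairs = []
    for group in by_task.values():
        good = [t for t in group if is_good(t, min_quality)]
        bad = [t for t in group if not is_good(t, min_quality)]
        if good and bad:  # tasks without both sides yield no pair
            chosen = max(good, key=lambda t: t.quality)
            rejected = min(bad, key=lambda t: t.quality)
            pairs.append({"prompt": chosen.prompt,
                          "chosen": chosen.text,
                          "rejected": rejected.text})
    return pairs


trajs = [
    Trajectory("t1", "Fix the failing test", "patch A", True, 0.9),
    Trajectory("t1", "Fix the failing test", "patch B", False, 0.2),
    Trajectory("t2", "Tune the learning rate", "sweep C", True, 0.8),
]
sft_set = build_sft_set(trajs)
dpo_pairs = build_dpo_pairs(trajs)
print(len(sft_set), len(dpo_pairs))
```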
ML Intern Adapter (planned)
SFT on your own ML infrastructure harness trajectories — real, ground-truth tool usage (file operations, bash commands, Python execution, API calls). This is your proprietary data.
Merging Adapters (Post-Training)
python merge_adapters.py \
  --adapters keypa/qwen36-27b-adapter-math-grpo keypa/qwen36-27b-adapter-code-sft \
  --weights 0.5 0.5 \
  --method ties \
  --density 0.6 \
  --output_dir ./merged_math_code \
  --push_to_hub \
  --hub_model_id keypa/qwen36-27b-adapter-merged-math-code
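For intuition about what `--method ties` with `--density 0.6` and `--weights 0.5 0.5` does, here is a simplified pure-Python sketch of TIES-style merging on flat weight-delta vectors: trim each vector to its largest-magnitude entries, elect a sign per position, then average only the entries that agree with that sign. It is an illustration of the general technique under those assumptions, not the implementation inside merge_adapters.py, and the input vectors are made up.

```python
def trim(vec, density):
    """Keep the top `density` fraction of entries by magnitude, zero the rest."""
    k = max(1, round(len(vec) * density))
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def ties_merge(vectors, weights, density):
    """Trim, elect a sign per position, and average the agreeing entries."""
    weighted = [[w * v for v in trim(vec, density)]
                for vec, w in zip(vectors, weights)]
    merged = []
    for column in zip(*weighted):
        total = sum(column)  # sign of the total is the elected sign
        if total == 0:
            merged.append(0.0)
            continue
        agreeing = [v for v in column if v != 0 and (v > 0) == (total > 0)]
        merged.append(sum(agreeing) / len(agreeing))
    return merged


# Made-up deltas standing in for two adapters' weight updates.
math_delta = [1.0, -2.0, 0.1, 3.0, -0.5]
code_delta = [-1.5, -1.0, 0.2, 2.0, 0.4]
merged = ties_merge([math_delta, code_delta], weights=[0.5, 0.5], density=0.6)
print(merged)
```

Entries where the two adapters disagree in sign are resolved in favour of the larger combined magnitude rather than being averaged toward zero, which is the main difference from a plain weighted sum.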
Inference: Swappable Adapters
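from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora turns on adapter support; max_lora_rank must cover the largest adapter (r=256 for code)
llm = LLM(model="Qwen/Qwen3.6-27B", dtype="bfloat16", enable_lora=True, max_lora_rank=256)

# LoRARequest(name, unique integer ID, adapter path). If your vLLM version does not resolve
# Hub repo IDs here, download the adapter first and pass the local directory instead.
adapters = {
    "math": LoRARequest("math", 1, "keypa/qwen36-27b-adapter-math-grpo"),
    "code": LoRARequest("code", 2, "keypa/qwen36-27b-adapter-code-sft"),
    "vision": LoRARequest("vision", 3, "keypa/qwen36-27b-adapter-vision-sft"),
}

# The adapter is chosen per request, so swapping means passing a different LoRARequest
outputs = llm.generate(["Write a quicksort in Rust"], SamplingParams(max_tokens=512), lora_request=adapters["code"])
print(outputs[0].outputs[0].text)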
Cost Estimate (Vast.ai)
| Phase | GPU | Time | Cost |
|---|---|---|---|
| 4B validation (all 4) | 24GB VRAM | 2-4h total | ~$2.00 |
| 27B production (3 adapters) | A100 80GB | 20-30h total | ~$80-120 |
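The production estimate is hours multiplied by an hourly rate. The table does not state the rate; the value below is back-computed from it ($80 / 20h = $120 / 30h = $4 per hour) and should be replaced with the current price of whichever instance you rent.

```python
def cost_range(hours_low, hours_high, rate_per_hour):
    """Return (low, high) cost in USD for a range of training hours."""
    return hours_low * rate_per_hour, hours_high * rate_per_hour


A100_RATE = 4.00  # USD/hour, implied by the table above, not a quoted price

low, high = cost_range(20, 30, A100_RATE)
print(f"27B production (3 adapters): ${low:.0f}-${high:.0f}")
```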
Monitoring with Trackio
trackio show --project qwen35-4b-adapters-val
Notes
- No QLoRA: Hybrid attention architecture degrades with 4-bit quantization.
- vLLM Ascend: Math GRPO requires the vllm_ascend submodule; blocked on Vast.ai until resolved.
- Long Context: Deferred. Requires an H100 or multi-GPU setup; too expensive for a single A100 on 27B.
- Cyber + Medical: Removed from plan. ML Intern and Autoresearch are more useful for the target use case.
Maintained by: @keypa | License: Apache-2.0