Qwen3.6 Multi-Adapter Fine-Tuning Suite

Six specialized LoRA adapters, each validated on Qwen/Qwen3.5-4B and then trained for production on Qwen/Qwen3.6-27B.

Workflow: Validate each adapter on 4B (fits in 24GB VRAM) → if stable, scale to 27B production. Always curate datasets locally before burning GPU hours.

6 Specialized Adapters

#	Adapter	Domain	Method	Dataset	LoRA	LR	Seq Len	Status
1	`code-sft`	Agentic Coding	SFT	`keypa/qwen36-adapter-code-sft` (100K rows)	r=256	2e-4	16K	✅ Passed on 4B
2	`math-grpo`	Math & Formal Reasoning	GRPO	`keypa/qwen36-adapter-math-grpo`	r=32	1e-5	4K	⏳ Blocked (vLLM)
3	`vision-sft`	Vision / UI / Frontend	SFT	`keypa/qwen36-adapter-vision-sft` (5K rows)	r=16	2e-5	4K	⏳ Pending
4	`longcontext-sft`	Long Context / RAG	SFT	`keypa/qwen36-adapter-longcontext-sft` (50 rows, placeholder)	r=16	5e-5	32K	🔜 Future
5	`mlintern-sft`	ML Infrastructure Specialist	SFT	Not yet created (your own harness trajectories)	r=32	2e-5	8K	📋 Planned
6	`autoresearch-sft+dpo`	Autoresearch (Karpathy-style)	SFT+DPO	Not yet created (Claude Code trajectories)	r=32	2e-5	8K	📋 Planned

4B validation note: The validation pipeline uses reduced settings (r=32, 2K seq len) to fit in 24GB VRAM. The LoRA/seq len values above are the production targets for 27B training.
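
The relationship between production targets and validation settings can be sketched as a small lookup. This is only an illustration: the dictionary names and the `validation_settings` helper are hypothetical, "K" is assumed to mean 1024 tokens, and adapters whose production rank is already below the cap are assumed to keep it.

```python
# Production targets copied from the adapter table (K taken as 1024 tokens, an assumption).
PRODUCTION = {
    "code-sft": {"r": 256, "lr": 2e-4, "seq_len": 16 * 1024},
    "math-grpo": {"r": 32, "lr": 1e-5, "seq_len": 4 * 1024},
    "vision-sft": {"r": 16, "lr": 2e-5, "seq_len": 4 * 1024},
}

# Validation caps from the note above: r=32 and a 2K sequence length.
VALIDATION_CAPS = {"r": 32, "seq_len": 2 * 1024}


def validation_settings(adapter: str) -> dict:
    """Return production settings with rank and sequence length capped for 4B runs."""
    settings = dict(PRODUCTION[adapter])
    for key, cap in VALIDATION_CAPS.items():
        # Assumption: adapters already below the cap keep their production value.
        settings[key] = min(settings[key], cap)
    return settings


print(validation_settings("code-sft"))
```

Keeping the production values in one place and deriving the reduced ones avoids the two sets of numbers drifting apart.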

Base models: Qwen/Qwen3.6-27B (~54GB bf16) for production, Qwen/Qwen3.5-4B (~8GB bf16) for validation
Framework: TRL ≥ 0.15, PEFT, Transformers ≥ 4.57.1, numpy<2
Hardware: A100/H100 80GB (27B) / any 24GB VRAM GPU (4B validation)
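
The weight sizes quoted above follow from bf16 using 2 bytes per parameter. A minimal check, assuming the nominal parameter counts implied by the model names (27B and 4B) and counting only the frozen weights, not activations, LoRA gradients, or optimizer state:

```python
BYTES_PER_BF16_PARAM = 2  # bfloat16 stores each weight in 16 bits


def bf16_weight_gb(num_params: float) -> float:
    """Weight memory in GB (1e9 bytes) for a model stored in bf16."""
    return num_params * BYTES_PER_BF16_PARAM / 1e9


# Nominal parameter counts taken from the model names (an assumption).
for name, params in [("Qwen3.6-27B", 27e9), ("Qwen3.5-4B", 4e9)]:
    print(f"{name}: ~{bf16_weight_gb(params):.0f} GB of weights in bf16")
```

The gap between these figures and the 80GB / 24GB cards is what remains for activations and adapter training state.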

Validation Workflow

Phase 1: 4B Validation (fits in 24GB VRAM, ~$0.50 on Vast.ai)

# Test ALL adapters on 4B before touching 27B
python train_validate_4b.py --all --max_samples 500 --epochs 1

# Or test one at a time
python train_validate_4b.py --adapter code --max_samples 500 --epochs 1
python train_validate_4b.py --adapter math   --max_samples 500 --epochs 1
python train_validate_4b.py --adapter vision  --max_samples 500 --epochs 1

Validation passes when:

Loss decreases over training steps (no NaN)
No OOM errors
mean_token_accuracy ≥ 0.70 on code
Model checkpoint saves correctly
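
The four criteria above can be expressed as a single check. This is a sketch, not the logic inside `train_validate_4b.py`: the function name and its arguments are hypothetical, and the 0.70 accuracy threshold is stated in this document only for the code adapter.

```python
import math


def validation_passed(losses, token_accuracy, hit_oom, checkpoint_saved, min_accuracy=0.70):
    """Apply the four pass criteria to one validation run."""
    if not losses or any(math.isnan(x) for x in losses):
        return False  # no logged loss, or the loss diverged to NaN
    loss_decreased = losses[-1] < losses[0]
    accurate = token_accuracy >= min_accuracy
    return loss_decreased and accurate and not hit_oom and checkpoint_saved


# Code SFT numbers reported in this document: loss 1.11 -> 0.80, token accuracy 0.75.
print(validation_passed([1.11, 0.80], 0.75, hit_oom=False, checkpoint_saved=True))
```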

Validation status (method validation on Qwen3.5-4B, 500 samples, 1 epoch — not the 27B model):

Adapter	Loss Start	Loss End	Token Accuracy	Status
Code SFT	1.11	0.80	0.75	✅ Passed
Math GRPO	—	—	—	⏳ Blocked (vllm_ascend)
Vision SFT	—	—	—	⏳ Pending
Long Context	—	—	—	🔜 Future
ML Intern	—	—	—	📋 Dataset not created
Autoresearch	—	—	—	📋 Dataset not created

Phase 2: 27B Production Training

# Only after 4B validation passes
python train_27b_sft.py --adapter code   --max_samples 5000 --epochs 3
python train_27b_grpo.py --adapter math  --max_samples 5000 --epochs 3
python train_27b_sft.py --adapter vision  --max_samples 5000 --epochs 3

Datasets

Existing (verified on HuggingFace)

Dataset	Rows	Format	Used by
`keypa/qwen36-adapter-code-sft`	100K	Parquet	Code SFT
`keypa/qwen36-adapter-math-grpo`	—	Parquet	Math GRPO
`keypa/qwen36-adapter-vision-sft`	5K	Parquet	Vision SFT
`keypa/qwen36-adapter-longcontext-sft`	50	Parquet	Long Context (placeholder)

Note: keypa/qwen36-adapter-longcontext-sft is a 50-row placeholder. The full dataset needs to be created from tau/scrolls + allenai/qasper.

Planned (not yet created)

Dataset	Source	Used by	Status
ML Intern harness trajectories	Your own infrastructure	ML Intern SFT	📋 To be created
Claude Code trajectories	Autoresearch project	Autoresearch SFT+DPO	📋 To be created

Key Design Decisions

bf16, not FP8 or QLoRA 4-bit

FP8: Inference only. Training needs bf16 for gradient stability.
QLoRA 4-bit: Qwen3.6 uses hybrid attention layers. 4-bit degrades them disproportionately.
Rule: Train bf16, deploy FP8.

Dataset Size Strategy

Based on LIMA, which reached strong alignment results with 1,000 carefully curated examples: prefer 1K-5K high-quality examples over 100K mediocre ones.
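
One way to apply this locally before renting a GPU is to rank examples by a quality score and keep only the top few thousand. The sketch below assumes each example already carries a numeric `quality` field; that field and the helper name are hypothetical, and how the score is produced is left open.

```python
def select_top_examples(examples, k, score_key="quality"):
    """Keep the k highest-scoring examples; ties keep their original order."""
    ranked = sorted(examples, key=lambda ex: ex[score_key], reverse=True)
    return ranked[:k]


# Toy rows standing in for a real dataset.
rows = [
    {"id": 1, "quality": 0.2},
    {"id": 2, "quality": 0.9},
    {"id": 3, "quality": 0.5},
]
print([ex["id"] for ex in select_top_examples(rows, k=2)])
```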

LoRA rank choices

Rank	Use Case
r=256	Code SFT — massive knowledge injection (100K rows), "LoRA Without Regret" shows r=256 + `target_modules="all-linear"` ≈ full fine-tuning at 67% compute
r=32	Math GRPO, ML Intern SFT, Autoresearch SFT+DPO — reasoning style + moderate knowledge
r=16	Vision SFT, Long Context SFT — behavior shaping only
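
To give a feel for what these ranks cost, recall that LoRA adds two low-rank matrices to a frozen `d_out x d_in` linear layer, for `r * (d_in + d_out)` trainable parameters. The 4096 x 4096 layer below is a hypothetical size chosen for illustration, not a dimension taken from either Qwen model.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


D = 4096  # hypothetical square projection, for illustration only
for r in (16, 32, 256):
    added = lora_params(D, D, r)
    print(f"r={r}: {added:,} trainable params, {added / (D * D):.1%} of the frozen layer")
```

The count grows linearly with rank, so r=256 trains 16 times as many parameters per layer as r=16.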

Train independently from base model

Each adapter is fine-tuned independently from the base Qwen/Qwen3.6-27B, so no adapter inherits forgetting from an earlier fine-tuning stage. Merge adapters post-training if you want combined capability.

Autoresearch Adapter (planned)

Based on Karpathy's Autoresearch project. The loop:

Generate trajectories with Claude Code (or your finetuned model in the loop)
Filter by task success + trajectory quality
SFT on successful trajectories
DPO to further prefer good trajectories

Replace Claude Code with your own finetuned 27B once it's ready.
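
The filter, SFT, and DPO steps of the loop could be wired together roughly as follows. This is a sketch under assumptions: the `success` and `quality` fields, the 0.5 threshold, and the function name are hypothetical, since the dataset does not exist yet. The output keys follow the prompt-completion and preference (`prompt`, `chosen`, `rejected`) layouts that TRL's SFT and DPO trainers accept.

```python
from collections import defaultdict


def build_training_sets(trajectories, min_quality=0.5):
    """Split trajectories into SFT examples and DPO preference pairs."""
    by_prompt = defaultdict(lambda: {"good": [], "bad": []})
    for t in trajectories:
        good = t["success"] and t["quality"] >= min_quality
        by_prompt[t["prompt"]]["good" if good else "bad"].append(t["response"])

    sft, dpo = [], []
    for prompt, group in by_prompt.items():
        sft += [{"prompt": prompt, "completion": r} for r in group["good"]]
        # A preference pair needs one accepted and one rejected response per prompt.
        dpo += [
            {"prompt": prompt, "chosen": c, "rejected": r}
            for c in group["good"]
            for r in group["bad"]
        ]
    return sft, dpo


# Toy trajectories standing in for real agent runs.
runs = [
    {"prompt": "p1", "response": "a", "success": True, "quality": 0.9},
    {"prompt": "p1", "response": "b", "success": False, "quality": 0.4},
    {"prompt": "p2", "response": "c", "success": True, "quality": 0.2},
]
sft_rows, dpo_rows = build_training_sets(runs)
print(len(sft_rows), len(dpo_rows))
```

Prompts with only successful or only failed runs contribute no DPO pairs, which is one reason to sample several trajectories per task.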

ML Intern Adapter (planned)

SFT on your own ML infrastructure harness trajectories — real, ground-truth tool usage (file operations, bash commands, Python execution, API calls). This is your proprietary data.

Merging Adapters (Post-Training)

python merge_adapters.py \
    --adapters keypa/qwen36-27b-adapter-math-grpo keypa/qwen36-27b-adapter-code-sft \
    --weights 0.5 0.5 \
    --method ties \
    --density 0.6 \
    --output_dir ./merged_math_code \
    --push_to_hub \
    --hub_model_id keypa/qwen36-27b-adapter-merged-math-code
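
For readers unfamiliar with what `--method ties` and `--density 0.6` control, here is a simplified pure-Python illustration of TIES merging on flat lists: trim each adapter's delta to its largest-magnitude entries, elect a sign per position, then average only the entries that agree with it. It is a toy sketch, not the code in `merge_adapters.py` (which is not shown here), and real implementations operate on weight tensors and may differ in detail.

```python
def trim(vector, density):
    """Zero all but the largest-magnitude `density` fraction of entries."""
    keep = max(1, round(len(vector) * density))
    threshold = sorted((abs(v) for v in vector), reverse=True)[keep - 1]
    # Entries tied with the threshold magnitude are all kept.
    return [v if abs(v) >= threshold else 0.0 for v in vector]


def ties_merge(task_vectors, weights, density):
    """Trim each vector, elect a sign per position, then average the agreeing entries."""
    scaled = [
        [w * v for v in trim(vec, density)]
        for vec, w in zip(task_vectors, weights)
    ]
    merged = []
    for column in zip(*scaled):
        positive = sum(column) >= 0  # elected sign: direction with more total mass
        agreeing = [v for v in column if v != 0.0 and (v > 0) == positive]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


math_delta = [1.0, -2.0, 0.1, 3.0, -0.2]   # toy values, not real adapter weights
code_delta = [-1.5, -1.0, 0.2, 0.05, 0.4]
print(ties_merge([math_delta, code_delta], weights=[0.5, 0.5], density=0.6))
```

Where the two adapters disagree in sign, only the side with more total mass survives, which is how TIES reduces interference between merged adapters.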

Inference: Swappable Adapters

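from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora turns on per-request adapters; max_lora_rank must cover r=256 (code-sft)
llm = LLM(model="Qwen/Qwen3.6-27B", dtype="bfloat16", enable_lora=True, max_lora_rank=256)

adapters = {
    "math": "keypa/qwen36-27b-adapter-math-grpo",
    "code": "keypa/qwen36-27b-adapter-code-sft",
    "vision": "keypa/qwen36-27b-adapter-vision-sft",
}

# Each adapter needs a unique positive integer ID; pick the adapter per request
lora_ids = {name: i for i, name in enumerate(adapters, start=1)}
name = "code"
lora = LoRARequest(name, lora_ids[name], snapshot_download(adapters[name]))
outputs = llm.generate(["Write a quicksort in Rust"], SamplingParams(max_tokens=512), lora_request=lora)
print(outputs[0].outputs[0].text)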

Cost Estimate (Vast.ai)

Phase	GPU	Time	Cost
4B validation (all 4)	24GB VRAM	2-4h total	~$2.00
27B production (3 adapters)	A100 80GB	20-30h total	~$80-120
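
Dividing each cost by its hours gives the hourly rate the table implies. These rates are derived from the figures above, not quoted Vast.ai prices, and the helper name is hypothetical.

```python
def implied_hourly_rate(cost_usd: float, hours: float) -> float:
    """Hourly rate implied by a total cost and a total runtime."""
    return cost_usd / hours


# 27B production: $80 over 20h and $120 over 30h.
print(implied_hourly_rate(80, 20), implied_hourly_rate(120, 30))
# 4B validation: ~$2.00 spread over 2 to 4 hours.
print(implied_hourly_rate(2.00, 4), implied_hourly_rate(2.00, 2))
```

Both ends of the 27B range imply the same rate, so the cost spread there comes entirely from runtime.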

Monitoring with Trackio

trackio show --project qwen35-4b-adapters-val

Notes

No QLoRA: Hybrid attention architecture degrades with 4-bit quantization.
vLLM Ascend: Math GRPO requires vllm_ascend submodule — blocked on Vast.ai until resolved.
Long Context: Deferred — requires H100 or multi-GPU setup, too expensive for single A100 on 27B.
Cyber + Medical: Removed from plan. ML Intern and Autoresearch are more useful for the target use case.

Maintained by: @keypa | License: Apache-2.0

References

LIMA: Less Is More for Alignment. arXiv:2305.11206, published May 18, 2023.