YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Qwen3.6 Multi-Adapter Fine-Tuning Suite

6 specialized LoRA adapters for Qwen/Qwen3.5-4B (validation) and Qwen/Qwen3.6-27B (production). Validated on 4B first, then scaled to 27B.

Workflow: Validate each adapter on 4B (fits in 24GB VRAM) → if stable, scale to 27B production. Always curate datasets locally before burning GPU hours.


6 Specialized Adapters

# Adapter Domain Method Dataset LoRA LR Seq Len Status
1 code-sft Agentic Coding SFT keypa/qwen36-adapter-code-sft (100K rows) r=256 2e-4 16K ✅ Validating with 4B
2 math-grpo Math & Formal Reasoning GRPO keypa/qwen36-adapter-math-grpo r=32 1e-5 4K ⏳ Blocked (vLLM)
3 vision-sft Vision / UI / Frontend SFT keypa/qwen36-adapter-vision-sft (5K rows) r=16 2e-5 4K ⏳ Pending
4 longcontext-sft Long Context / RAG SFT keypa/qwen36-adapter-longcontext-sft (50 rows, placeholder) r=16 5e-5 32K 🔜 Future
5 mlintern-sft ML Infrastructure Specialist SFT Not yet created — your own harness trajectories r=32 2e-5 8K 📋 Planned
6 autoresearch-sft+dpo Autoresearch (Karpathy-style) SFT+DPO Not yet created — Claude Code trajectories r=32 2e-5 8K 📋 Planned

4B validation note: The validation pipeline uses reduced settings (r=32, 2K seq len) to fit in 24GB VRAM. The LoRA/seq len values above are the production targets for 27B training.

Base models: Qwen/Qwen3.6-27B (54GB bf16) for production, Qwen/Qwen3.5-4B (8GB bf16) for validation
Framework: TRL ≥ 0.15, PEFT, Transformers ≥ 4.57.1, numpy<2
Hardware: A100/H100 80GB (27B) / any 24GB VRAM GPU (4B validation)


Validation Workflow

Phase 1: 4B Validation (fits in 24GB VRAM, ~$0.50 on Vast.ai)

# Test ALL adapters on 4B before touching 27B
python train_validate_4b.py --all --max_samples 500 --epochs 1

# Or test one at a time
python train_validate_4b.py --adapter code --max_samples 500 --epochs 1
python train_validate_4b.py --adapter math   --max_samples 500 --epochs 1
python train_validate_4b.py --adapter vision  --max_samples 500 --epochs 1

Validation passes when:

  • Loss decreases over epochs (no NaN)
  • No OOM errors
  • mean_token_accuracy ≥ 0.70 on code
  • Model checkpoint saves correctly

Validation status (method validation on Qwen3.5-4B, 500 samples, 1 epoch — not the 27B model):

Adapter Loss Start Loss End Token Accuracy Status
Code SFT 1.11 0.80 0.75 ✅ Passed
Math GRPO ⏳ Blocked (vllm_ascend)
Vision SFT ⏳ Pending
Long Context 🔜 Future
ML Intern 📋 Dataset not created
Autoresearch 📋 Dataset not created

Phase 2: 27B Production Training

# Only after 4B validation passes
python train_27b_sft.py --adapter code   --max_samples 5000 --epochs 3
python train_27b_grpo.py --adapter math  --max_samples 5000 --epochs 3
python train_27b_sft.py --adapter vision  --max_samples 5000 --epochs 3

Datasets

Existing (verified on HuggingFace)

Dataset Rows Format Used by
keypa/qwen36-adapter-code-sft 100K Parquet Code SFT
keypa/qwen36-adapter-math-grpo Parquet Math GRPO
keypa/qwen36-adapter-vision-sft 5K Parquet Vision SFT
keypa/qwen36-adapter-longcontext-sft 50 Parquet Long Context (placeholder)

Note: keypa/qwen36-adapter-longcontext-sft is a 50-row placeholder. The full dataset needs to be created from tau/scrolls + allenai/qasper.

Planned (not yet created)

Dataset Source Used by Status
ML Intern harness trajectories Your own infrastructure ML Intern SFT 📋 To be created
Claude Code trajectories Autoresearch project Autoresearch SFT+DPO 📋 To be created

Key Design Decisions

bf16, not FP8 or QLoRA 4-bit

  • FP8: Inference only. Training needs bf16 for gradient stability.
  • QLoRA 4-bit: Qwen3.6 uses hybrid attention layers. 4-bit degrades them disproportionately.
  • Rule: Train bf16, deploy FP8.

Dataset Size Strategy

Based on LIMA: 1K-5K high-quality examples > 100K mediocre ones.

LoRA rank choices

Rank Use Case
r=256 Code SFT — massive knowledge injection (100K rows), "LoRA Without Regret" shows r=256 + target_modules="all-linear" ≈ full fine-tuning at 67% compute
r=32 Math GRPO, ML Intern SFT, Autoresearch SFT+DPO — reasoning style + moderate knowledge
r=16 Vision SFT, Long Context SFT — behavior shaping only

Train independently from base model

Each adapter is fine-tuned from the base Qwen3.5-27B — no catastrophic forgetting chain. Merge adapters post-training if you want combined capability.


Autoresearch Adapter (planned)

Based on Karpathy's Autoresearch project. The loop:

  1. Generate trajectories with Claude Code (or your finetuned model in the loop)
  2. Filter by task success + trajectory quality
  3. SFT on successful trajectories
  4. DPO to further prefer good trajectories

Replace Claude Code with your own finetuned 27B once it's ready.


ML Intern Adapter (planned)

SFT on your own ML infrastructure harness trajectories — real, ground-truth tool usage (file operations, bash commands, Python execution, API calls). This is your proprietary data.


Merging Adapters (Post-Training)

python merge_adapters.py \
    --adapters keypa/qwen35-27b-adapter-math-grpo keypa/qwen35-27b-adapter-code-sft \
    --weights 0.5 0.5 \
    --method ties \
    --density 0.6 \
    --output_dir ./merged_math_code \
    --push_to_hub \
    --hub_model_id keypa/qwen35-27b-adapter-merged-math-code

Inference: Swappable Adapters

from vllm import LLM

llm = LLM(model="Qwen/Qwen3.5-27B", dtype="bfloat16")

adapters = {
    "math": "keypa/qwen36-27b-adapter-math-grpo",
    "code": "keypa/qwen36-27b-adapter-code-sft",
    "vision": "keypa/qwen36-27b-adapter-vision-sft",
}

llm.set_lora("code")
output = llm.generate("Write a quicksort in Rust")

Cost Estimate (Vast.ai)

Phase GPU Time Cost
4B validation (all 4) 24GB VRAM 2-4h total ~$2.00
27B production (3 adapters) A100 80GB 20-30h total ~$80-120

Monitoring with Trackio

trackio show --project qwen35-4b-adapters-val

Notes

  1. No QLoRA: Hybrid attention architecture degrades with 4-bit quantization.
  2. vLLM Ascend: Math GRPO requires vllm_ascend submodule — blocked on Vast.ai until resolved.
  3. Long Context: Deferred — requires H100 or multi-GPU setup, too expensive for single A100 on 27B.
  4. Cyber + Medical: Removed from plan. ML Intern and Autoresearch are more useful for the target use case.

Maintained by: @keypa | License: Apache-2.0

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Paper for keypa/qwen36-27b-adapters-suite