Born-9B Qwen3.5-9B Preview

Born-9B Preview is a PEFT LoRA adapter for Qwen/Qwen3.5-9B. It is trained for coding-agent behavior: concise planning, concrete code or patches, explicit checks, and a final user-facing result.

This release is the promoted Born-9B v2 adapter. Later recovery, hotfix, and preview-recovery experiments are documented in the project repo, but they are not promoted because they did not beat this adapter on the same local held-out gate.

Release Artifact

  • Hugging Face model ID: rk500/Born-9B-Qwen3.5-9B-Preview
  • Adapter type: PEFT LoRA adapter, not a merged full model.
  • Base model: Qwen/Qwen3.5-9B
  • Local project artifact: checkpoints/born-9b-v2-generated-expanded-lora
  • Training route: continuation from checkpoints/born-9b-v1-teichai-quick-lora, not a fresh adapter from base.
  • Intended use: coding-agent SFT behavior, repository repair planning, data-science debugging, tool-use closure, and explicit verification steps.
  • Not intended as: an official SWE-bench leaderboard claim, a general chatbot replacement, or a raw hidden chain-of-thought model.

Response Contract

Born-9B Preview was trained toward this visible response shape:

Plan:
- short concrete plan

Patch or Code:
<code, patch, commands, or exact action payload>

Checks:
- exact tests/checks to run

Result:
brief final user-facing result

The training target excludes raw hidden reasoning tags. Rows containing <think>, <thinking>, <reasoning>, or similar hidden-trace markers were filtered or sanitized before inclusion. The desired behavior is compact visible rationale: state invariants and decision rules, then produce the implementation and checks.
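The response contract and the hidden-tag rule above can be checked mechanically. The following is a minimal sketch, not the project's validator: the helper names `has_hidden_tags` and `parse_contract` and the tag pattern are assumptions made for illustration, and the real filter may cover more markers.

```python
import re
from typing import Dict, Optional

# Section headers of the response contract, in the expected order.
SECTIONS = ("Plan:", "Patch or Code:", "Checks:", "Result:")

# Assumed pattern for hidden-trace markers; the project's real list may differ.
HIDDEN_TAG = re.compile(r"</?\s*(think|thinking|reasoning)\b[^>]*>", re.IGNORECASE)


def has_hidden_tags(text: str) -> bool:
    """Return True if the text contains a hidden-reasoning style tag."""
    return HIDDEN_TAG.search(text) is not None


def parse_contract(text: str) -> Optional[Dict[str, str]]:
    """Split a response into its four sections.

    Returns None when a header is missing, the headers are out of order,
    or the text contains a hidden-reasoning tag.
    """
    if has_hidden_tags(text):
        return None
    spans = []
    cursor = 0
    for header in SECTIONS:
        # Each header must start a line and appear after the previous one.
        pattern = re.compile(rf"^{re.escape(header)}", re.MULTILINE)
        match = pattern.search(text, cursor)
        if match is None:
            return None
        spans.append((match.start(), match.end()))
        cursor = match.end()
    sections = {}
    for index, header in enumerate(SECTIONS):
        body_start = spans[index][1]
        body_end = spans[index + 1][0] if index + 1 < len(spans) else len(text)
        sections[header.rstrip(":")] = text[body_start:body_end].strip()
    return sections
```

A response that follows the contract yields a dictionary with the keys `Plan`, `Patch or Code`, `Checks`, and `Result`; anything else yields `None`.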

What Was Trained

The promoted v2 package is born9b_v2_generated_expanded.

Final mix report:

  • Total validated SFT rows: 7,097
  • Train / validation split: 6,672 / 425
  • Estimated tokens: 9.11M
  • Validation rejects: 0
  • Hidden tag rows in final validation: 0
  • Targeted curriculum rows: 85
  • Eval-derived DPO pairs available for analysis: 14
  • Duplicate prompts removed during v2 merge: 1,438
  • Max sequence length used for training: 4096
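The figures in the mix report can be cross-checked with a few lines of arithmetic. This sketch only uses numbers from the list above; the mean tokens per row is derived from the estimated token total and is therefore itself an estimate.

```python
# Values copied from the final mix report.
total_rows = 7_097
train_rows, val_rows = 6_672, 425
estimated_tokens = 9.11e6
max_seq_len = 4096

# The split should account for every validated row.
assert train_rows + val_rows == total_rows

val_share = val_rows / total_rows
mean_tokens_per_row = estimated_tokens / total_rows

print(f"validation share: {val_share:.2%}")                    # 5.99%
print(f"mean estimated tokens per row: {mean_tokens_per_row:.0f}")  # 1284
print(f"mean row fits in max length: {mean_tokens_per_row < max_seq_len}")
```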

The v2 run intentionally mixed benchmark-style coding tasks, repair tasks, agent/tool traces, high-quality reasoning final answers, and a limited Irish-language supplement. The goal was not to maximize generic text volume; it was to push the model toward closure on coding-agent tasks.

Dataset Sources Used

The table below lists the sources recorded in the final v2 mix report. Row counts are the rows that survived into the v2 generated-expanded SFT mix after import, normalization, and deduplication.

| Source key | Rows | Purpose |
| --- | --- | --- |
| bigcodebench | 845 | BigCodeBench-style code generation and library/tool-use coverage. |
| ds1000 | 724 | Data-science and notebook-style debugging tasks. |
| mbppplus | 355 | MBPP+ style exact-function programming tasks. |
| swebench_verified | 312 | SWE-style issue resolution prompts and repair planning. |
| born9b_coding_workshop_openrouter | 326 | Synthetic coding-workshop rows from OpenRouter teacher lanes. |
| born9b_coding_workshop_openrouter_qwen_fallback | 234 | OpenRouter fallback synthetic rows for coding closure. |
| humanevalplus | 154 | HumanEval+ style exact-code tasks. |
| claw_eval_general | 140 | Agentic task-planning examples inspired by Claw-style evaluation. |
| claw_eval_multiturn | 24 | Multi-turn agentic planning rows. |
| repoexec | 173 | Repository-context execution and repair rows. |
| born9b_bigcodebench_self_seed | 30 | Locally generated BigCodeBench-style self-seed rows. |
| self_seed_expanded | 170 | Local deterministic coding and repair templates. |
| self_seed | 24 | Early local proof rows. |
| mbpp | 1 | Legacy MBPP-style seed row. |
| swebench_lite | 2 | Early SWE-style seed rows. |
| claw_eval | 2 | Early Claw-style seed rows. |
| born9b_coding_workshop_crof | 179 | CrofAI-generated coding workshop rows. |
| born9b_v2_crof_kimi_eval_expansion | 41 | Kimi K2.6 eval-expansion rows. |
| born9b_v2_crof_greg_eval_expansion | 12 | Crof greg eval-expansion rows. |
| born9b_v2_openrouter_eval_expansion | 6 | Small OpenRouter eval-expansion rows. |
| born9b_visible_thinking_style_openrouter | 405 | Visible decision-rule and closure style rows. |
| born9b_irish_synthetic_openrouter | 1,339 | Irish-language synthetic rows for secondary language coverage. |
| teichai_claude45_opus_high_reasoning | 249 | Sanitized TeichAI Claude 4.5 Opus high-reasoning final answers. |
| claude_opus_reasoning | 204 | Sanitized Claude Opus 4.6/4.7 reasoning final answers. |
| opus46_reasoning_filtered | 198 | Filtered Opus 4.6 reasoning final answers. |
| tachibana4_deepseek_v4_pro | 258 | DeepSeek V4 Pro agentic/coding seed data. |
| hermes_agent_kimi | 159 | Hermes/Kimi agent traces converted to visible action-result supervision. |
| hermes_agent_filtered | 57 | Filtered Hermes agent traces. |
| hermes_agent_glm51 | 67 | Hermes/GLM-5.1 agent traces. |
| codex_thinking | 201 | CodeX-2M-Thinking coding-reasoning rows after sanitization. |
| deepseek_v4_distill | 80 | DeepSeek V4 distillation rows after hidden-tag removal. |
| agenttrove_code | 1 | Open-thoughts AgentTrove code/tool sample. |
| qwen_webworld | 30 | Qwen WebWorldData small web-agent/world-model sample. |
| codex_swebenchpro | 10 | Codex SWE-bench Pro trace sample after review and formatting. |
| born9b_v2_eval_curriculum | 85 | Targeted v2 curriculum rows derived from known local eval failure modes. |

External Dataset Names

The imported Hugging Face sources documented for the v1/v2 build include:

  • angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k
  • nohurry/Opus-4.6-Reasoning-3000x-filtered
  • sequelbox/Tachibana4-DeepSeek-V4-Pro
  • lambda/hermes-agent-reasoning-traces
  • Modotte/CodeX-2M-Thinking
  • Jackrong/DeepSeek-V4-Distill-8000x
  • DJLougen/hermes-agent-traces-filtered
  • open-thoughts/AgentTrove
  • Qwen/WebWorldData
  • Inferact/codex_swebenchpro_traces
  • TeichAI/claude-4.5-opus-high-reasoning-250x

These datasets were not imported as raw hidden chain-of-thought. The importer kept final answers, observable actions, code, patches, checks, and results. Rows were normalized into the Born response contract where possible.

Teacher And Synthetic Generation Sources

Teacher-generated and synthetic rows were produced across several provider lanes. The recorded teacher pool includes:

  • CrofAI: kimi-k2.6, mimo-v2.5-pro, deepseek-v4-pro, and attempted greg lanes.
  • OpenRouter: inclusionai/ring-2.6-1t, deepseek/deepseek-v4-flash, deepseek/deepseek-v4-pro, openrouter/owl-alpha, qwen/qwen3.6-plus, minimax variants, Arcee Trinity, Gemma, and Qwen thinking variants where available during the run.
  • Local/self generation: deterministic BigCodeBench-style rows, exact-code closure rows, data-science repair rows, agentic closure rows, and tool-use closure rows.

Provider rows were filtered for the expected section markers, minimum response quality, duplicate prompts, and hidden-tag leakage. Some generated lanes were rejected or stopped when schema compliance was poor.
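The filtering steps described above can be sketched as a single pass over candidate rows. This is an illustration under stated assumptions, not the project's importer: the row schema (`prompt` and `response` keys), the `MIN_RESPONSE_CHARS` floor, and the normalization used for duplicate detection are all hypothetical.

```python
import hashlib
import re

REQUIRED_MARKERS = ("Plan:", "Patch or Code:", "Checks:", "Result:")
HIDDEN_TAG = re.compile(r"</?\s*(think|thinking|reasoning)\b", re.IGNORECASE)
MIN_RESPONSE_CHARS = 40  # hypothetical stand-in for "minimum response quality"


def prompt_key(prompt: str) -> str:
    """Hash a case- and whitespace-normalized prompt for duplicate detection."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_rows(rows):
    """Return (kept_rows, reject_counts) after applying all four checks."""
    seen = set()
    kept = []
    rejects = {"hidden_tag": 0, "markers": 0, "too_short": 0, "duplicate": 0}
    for row in rows:
        response = row["response"]
        if HIDDEN_TAG.search(response):
            rejects["hidden_tag"] += 1
        elif not all(marker in response for marker in REQUIRED_MARKERS):
            rejects["markers"] += 1
        elif len(response) < MIN_RESPONSE_CHARS:
            rejects["too_short"] += 1
        else:
            key = prompt_key(row["prompt"])
            if key in seen:
                rejects["duplicate"] += 1
            else:
                seen.add(key)
                kept.append(row)
    return kept, rejects
```

Keeping per-reason reject counts makes it easy to spot a lane with poor schema compliance, which is the kind of signal that led to some lanes being stopped.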

Task Mix

Final v2 row counts by task kind:

| Task kind | Rows |
| --- | --- |
| codegen_tooluse | 890 |
| codegen_data_science | 725 |
| codegen_exact | 525 |
| code_repair | 380 |
| agent_trace | 284 |
| agentic_code_reasoning_sft | 258 |
| irish_language_instruction | 278 |
| irish_language_dialogue | 223 |
| agent_plan | 207 |
| claude_reasoning_sft | 204 |
| code_reasoning_sft | 201 |
| opus46_filtered_reasoning_sft | 198 |
| repoexec_context | 173 |
| irish_language_grammar | 206 |
| irish_language_translation | 190 |
| irish_language_culture | 173 |
| irish_language_public_service | 172 |
| data_science_debug | 137 |
| test_design | 138 |
| repo_agent_workflow | 120 |
| library_tooluse | 114 |
| exact_algorithm | 108 |
| refactor_minimal_patch | 104 |
| reasoning_sft | 80 |
| exact_algorithm_with_proof_sketch | 72 |
| long_context_refactor | 67 |
| api_integration_debug | 57 |
| api_integration_incident | 57 |
| agentic_tool_plan | 50 |
| repo_repair_invariants | 49 |
| bigcodebench_self_seed | 30 |
| web_agent_world_model | 30 |
| agent_plan_multiturn | 25 |
| v2_exact_python_closure | 25 |
| failing_test_to_patch | 24 |
| test_selection_and_minimal_fix | 24 |
| v2_data_science_repair | 20 |
| v2_agentic_closure | 20 |
| v2_tooluse_closure | 20 |
| codex_swe_agent_trace | 10 |

Minor legacy task kinds with one or two rows are retained in the project reports but are not a meaningful part of the final behavior.

Training Configuration

Born-9B Preview v2 was trained with QLoRA SFT:

  • Model: Qwen/Qwen3.5-9B
  • Starting adapter: checkpoints/born-9b-v1-teichai-quick-lora
  • Output adapter: checkpoints/born-9b-v2-generated-expanded-lora
  • Quantization: 4-bit NF4
  • Torch dtype: bfloat16
  • Max sequence length: 4096
  • LoRA rank: 32
  • LoRA alpha: 64
  • LoRA dropout: 0.05
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Epochs: 0.45
  • Optimizer steps: 188
  • Learning rate: 1.2e-5
  • Scheduler: cosine
  • Warmup ratio: 0.04
  • Effective batch size: 16
  • Weight decay: 0.01
  • Max grad norm: 0.3
  • Evaluation interval: every 75 optimizer steps
  • Training GPU: NVIDIA A40 on RunPod

Final training telemetry:

  • Final train loss: 0.5978
  • Final eval loss: 0.7479
  • Final mean token accuracy: 0.8087
  • Final epoch: 0.4508
  • Completed steps: 188 / 188
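The step count and final epoch in the telemetry follow from the configuration and the training split. The sketch below reproduces them, assuming one training example per row (no sequence packing) and a step count rounded up to the next whole optimizer step; both are assumptions about the trainer, not facts stated in this card.

```python
import math

# Values copied from the mix report and the training configuration.
train_rows = 6_672
effective_batch = 16
target_epochs = 0.45
lora_rank, lora_alpha = 32, 64
warmup_ratio = 0.04

# Optimizer steps needed to cover the target fraction of an epoch.
steps = math.ceil(target_epochs * train_rows / effective_batch)

# Fraction of the training split actually seen after those steps.
final_epoch = steps * effective_batch / train_rows

# Standard LoRA scaling factor applied to the low-rank update (alpha / r).
lora_scale = lora_alpha / lora_rank

# Approximate warmup length in steps, before any rounding by the trainer.
warmup_steps = warmup_ratio * steps

print(steps)                   # 188
print(round(final_epoch, 4))   # 0.4508
print(lora_scale)              # 2.0
print(round(warmup_steps, 2))  # 7.52
```

Both derived values agree with the reported 188 / 188 completed steps and the final epoch of 0.4508.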

Evaluation Snapshot

Fixed Local Born Self Eval

This is a project-local 25-task gate covering exact Python, repair, data science, agentic planning, and tool-use. It is not a public leaderboard.

| Model | Weighted score | Passed | Exact Python | Repair | Data Science | Agentic | Tool Use |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen/Qwen3.5-9B base | 0.8511 | 22 / 25 | 0.4000 | 0.9526 | 0.9796 | 0.9758 | 0.9475 |
| Born-9B v1 TeichAI | 0.8966 | 22 / 25 | 0.8000 | 0.9811 | 0.8980 | 0.9064 | 0.8976 |
| Born-9B Preview v2 | 0.9244 | 23 / 25 | 0.8000 | 1.0000 | 0.9745 | 0.9700 | 0.8776 |

The v2 release is promoted because it has the highest weighted score on this gate: it beats the base model, the v1 adapter, and all later local recovery attempts.
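The weighting behind the weighted score is not stated in this card. As a sanity check on the table above, the three reported weighted scores match an equal-weight mean of the five category columns to four decimals. The sketch below reproduces that check; it is an observation about the published numbers, and the project's scorer may compute the value differently.

```python
# Category scores in table order:
# Exact Python, Repair, Data Science, Agentic, Tool Use.
# The second tuple element is the reported weighted score.
RESULTS = {
    "Qwen/Qwen3.5-9B base": ([0.4000, 0.9526, 0.9796, 0.9758, 0.9475], 0.8511),
    "Born-9B v1 TeichAI": ([0.8000, 0.9811, 0.8980, 0.9064, 0.8976], 0.8966),
    "Born-9B Preview v2": ([0.8000, 1.0000, 0.9745, 0.9700, 0.8776], 0.9244),
}


def equal_weight_mean(scores):
    """Plain arithmetic mean: every category counts the same."""
    return sum(scores) / len(scores)


for name, (scores, reported) in RESULTS.items():
    mean = equal_weight_mean(scores)
    print(f"{name}: mean={mean:.4f} reported={reported:.4f}")
```

Under this reading, the large Exact Python gain (0.4000 to 0.8000) outweighs the smaller drops in Data Science, Agentic, and Tool Use relative to base.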

SWE-bench Verified Proxy Sample

This is a 25-task issue-resolution proxy sample derived from SWE-bench Verified. It is not the official SWE-bench Docker harness.

| Model | Score | Passed |
| --- | --- | --- |
| Qwen/Qwen3.5-9B base | 0.7561 | 19 / 25 |
| Born-9B Preview v2 initial run | 0.8559 | 22 / 25 |
| Born-9B Preview v2 fresh A40 rerun | 0.9117 | 23 / 25 |

HumanEval And MBPP Executable Slice

This is a fresh post-release A40 run over the first 25 local HumanEval rows and first 25 local MBPP rows. It executes Python assertions. It is still a small slice, not the full HumanEval/MBPP benchmark.

| Model | HumanEval | MBPP | Combined |
| --- | --- | --- | --- |
| Qwen/Qwen3.5-9B base | 0.84 (21 / 25) | 0.68 (17 / 25) | 0.76 (38 / 50) |
| Born-9B Preview v2 | 0.68 (17 / 25) | 0.76 (19 / 25) | 0.72 (36 / 50) |

This is a regression on the combined exact-code slice (0.72 vs 0.76): Born gains two tasks on the MBPP sample but loses four on HumanEval. Treat exact-code improvement as future work, not a claim of this preview.
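An executable slice of this kind boils down to running each candidate solution against its assertions and counting passes. The sketch below shows the idea only; it is not the project's harness, the task schema (`code` and `tests` keys) is hypothetical, and it has no sandboxing or timeout, so it should never be run on untrusted model output as written.

```python
def passes(candidate_source: str, assertions: str) -> bool:
    """Run candidate code, then its assertions, in a fresh namespace.

    Any exception (syntax error, runtime error, failed assertion)
    counts as a failure. No sandbox or timeout is applied here.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        exec(assertions, namespace)
    except Exception:
        return False
    return True


def pass_rate(tasks):
    """Return (passed, total, rate) over a list of task dictionaries."""
    passed = sum(passes(task["code"], task["tests"]) for task in tasks)
    return passed, len(tasks), passed / len(tasks)
```

With 25 tasks per benchmark, each task moves the rate by 0.04, which is why a difference of two or four tasks shifts the slice scores so visibly.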

Later Attempts Not Promoted

| Candidate | Result | Decision |
| --- | --- | --- |
| Born-9B v2-recovery | 0.9119, 24 / 25 | Preserved, not promoted because weighted score is below v2. |
| Born-9B preview recovery | 0.8703 partial, 19 / 22 | Trained cleanly but could not beat v2; not promoted. |
| v2.2 / v2.3 hotfixes | Targeted exact-code smoke test kept failing | Rejected. |
| v4 DPO recovery | 0.8291, 18 / 25 | Rejected. |

Known Limitations

  • This is a LoRA adapter, so users need the Qwen/Qwen3.5-9B base model at inference time.
  • Evaluation is local and project-specific unless otherwise stated.
  • The SWE-bench result is a proxy sample, not the official Docker-based SWE-bench score.
  • The fresh HumanEval/MBPP executable slice trails base Qwen overall, so Born-9B Preview should not be marketed as a general exact-code benchmark win.
  • Tool-use score remains below base Qwen on the local suite, even though the total weighted score improves.
  • One known exact-code weakness in v2: when a task expects chunks as lists of characters, the model tends to return string slices instead.
  • Some public benchmark-style rows contributed to training-family coverage, so do not interpret training-adjacent probes as clean leaderboard evidence.
  • The model is optimized for coding-agent closure, not broad open-domain chat.
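The string-chunking weakness listed above is easiest to see side by side. The failing eval task itself is not reproduced in this card, so the two functions below are a hypothetical illustration of the difference between the two output shapes, not the actual prompt or model output.

```python
def chunk_as_slices(text: str, size: int) -> list:
    """Chunks returned as substrings: the shape v2 tends to produce."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_as_char_lists(text: str, size: int) -> list:
    """Chunks returned as lists of single characters."""
    return [list(text[i:i + size]) for i in range(0, len(text), size)]


print(chunk_as_slices("abcde", 2))      # ['ab', 'cd', 'e']
print(chunk_as_char_lists("abcde", 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```

The two results carry the same characters, but an exact-match assertion written for one shape fails on the other.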

Loading

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "Qwen/Qwen3.5-9B"  # base model the adapter was trained on
adapter_id = "rk500/Born-9B-Qwen3.5-9B-Preview"  # LoRA adapter weights

# Load the base tokenizer and make sure a pad token is defined.
tok = AutoTokenizer.from_pretrained(base_id, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# 4-bit NF4 quantization with bfloat16 compute.
quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quant,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()  # inference mode: disables dropout

Example generation:

messages = [
    {
        "role": "system",
        "content": (
            "You are Born-9B, a coding agent. Answer with Plan, Patch or Code, "
            "Checks, and Result. Be concise and complete."
        ),
    },
    {"role": "user", "content": "Fix this Python function and include tests: ..."},
]

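# Render the chat messages into the model's prompt format.
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) keeps the output deterministic.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=900,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))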

Suggested Inference Prompt

You are Born-9B, a coding agent.
Answer with:
Plan:
- short concrete plan

Patch or Code:
the code, patch, or exact actions

Checks:
- exact tests/checks

Result:
final user-facing result

Do not expose hidden chain-of-thought. Be concise and complete.

Provenance Files In Project Repo

Key local files used to build and verify this release:

  • configs/distill-v2-generated-expanded.yaml
  • configs/lora-v2-eval-expanded.yaml
  • reports/born9b_v2_generated_expanded_mix_report.json
  • reports/born9b_v2_generated_expanded_validation.json
  • reports/born9b_v2_generated_local_born_self_25_report_corrected_2026_05_16.json
  • reports/qwen35_base_local_born_self_25_report_corrected_2026_05_16.json
  • reports/born9b_v2_swebench_verified_proxy_25_report.json
  • reports/qwen35_base_swebench_verified_proxy_25_report.json
  • docs/born-9b-v2-generation-log-2026-05-15.md
  • docs/born-9b-v2-runpod-training-status-2026-05-15.md
  • docs/hf-reasoning-agent-datasets-2026-05-14.md
  • docs/born-9b-preview-crof-recovery-2026-05-17.md

License

This adapter is released under Apache-2.0. The Qwen/Qwen3.5-9B model page lists Apache-2.0 at release time; users must comply with the base model license and with the licenses of any datasets they separately use for further training.

Citation

If you reference this preview artifact, cite it as:

@misc{born9b_qwen35_preview_2026,
  title        = {Born-9B Qwen3.5-9B Preview},
  author       = {LeemerLabs / Repath Khan},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/rk500/Born-9B-Qwen3.5-9B-Preview}},
  note         = {PEFT LoRA adapter for Qwen/Qwen3.5-9B}
}