Born-9B Qwen3.5-9B Preview

Born-9B Preview is a PEFT LoRA adapter for Qwen/Qwen3.5-9B. It is trained for coding-agent behavior: concise planning, concrete code or patches, explicit checks, and a final user-facing result.

This release is the promoted Born-9B v2 adapter. Later recovery, hotfix, and preview-recovery experiments are documented in the project repo, but they are not promoted because they did not beat this adapter on the same local held-out gate.

Release Artifact

Hugging Face model ID: rk500/Born-9B-Qwen3.5-9B-Preview
Adapter type: PEFT LoRA adapter, not a merged full model.
Base model: Qwen/Qwen3.5-9B
Local project artifact: checkpoints/born-9b-v2-generated-expanded-lora
Training route: continuation from checkpoints/born-9b-v1-teichai-quick-lora, not a fresh adapter from base.
Intended use: coding-agent SFT behavior, repository repair planning, data-science debugging, tool-use closure, and explicit verification steps.
Not intended as: an official SWE-bench leaderboard claim, a general chatbot replacement, or a raw hidden chain-of-thought model.

Response Contract

Born-9B Preview was trained toward this visible response shape:

Plan:
- short concrete plan

Patch or Code:
<code, patch, commands, or exact action payload>

Checks:
- exact tests/checks to run

Result:
brief final user-facing result

The training target excludes raw hidden reasoning tags. Rows containing <think>, <thinking>, <reasoning>, or similar hidden-trace markers were filtered or sanitized before inclusion. The desired behavior is compact visible rationale: state invariants and decision rules, then produce the implementation and checks.

What Was Trained

The promoted v2 package is born9b_v2_generated_expanded.

Final mix report:

Total validated SFT rows: 7,097
Train / validation split: 6,672 / 425
Estimated tokens: 9.11M
Validation rejects: 0
Hidden tag rows in final validation: 0
Targeted curriculum rows: 85
Eval-derived DPO pairs available for analysis: 14
Duplicate prompts removed during v2 merge: 1,438
Max sequence length used for training: 4096

The v2 run intentionally mixed benchmark-style coding tasks, repair tasks, agent/tool traces, high-quality reasoning final answers, and a limited Irish-language supplement. The goal was not to maximize generic text volume; it was to push the model toward closure on coding-agent tasks.

Dataset Sources Used

The table below lists the sources recorded in the final v2 mix report. Row counts are the rows that survived into the v2 generated-expanded SFT mix after import, normalization, and deduplication.

Source key	Rows	Purpose
`bigcodebench`	845	BigCodeBench-style code generation and library/tool-use coverage.
`ds1000`	724	Data-science and notebook-style debugging tasks.
`mbppplus`	355	MBPP+ style exact-function programming tasks.
`swebench_verified`	312	SWE-style issue resolution prompts and repair planning.
`born9b_coding_workshop_openrouter`	326	Synthetic coding-workshop rows from OpenRouter teacher lanes.
`born9b_coding_workshop_openrouter_qwen_fallback`	234	OpenRouter fallback synthetic rows for coding closure.
`humanevalplus`	154	HumanEval+ style exact-code tasks.
`claw_eval_general`	140	Agentic task-planning examples inspired by Claw-style evaluation.
`claw_eval_multiturn`	24	Multi-turn agentic planning rows.
`repoexec`	173	Repository-context execution and repair rows.
`born9b_bigcodebench_self_seed`	30	Locally generated BigCodeBench-style self-seed rows.
`self_seed_expanded`	170	Local deterministic coding and repair templates.
`self_seed`	24	Early local proof rows.
`mbpp`	1	Legacy MBPP-style seed row.
`swebench_lite`	2	Early SWE-style seed rows.
`claw_eval`	2	Early Claw-style seed rows.
`born9b_coding_workshop_crof`	179	CrofAI-generated coding workshop rows.
`born9b_v2_crof_kimi_eval_expansion`	41	Kimi K2.6 eval-expansion rows.
`born9b_v2_crof_greg_eval_expansion`	12	Crof `greg` eval-expansion rows.
`born9b_v2_openrouter_eval_expansion`	6	Small OpenRouter eval-expansion rows.
`born9b_visible_thinking_style_openrouter`	405	Visible decision-rule and closure style rows.
`born9b_irish_synthetic_openrouter`	1,339	Irish-language synthetic rows for secondary language coverage.
`teichai_claude45_opus_high_reasoning`	249	Sanitized TeichAI Claude 4.5 Opus high-reasoning final answers.
`claude_opus_reasoning`	204	Sanitized Claude Opus 4.6/4.7 reasoning final answers.
`opus46_reasoning_filtered`	198	Filtered Opus 4.6 reasoning final answers.
`tachibana4_deepseek_v4_pro`	258	DeepSeek V4 Pro agentic/coding seed data.
`hermes_agent_kimi`	159	Hermes/Kimi agent traces converted to visible action-result supervision.
`hermes_agent_filtered`	57	Filtered Hermes agent traces.
`hermes_agent_glm51`	67	Hermes/GLM-5.1 agent traces.
`codex_thinking`	201	CodeX-2M-Thinking coding-reasoning rows after sanitization.
`deepseek_v4_distill`	80	DeepSeek V4 distillation rows after hidden-tag removal.
`agenttrove_code`	1	Open-thoughts AgentTrove code/tool sample.
`qwen_webworld`	30	Qwen WebWorldData small web-agent/world-model sample.
`codex_swebenchpro`	10	Codex SWE-bench Pro trace sample after review and formatting.
`born9b_v2_eval_curriculum`	85	Targeted v2 curriculum rows derived from known local eval failure modes.

External Dataset Names

The imported Hugging Face sources documented for the v1/v2 build include:

angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k
nohurry/Opus-4.6-Reasoning-3000x-filtered
sequelbox/Tachibana4-DeepSeek-V4-Pro
lambda/hermes-agent-reasoning-traces
Modotte/CodeX-2M-Thinking
Jackrong/DeepSeek-V4-Distill-8000x
DJLougen/hermes-agent-traces-filtered
open-thoughts/AgentTrove
Qwen/WebWorldData
Inferact/codex_swebenchpro_traces
TeichAI/claude-4.5-opus-high-reasoning-250x

These datasets were not imported as raw hidden chain-of-thought. The importer kept final answers, observable actions, code, patches, checks, and results. Rows were normalized into the Born response contract where possible.

Teacher And Synthetic Generation Sources

Teacher-generated and synthetic rows were produced across several provider lanes. The recorded teacher pool includes:

CrofAI: kimi-k2.6, mimo-v2.5-pro, deepseek-v4-pro, and attempted greg lanes.
OpenRouter: inclusionai/ring-2.6-1t, deepseek/deepseek-v4-flash, deepseek/deepseek-v4-pro, openrouter/owl-alpha, qwen/qwen3.6-plus, minimax variants, Arcee Trinity, Gemma, and Qwen thinking variants where available during the run.
Local/self generation: deterministic BigCodeBench-style rows, exact-code closure rows, data-science repair rows, agentic closure rows, and tool-use closure rows.

Provider rows were filtered for the expected section markers, minimum response quality, duplicate prompts, and hidden-tag leakage. Some generated lanes were rejected or stopped when schema compliance was poor.

Task Mix

Final v2 row counts by task kind:

Task kind	Rows
`codegen_tooluse`	890
`codegen_data_science`	725
`codegen_exact`	525
`code_repair`	380
`agent_trace`	284
`agentic_code_reasoning_sft`	258
`irish_language_instruction`	278
`irish_language_dialogue`	223
`agent_plan`	207
`claude_reasoning_sft`	204
`code_reasoning_sft`	201
`opus46_filtered_reasoning_sft`	198
`repoexec_context`	173
`irish_language_grammar`	206
`irish_language_translation`	190
`irish_language_culture`	173
`irish_language_public_service`	172
`data_science_debug`	137
`test_design`	138
`repo_agent_workflow`	120
`library_tooluse`	114
`exact_algorithm`	108
`refactor_minimal_patch`	104
`reasoning_sft`	80
`exact_algorithm_with_proof_sketch`	72
`long_context_refactor`	67
`api_integration_debug`	57
`api_integration_incident`	57
`agentic_tool_plan`	50
`repo_repair_invariants`	49
`bigcodebench_self_seed`	30
`web_agent_world_model`	30
`agent_plan_multiturn`	25
`v2_exact_python_closure`	25
`failing_test_to_patch`	24
`test_selection_and_minimal_fix`	24
`v2_data_science_repair`	20
`v2_agentic_closure`	20
`v2_tooluse_closure`	20
`codex_swe_agent_trace`	10

Minor legacy task kinds with one or two rows are retained in the project reports but are not a meaningful part of the final behavior.

Training Configuration

Born-9B Preview v2 was trained with QLoRA SFT:

Model: Qwen/Qwen3.5-9B
Starting adapter: checkpoints/born-9b-v1-teichai-quick-lora
Output adapter: checkpoints/born-9b-v2-generated-expanded-lora
Quantization: 4-bit NF4
Torch dtype: bfloat16
Max sequence length: 4096
LoRA rank: 32
LoRA alpha: 64
LoRA dropout: 0.05
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Epochs: 0.45
Optimizer steps: 188
Learning rate: 1.2e-5
Scheduler: cosine
Warmup ratio: 0.04
Effective batch size: 16
Weight decay: 0.01
Max grad norm: 0.3
Eval every 75 optimizer steps
Training GPU: NVIDIA A40 on RunPod

Final training telemetry:

Final train loss: 0.5978
Final eval loss: 0.7479
Final mean token accuracy: 0.8087
Final epoch: 0.4508
Completed steps: 188 / 188

Evaluation Snapshot

Fixed Local Born Self Eval

This is a project-local 25-task gate covering exact Python, repair, data science, agentic planning, and tool-use. It is not a public leaderboard.

Model	Weighted score	Passed	Exact Python	Repair	Data Science	Agentic	Tool Use
`Qwen/Qwen3.5-9B` base	0.8511	22 / 25	0.4000	0.9526	0.9796	0.9758	0.9475
Born-9B v1 TeichAI	0.8966	22 / 25	0.8000	0.9811	0.8980	0.9064	0.8976
Born-9B Preview v2	0.9244	23 / 25	0.8000	1.0000	0.9745	0.9700	0.8776

The v2 release is promoted because it beats the base model and all later local recovery attempts on this same weighted gate.

SWE-bench Verified Proxy Sample

This is a 25-task issue-resolution proxy sample derived from SWE-bench Verified. It is not the official SWE-bench Docker harness.

Model	Score	Passed
`Qwen/Qwen3.5-9B` base	0.7561	19 / 25
Born-9B Preview v2 initial run	0.8559	22 / 25
Born-9B Preview v2 fresh A40 rerun	0.9117	23 / 25

HumanEval And MBPP Executable Slice

This is a fresh post-release A40 run over the first 25 local HumanEval rows and first 25 local MBPP rows. It executes Python assertions. It is still a small slice, not the full HumanEval/MBPP benchmark.

Model	HumanEval	MBPP	Combined
`Qwen/Qwen3.5-9B` base	0.84, 21 / 25	0.68, 17 / 25	0.76, 38 / 50
Born-9B Preview v2	0.68, 17 / 25	0.76, 19 / 25	0.72, 36 / 50

This is an honest regression on the combined exact-code slice: Born improves the MBPP sample but loses more on HumanEval. Treat exact-code improvement as future work, not a preview claim.

Later Attempts Not Promoted

Candidate	Result	Decision
Born-9B v2-recovery	0.9119, 24 / 25	Preserved, not promoted because weighted score is below v2.
Born-9B preview recovery	0.8703 partial, 19 / 22	Trained cleanly but could not beat v2; not promoted.
v2.2 / v2.3 hotfixes	Targeted exact-code smoke stayed failed	Rejected.
v4 DPO recovery	0.8291, 18 / 25	Rejected.

Known Limitations

This is a LoRA adapter, so users need the Qwen/Qwen3.5-9B base model at inference time.
Evaluation is local and project-specific unless otherwise stated.
The SWE-bench result is a proxy sample, not the official Docker-based SWE-bench score.
The fresh HumanEval/MBPP executable slice trails base Qwen overall, so Born-9B Preview should not be marketed as a general exact-code benchmark win.
Tool-use score remains below base Qwen on the local suite, even though the total weighted score improves.
One known exact-code weakness in v2 is chunking strings as string slices instead of lists of characters.
Some public benchmark-style rows contributed to training-family coverage, so do not interpret training-adjacent probes as clean leaderboard evidence.
The model is optimized for coding-agent closure, not broad open-domain chat.

Loading

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "Qwen/Qwen3.5-9B"
adapter_id = "rk500/Born-9B-Qwen3.5-9B-Preview"

tok = AutoTokenizer.from_pretrained(base_id, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quant,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

Example generation:

messages = [
    {
        "role": "system",
        "content": (
            "You are Born-9B, a coding agent. Answer with Plan, Patch or Code, "
            "Checks, and Result. Be concise and complete."
        ),
    },
    {"role": "user", "content": "Fix this Python function and include tests: ..."},
]

text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=900,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )

print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Suggested Inference Prompt

You are Born-9B, a coding agent.
Answer with:
Plan:
- short concrete plan

Patch or Code:
the code, patch, or exact actions

Checks:
- exact tests/checks

Result:
final user-facing result

Do not expose hidden chain-of-thought. Be concise and complete.

Provenance Files In Project Repo

Key local files used to build and verify this release:

configs/distill-v2-generated-expanded.yaml
configs/lora-v2-eval-expanded.yaml
reports/born9b_v2_generated_expanded_mix_report.json
reports/born9b_v2_generated_expanded_validation.json
reports/born9b_v2_generated_local_born_self_25_report_corrected_2026_05_16.json
reports/qwen35_base_local_born_self_25_report_corrected_2026_05_16.json
reports/born9b_v2_swebench_verified_proxy_25_report.json
reports/qwen35_base_swebench_verified_proxy_25_report.json
docs/born-9b-v2-generation-log-2026-05-15.md
docs/born-9b-v2-runpod-training-status-2026-05-15.md
docs/hf-reasoning-agent-datasets-2026-05-14.md
docs/born-9b-preview-crof-recovery-2026-05-17.md

License

This adapter is released under Apache-2.0. The Qwen/Qwen3.5-9B model page lists Apache-2.0 at release time; users must comply with the base model license and with the licenses of any datasets they separately use for further training.

Citation

If you reference this preview artifact, cite it as:

@misc{born9b_qwen35_preview_2026,
  title        = {Born-9B Qwen3.5-9B Preview},
  author       = {LeemerLabs / Repath Khan},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/rk500/Born-9B-Qwen3.5-9B-Preview}},
  note         = {PEFT LoRA adapter for Qwen/Qwen3.5-9B}
}

Downloads last month: 52

Model tree for rk500/Born-9B-Qwen3.5-9B-Preview

Base model

Qwen/Qwen3.5-9B-Base

Finetuned

Qwen/Qwen3.5-9B

Adapter

(219)

this model