Born-9B Qwen3.5-9B Preview
Born-9B Preview is a PEFT LoRA adapter for Qwen/Qwen3.5-9B. It is trained for coding-agent behavior: concise planning, concrete code or patches, explicit checks, and a final user-facing result.
This release is the promoted Born-9B v2 adapter. Later recovery, hotfix, and preview-recovery experiments are documented in the project repo, but they are not promoted because they did not beat this adapter on the same local held-out gate.
Release Artifact
- Hugging Face model ID: `rk500/Born-9B-Qwen3.5-9B-Preview`
- Adapter type: PEFT LoRA adapter, not a merged full model.
- Base model: `Qwen/Qwen3.5-9B`
- Local project artifact: `checkpoints/born-9b-v2-generated-expanded-lora`
- Training route: continuation from `checkpoints/born-9b-v1-teichai-quick-lora`, not a fresh adapter from base.
- Intended use: coding-agent SFT behavior, repository repair planning, data-science debugging, tool-use closure, and explicit verification steps.
- Not intended as: an official SWE-bench leaderboard claim, a general chatbot replacement, or a raw hidden chain-of-thought model.
Response Contract
Born-9B Preview was trained toward this visible response shape:
Plan:
- short concrete plan
Patch or Code:
<code, patch, commands, or exact action payload>
Checks:
- exact tests/checks to run
Result:
brief final user-facing result
The training target excludes raw hidden reasoning tags. Rows containing <think>, <thinking>, <reasoning>, or similar hidden-trace markers were filtered or sanitized before inclusion. The desired behavior is compact visible rationale: state invariants and decision rules, then produce the implementation and checks.
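The hidden-tag filtering described above can be sketched as follows. This is a minimal illustration, not the project's importer: the tag list covers only the three markers named in the text, and the function names `has_hidden_tags` and `sanitize` are hypothetical.

```python
import re
from typing import Optional

# Assumption: only the three markers named in the card; the real filter
# also covers "similar" markers that are not listed here.
HIDDEN_TAGS = ("think", "thinking", "reasoning")

_TAG_ALT = "|".join(HIDDEN_TAGS)
# A complete <tag>...</tag> block, case-insensitive, spanning newlines.
_BLOCK_RE = re.compile(
    rf"<({_TAG_ALT})\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
# Any single opening or closing hidden tag, e.g. an unclosed <think>.
_STRAY_RE = re.compile(rf"</?(?:{_TAG_ALT})\b[^>]*>", re.IGNORECASE)


def has_hidden_tags(text: str) -> bool:
    """Return True if any hidden-trace marker appears in the text."""
    return _STRAY_RE.search(text) is not None


def sanitize(text: str) -> Optional[str]:
    """Remove complete hidden-trace blocks; return None if the row should be dropped."""
    cleaned = _BLOCK_RE.sub("", text).strip()
    if not cleaned or has_hidden_tags(cleaned):
        # Nothing visible is left, or an unbalanced tag survived: filter the row.
        return None
    return cleaned
```

A row whose visible answer survives is kept in sanitized form; a row with an unclosed tag is dropped rather than guessed at.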
What Was Trained
The promoted v2 package is born9b_v2_generated_expanded.
Final mix report:
- Total validated SFT rows: `7,097`
- Train / validation split: `6,672 / 425`
- Estimated tokens: `9.11M`
- Validation rejects: `0`
- Hidden tag rows in final validation: `0`
- Targeted curriculum rows: `85`
- Eval-derived DPO pairs available for analysis: `14`
- Duplicate prompts removed during v2 merge: `1,438`
- Max sequence length used for training: `4096`
The v2 run intentionally mixed benchmark-style coding tasks, repair tasks, agent/tool traces, high-quality reasoning final answers, and a limited Irish-language supplement. The goal was not to maximize generic text volume; it was to push the model toward closure on coding-agent tasks.
Dataset Sources Used
The table below lists the sources recorded in the final v2 mix report. Row counts are the rows that survived into the v2 generated-expanded SFT mix after import, normalization, and deduplication.
| Source key | Rows | Purpose |
|---|---|---|
| `bigcodebench` | 845 | BigCodeBench-style code generation and library/tool-use coverage. |
| `ds1000` | 724 | Data-science and notebook-style debugging tasks. |
| `mbppplus` | 355 | MBPP+ style exact-function programming tasks. |
| `swebench_verified` | 312 | SWE-style issue resolution prompts and repair planning. |
| `born9b_coding_workshop_openrouter` | 326 | Synthetic coding-workshop rows from OpenRouter teacher lanes. |
| `born9b_coding_workshop_openrouter_qwen_fallback` | 234 | OpenRouter fallback synthetic rows for coding closure. |
| `humanevalplus` | 154 | HumanEval+ style exact-code tasks. |
| `claw_eval_general` | 140 | Agentic task-planning examples inspired by Claw-style evaluation. |
| `claw_eval_multiturn` | 24 | Multi-turn agentic planning rows. |
| `repoexec` | 173 | Repository-context execution and repair rows. |
| `born9b_bigcodebench_self_seed` | 30 | Locally generated BigCodeBench-style self-seed rows. |
| `self_seed_expanded` | 170 | Local deterministic coding and repair templates. |
| `self_seed` | 24 | Early local proof rows. |
| `mbpp` | 1 | Legacy MBPP-style seed row. |
| `swebench_lite` | 2 | Early SWE-style seed rows. |
| `claw_eval` | 2 | Early Claw-style seed rows. |
| `born9b_coding_workshop_crof` | 179 | CrofAI-generated coding workshop rows. |
| `born9b_v2_crof_kimi_eval_expansion` | 41 | Kimi K2.6 eval-expansion rows. |
| `born9b_v2_crof_greg_eval_expansion` | 12 | Crof greg eval-expansion rows. |
| `born9b_v2_openrouter_eval_expansion` | 6 | Small OpenRouter eval-expansion rows. |
| `born9b_visible_thinking_style_openrouter` | 405 | Visible decision-rule and closure style rows. |
| `born9b_irish_synthetic_openrouter` | 1,339 | Irish-language synthetic rows for secondary language coverage. |
| `teichai_claude45_opus_high_reasoning` | 249 | Sanitized TeichAI Claude 4.5 Opus high-reasoning final answers. |
| `claude_opus_reasoning` | 204 | Sanitized Claude Opus 4.6/4.7 reasoning final answers. |
| `opus46_reasoning_filtered` | 198 | Filtered Opus 4.6 reasoning final answers. |
| `tachibana4_deepseek_v4_pro` | 258 | DeepSeek V4 Pro agentic/coding seed data. |
| `hermes_agent_kimi` | 159 | Hermes/Kimi agent traces converted to visible action-result supervision. |
| `hermes_agent_filtered` | 57 | Filtered Hermes agent traces. |
| `hermes_agent_glm51` | 67 | Hermes/GLM-5.1 agent traces. |
| `codex_thinking` | 201 | CodeX-2M-Thinking coding-reasoning rows after sanitization. |
| `deepseek_v4_distill` | 80 | DeepSeek V4 distillation rows after hidden-tag removal. |
| `agenttrove_code` | 1 | Open-thoughts AgentTrove code/tool sample. |
| `qwen_webworld` | 30 | Qwen WebWorldData small web-agent/world-model sample. |
| `codex_swebenchpro` | 10 | Codex SWE-bench Pro trace sample after review and formatting. |
| `born9b_v2_eval_curriculum` | 85 | Targeted v2 curriculum rows derived from known local eval failure modes. |
External Dataset Names
The imported Hugging Face sources documented for the v1/v2 build include:
- `angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k`
- `nohurry/Opus-4.6-Reasoning-3000x-filtered`
- `sequelbox/Tachibana4-DeepSeek-V4-Pro`
- `lambda/hermes-agent-reasoning-traces`
- `Modotte/CodeX-2M-Thinking`
- `Jackrong/DeepSeek-V4-Distill-8000x`
- `DJLougen/hermes-agent-traces-filtered`
- `open-thoughts/AgentTrove`
- `Qwen/WebWorldData`
- `Inferact/codex_swebenchpro_traces`
- `TeichAI/claude-4.5-opus-high-reasoning-250x`
These datasets were not imported as raw hidden chain-of-thought. The importer kept final answers, observable actions, code, patches, checks, and results. Rows were normalized into the Born response contract where possible.
Teacher And Synthetic Generation Sources
Teacher-generated and synthetic rows were produced across several provider lanes. The recorded teacher pool includes:
- CrofAI: `kimi-k2.6`, `mimo-v2.5-pro`, `deepseek-v4-pro`, and attempted `greg` lanes.
- OpenRouter: `inclusionai/ring-2.6-1t`, `deepseek/deepseek-v4-flash`, `deepseek/deepseek-v4-pro`, `openrouter/owl-alpha`, `qwen/qwen3.6-plus`, `minimax` variants, Arcee Trinity, Gemma, and Qwen thinking variants where available during the run.
- Local/self generation: deterministic BigCodeBench-style rows, exact-code closure rows, data-science repair rows, agentic closure rows, and tool-use closure rows.
Provider rows were filtered for the expected section markers, minimum response quality, duplicate prompts, and hidden-tag leakage. Some generated lanes were rejected or stopped when schema compliance was poor.
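As an illustration of the duplicate-prompt and minimum-quality filtering mentioned above, here is a small sketch. The row schema (`prompt` and `response` keys), the whitespace and case normalization, and the 40-character threshold are all assumptions made for the example; the project's actual rules are not documented in this card.

```python
import re
from typing import Iterable


def normalize_prompt(prompt: str) -> str:
    """Collapse whitespace and lowercase so trivially different prompts collide."""
    return re.sub(r"\s+", " ", prompt).strip().lower()


def dedupe_rows(rows: Iterable[dict], min_response_chars: int = 40):
    """Keep the first row per normalized prompt and drop very short responses.

    Returns (kept_rows, duplicate_count). The length threshold is a
    hypothetical stand-in for the "minimum response quality" check.
    """
    seen = set()
    kept = []
    duplicates = 0
    for row in rows:
        if len(row["response"].strip()) < min_response_chars:
            continue  # too short to be a useful supervised target
        key = normalize_prompt(row["prompt"])
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        kept.append(row)
    return kept, duplicates
```

Keeping the first occurrence means the order in which provider lanes are merged decides which duplicate survives, so higher-trust sources would need to be merged first under this scheme.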
Task Mix
Final v2 row counts by task kind:
| Task kind | Rows |
|---|---|
| `codegen_tooluse` | 890 |
| `codegen_data_science` | 725 |
| `codegen_exact` | 525 |
| `code_repair` | 380 |
| `agent_trace` | 284 |
| `agentic_code_reasoning_sft` | 258 |
| `irish_language_instruction` | 278 |
| `irish_language_dialogue` | 223 |
| `agent_plan` | 207 |
| `claude_reasoning_sft` | 204 |
| `code_reasoning_sft` | 201 |
| `opus46_filtered_reasoning_sft` | 198 |
| `repoexec_context` | 173 |
| `irish_language_grammar` | 206 |
| `irish_language_translation` | 190 |
| `irish_language_culture` | 173 |
| `irish_language_public_service` | 172 |
| `data_science_debug` | 137 |
| `test_design` | 138 |
| `repo_agent_workflow` | 120 |
| `library_tooluse` | 114 |
| `exact_algorithm` | 108 |
| `refactor_minimal_patch` | 104 |
| `reasoning_sft` | 80 |
| `exact_algorithm_with_proof_sketch` | 72 |
| `long_context_refactor` | 67 |
| `api_integration_debug` | 57 |
| `api_integration_incident` | 57 |
| `agentic_tool_plan` | 50 |
| `repo_repair_invariants` | 49 |
| `bigcodebench_self_seed` | 30 |
| `web_agent_world_model` | 30 |
| `agent_plan_multiturn` | 25 |
| `v2_exact_python_closure` | 25 |
| `failing_test_to_patch` | 24 |
| `test_selection_and_minimal_fix` | 24 |
| `v2_data_science_repair` | 20 |
| `v2_agentic_closure` | 20 |
| `v2_tooluse_closure` | 20 |
| `codex_swe_agent_trace` | 10 |
Minor legacy task kinds with one or two rows are retained in the project reports but are not a meaningful part of the final behavior.
Training Configuration
Born-9B Preview v2 was trained with QLoRA SFT:
- Model: `Qwen/Qwen3.5-9B`
- Starting adapter: `checkpoints/born-9b-v1-teichai-quick-lora`
- Output adapter: `checkpoints/born-9b-v2-generated-expanded-lora`
- Quantization: 4-bit NF4
- Torch dtype: bfloat16
- Max sequence length: 4096
- LoRA rank: 32
- LoRA alpha: 64
- LoRA dropout: 0.05
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Epochs: 0.45
- Optimizer steps: 188
- Learning rate: 1.2e-5
- Scheduler: cosine
- Warmup ratio: 0.04
- Effective batch size: 16
- Weight decay: 0.01
- Max grad norm: 0.3
- Eval every 75 optimizer steps
- Training GPU: NVIDIA A40 on RunPod
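For readers who want to reproduce a similar setup, the listed hyperparameters could be expressed roughly as below. This is a sketch, not the project's config file: the dict names and the `build_lora_config` helper are hypothetical, the optimizer keys follow familiar Hugging Face trainer naming as an assumption, and the split of the effective batch size of 16 into per-device batch and gradient accumulation is not stated in the card, so it is left out.

```python
# Values copied from the configuration list; the grouping into dicts is
# illustrative only and does not mirror the project's YAML layout.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Assumption: key names follow common Hugging Face trainer arguments.
OPTIM_KWARGS = {
    "learning_rate": 1.2e-5,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.04,
    "weight_decay": 0.01,
    "max_grad_norm": 0.3,
    "max_steps": 188,
    "eval_steps": 75,
}


def build_lora_config():
    """Build a peft.LoraConfig from the dict above.

    peft is imported lazily so the plain dicts can be inspected
    on a machine without the training stack installed.
    """
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


# With standard LoRA scaling, adapter updates are multiplied by lora_alpha / r.
lora_scale = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
```

With rank 32 and alpha 64 the standard scaling factor is 2.0, which together with the small learning rate is consistent with a cautious continuation from an existing adapter.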
Final training telemetry:
- Final train loss: 0.5978
- Final eval loss: 0.7479
- Final mean token accuracy: 0.8087
- Final epoch: 0.4508
- Completed steps: 188 / 188
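The step count and final epoch fraction are consistent with the reported split and batch size. The check below assumes one training row per sequence (no packing) and that the step count is rounded up; both are assumptions, not statements from the training logs.

```python
import math

train_rows = 6_672       # train split from the final mix report
effective_batch = 16     # rows consumed per optimizer step
target_epochs = 0.45     # configured epoch fraction

# Optimizer steps needed to cover the target fraction of the train split.
steps = math.ceil(target_epochs * train_rows / effective_batch)

# Rows actually consumed, and the epoch fraction that corresponds to.
rows_seen = steps * effective_batch
epoch_reached = rows_seen / train_rows

print(steps, rows_seen, round(epoch_reached, 4))  # 188 3008 0.4508
```

In other words, the run saw roughly 3,008 of the 6,672 training rows once, so more than half of the train split was never shown to the model in this continuation.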
Evaluation Snapshot
Fixed Local Born Self Eval
This is a project-local 25-task gate covering exact Python, repair, data science, agentic planning, and tool-use. It is not a public leaderboard.
| Model | Weighted score | Passed | Exact Python | Repair | Data Science | Agentic | Tool Use |
|---|---|---|---|---|---|---|---|
| `Qwen/Qwen3.5-9B` base | 0.8511 | 22 / 25 | 0.4000 | 0.9526 | 0.9796 | 0.9758 | 0.9475 |
| Born-9B v1 TeichAI | 0.8966 | 22 / 25 | 0.8000 | 0.9811 | 0.8980 | 0.9064 | 0.8976 |
| Born-9B Preview v2 | 0.9244 | 23 / 25 | 0.8000 | 1.0000 | 0.9745 | 0.9700 | 0.8776 |
The v2 release is promoted because it beats the base model and all later local recovery attempts on this same weighted gate.
SWE-bench Verified Proxy Sample
This is a 25-task issue-resolution proxy sample derived from SWE-bench Verified. It is not the official SWE-bench Docker harness.
| Model | Score | Passed |
|---|---|---|
| `Qwen/Qwen3.5-9B` base | 0.7561 | 19 / 25 |
| Born-9B Preview v2 initial run | 0.8559 | 22 / 25 |
| Born-9B Preview v2 fresh A40 rerun | 0.9117 | 23 / 25 |
HumanEval And MBPP Executable Slice
This is a fresh post-release A40 run over the first 25 local HumanEval rows and first 25 local MBPP rows. It executes Python assertions. It is still a small slice, not the full HumanEval/MBPP benchmark.
| Model | HumanEval | MBPP | Combined |
|---|---|---|---|
| `Qwen/Qwen3.5-9B` base | 0.84, 21 / 25 | 0.68, 17 / 25 | 0.76, 38 / 50 |
| Born-9B Preview v2 | 0.68, 17 / 25 | 0.76, 19 / 25 | 0.72, 36 / 50 |
This is a regression on the combined exact-code slice (0.72 versus 0.76 for base): Born gains two tasks on the MBPP sample but loses four on HumanEval. Treat exact-code improvement as future work, not a preview claim.
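The slice numbers can be recomputed from the pass counts in the table. The snippet below is only arithmetic over those reported counts; the dictionary layout and the `combined` helper are illustrative.

```python
# (passed, total) per benchmark, copied from the table above.
results = {
    "Qwen/Qwen3.5-9B base": {"HumanEval": (21, 25), "MBPP": (17, 25)},
    "Born-9B Preview v2": {"HumanEval": (17, 25), "MBPP": (19, 25)},
}


def combined(per_benchmark):
    """Sum passed and total counts across benchmarks."""
    passed = sum(p for p, _ in per_benchmark.values())
    total = sum(t for _, t in per_benchmark.values())
    return passed, total


for name, per_benchmark in results.items():
    passed, total = combined(per_benchmark)
    rates = {k: round(p / t, 2) for k, (p, t) in per_benchmark.items()}
    print(name, rates, f"combined {passed} / {total} = {passed / total:.2f}")

base_passed, _ = combined(results["Qwen/Qwen3.5-9B base"])
born_passed, _ = combined(results["Born-9B Preview v2"])
net_tasks = born_passed - base_passed  # negative means Born trails base
```

With 25 tasks per benchmark, each task moves a pass rate by 0.04, so differences of this size are within what a handful of individual tasks can produce.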
Later Attempts Not Promoted
| Candidate | Result | Decision |
|---|---|---|
| Born-9B v2-recovery | 0.9119, 24 / 25 | Preserved, not promoted because weighted score is below v2. |
| Born-9B preview recovery | 0.8703 partial, 19 / 22 | Trained cleanly but could not beat v2; not promoted. |
| v2.2 / v2.3 hotfixes | Targeted exact-code smoke test still failed | Rejected. |
| v4 DPO recovery | 0.8291, 18 / 25 | Rejected. |
Known Limitations
- This is a LoRA adapter, so users need the Qwen/Qwen3.5-9B base model at inference time.
- Evaluation is local and project-specific unless otherwise stated.
- The SWE-bench result is a proxy sample, not the official Docker-based SWE-bench score.
- The fresh HumanEval/MBPP executable slice trails base Qwen overall, so Born-9B Preview should not be marketed as a general exact-code benchmark win.
- Tool-use score remains below base Qwen on the local suite, even though the total weighted score improves.
- One known exact-code weakness in v2 is chunking strings as string slices instead of lists of characters.
- Some public benchmark-style rows contributed to training-family coverage, so do not interpret training-adjacent probes as clean leaderboard evidence.
- The model is optimized for coding-agent closure, not broad open-domain chat.
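The string-chunking weakness listed above can be illustrated with a hypothetical task: split a string into chunks of `n` characters, where the spec asks for each chunk as a list of characters. The task and both function names are invented for this example; they are one reading of the limitation, not a row from the evaluation set.

```python
from typing import List


def chunk_as_slices(s: str, n: int) -> List[str]:
    """The failure pattern: each chunk comes back as a substring."""
    return [s[i:i + n] for i in range(0, len(s), n)]


def chunk_as_char_lists(s: str, n: int) -> List[List[str]]:
    """What such a spec expects: each chunk is a list of single characters."""
    return [list(s[i:i + n]) for i in range(0, len(s), n)]


print(chunk_as_slices("abcde", 2))      # ['ab', 'cd', 'e']
print(chunk_as_char_lists("abcde", 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```

The two outputs carry the same characters but fail an equality assertion against each other, which is why this kind of mismatch shows up as a hard failure in executable exact-code checks.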
Loading
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
base_id = "Qwen/Qwen3.5-9B"
adapter_id = "rk500/Born-9B-Qwen3.5-9B-Preview"
# The tokenizer comes from the base model; fall back to EOS if no pad token is set.
tok = AutoTokenizer.from_pretrained(base_id, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
# 4-bit NF4 quantization with bfloat16 compute, matching the training setup.
quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
# Load the quantized base model first, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quant,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
Example generation:
messages = [
    {
        "role": "system",
        "content": (
            "You are Born-9B, a coding agent. Answer with Plan, Patch or Code, "
            "Checks, and Result. Be concise and complete."
        ),
    },
    {"role": "user", "content": "Fix this Python function and include tests: ..."},
]
# Render the chat template to text, then tokenize and move tensors to the model device.
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
# Greedy decoding without gradient tracking.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=900,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
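Downstream tooling may want to split the decoded text into the four contract sections. The helper below is a minimal sketch under the assumption that each section header starts its own line; `split_sections` and `follows_contract` are hypothetical names, and a header-like line inside generated code (for example a line beginning with `Result:`) would confuse this simple approach.

```python
import re
from typing import Dict

SECTION_NAMES = ("Plan", "Patch or Code", "Checks", "Result")
# A section header is one of the four names followed by a colon at line start.
_HEADER_RE = re.compile(r"^(Plan|Patch or Code|Checks|Result):", re.MULTILINE)


def split_sections(response: str) -> Dict[str, str]:
    """Map each header found in the response to the text that follows it."""
    matches = list(_HEADER_RE.finditer(response))
    sections: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section body runs until the next header or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        sections[match.group(1)] = response[match.end():end].strip()
    return sections


def follows_contract(response: str) -> bool:
    """True when all four sections appear, in the expected order."""
    return tuple(split_sections(response)) == SECTION_NAMES
```

For example, `split_sections(decoded)["Checks"]` would give the list of tests the model proposes, which a harness could then run instead of trusting the `Result` text.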
Suggested Inference Prompt
You are Born-9B, a coding agent.
Answer with:
Plan:
- short concrete plan
Patch or Code:
the code, patch, or exact actions
Checks:
- exact tests/checks
Result:
final user-facing result
Do not expose hidden chain-of-thought. Be concise and complete.
Provenance Files In Project Repo
Key local files used to build and verify this release:
- `configs/distill-v2-generated-expanded.yaml`
- `configs/lora-v2-eval-expanded.yaml`
- `reports/born9b_v2_generated_expanded_mix_report.json`
- `reports/born9b_v2_generated_expanded_validation.json`
- `reports/born9b_v2_generated_local_born_self_25_report_corrected_2026_05_16.json`
- `reports/qwen35_base_local_born_self_25_report_corrected_2026_05_16.json`
- `reports/born9b_v2_swebench_verified_proxy_25_report.json`
- `reports/qwen35_base_swebench_verified_proxy_25_report.json`
- `docs/born-9b-v2-generation-log-2026-05-15.md`
- `docs/born-9b-v2-runpod-training-status-2026-05-15.md`
- `docs/hf-reasoning-agent-datasets-2026-05-14.md`
- `docs/born-9b-preview-crof-recovery-2026-05-17.md`
License
This adapter is released under Apache-2.0. The Qwen/Qwen3.5-9B model page lists Apache-2.0 at release time; users must comply with the base model license and with the licenses of any datasets they separately use for further training.
Citation
If you reference this preview artifact, cite it as:
@misc{born9b_qwen35_preview_2026,
  title = {Born-9B Qwen3.5-9B Preview},
  author = {LeemerLabs / Repath Khan},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/rk500/Born-9B-Qwen3.5-9B-Preview}},
  note = {PEFT LoRA adapter for Qwen/Qwen3.5-9B}
}