Hayabusa 9B
Hayabusa 9B is a full-weight fine-tune of Qwen/Qwen3.5-9B specialized for debugger-style software-agent behavior: reading grounded runtime evidence, selecting the next debugging action, and proposing targeted fixes inside the Hen long-horizon C++ agent harness.
This is a merged full checkpoint, not a LoRA, QLoRA, or adapter release. The repository follows Hugging Face model conventions with root-level config.json, tokenizer files, model.safetensors.index.json, and sharded safetensors weights.
Intended Use
Hayabusa is intended to be used as a debugger model inside a structured agent loop, especially one that provides:
- recent test output and failure signatures;
- source snippets for specific functions;
- runtime logs, traces, and LLDB evidence;
- constrained action schemas such as run_test, function_info, debug_function, and fix_function;
- fresh verification after every code fix.
The model is optimized for action-oriented debugging, not broad chat, general coding benchmarks, or free-form assistant behavior. Together, Hen and Hayabusa aim at an autonomous debugging loop: inspect evidence, choose the next action, apply a targeted fix, run the tests again, and continue from the new runtime state. It performs best when the harness keeps the context grounded in concrete evidence rather than asking it to infer everything from a vague bug report.
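The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Hen's implementation: `choose_action`, `apply_action`, and `run_tests` are hypothetical callables standing in for the model call, the harness's action executor, and the test runner, and the `"action"` field name is assumed. Only the action name `fix_function` comes from this card.

```python
from typing import Callable

def debug_loop(
    choose_action: Callable[[list], dict],
    apply_action: Callable[[dict], str],
    run_tests: Callable[[], tuple],
    max_steps: int = 20,
) -> bool:
    """Run an evidence -> action -> verify loop until tests pass or steps run out."""
    passed, report = run_tests()
    evidence = [report]  # start from the current test output
    for _ in range(max_steps):
        if passed:
            return True
        action = choose_action(evidence)       # model picks the next action
        evidence.append(apply_action(action))  # harness executes it and returns new evidence
        if action["action"] == "fix_function":  # hypothetical field name
            # Fresh verification after every code fix.
            passed, report = run_tests()
            evidence.append(report)
    return passed
```

Keeping verification inside the loop means the next model call always sees the post-fix test state rather than a stale one.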
Recommended Hen Usage
A practical role split is:
- hayabusa-9b as the Debugger model for next-step selection and focused fix proposals;
- a stronger long-context model as Director for trajectory summarization, very large contexts, and review/escalation;
- automatic run_test after fixes so the model sees whether the previous hypothesis improved, regressed, or left the failure unchanged.
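One way a harness might derive the improved/regressed/unchanged signal is to compare the sets of failing tests before and after a fix. The sketch below assumes failures are identified by test name; `classify_progress` is a hypothetical helper, not part of Hen.

```python
def classify_progress(before: set, after: set) -> str:
    """Label the effect of a fix by comparing failing-test sets before and after it."""
    fixed = before - after    # tests that no longer fail
    broken = after - before   # tests that newly fail
    if broken:
        return "regressed"    # any new failure counts as a regression, even if others were fixed
    if fixed:
        return "improved"
    return "unchanged"
```

The returned label can be prepended to the next prompt so the model knows whether to keep or drop its previous hypothesis.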
Hayabusa was trained on Hen runs with contexts of around 32K tokens, so avoid letting the debugger context grow without bound. If the prompt becomes much larger than the training context, route large-context reasoning to the Director model or summarize before continuing.
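A routing rule of this kind could look like the following sketch. The 32,000-token budget mirrors the training context mentioned above, the four-characters-per-token estimate is a crude assumption (a real harness would count with the model's tokenizer), and `route_prompt` is a hypothetical name.

```python
DEBUGGER_BUDGET_TOKENS = 32_000  # assumption: roughly the training context from this card

def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); use the real tokenizer in practice."""
    return len(text) // 4 + 1

def route_prompt(prompt: str, budget: int = DEBUGGER_BUDGET_TOKENS) -> str:
    """Return which role should handle the prompt: 'debugger' or 'director'."""
    return "debugger" if estimate_tokens(prompt) <= budget else "director"
```

A harness could equally respond to an over-budget prompt by summarizing older evidence and retrying the debugger; the sketch only shows the routing decision.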
Example Hen role:
-llmDbg vllm/hayabusa-9b
Training Summary
The checkpoint was produced through continuation training stages, always continuing from the previous full checkpoint rather than restarting from the base model.
High-level lineage:
- Full-weight SFT from Qwen/Qwen3.5-9B on early no-assistant-thinking debugger trajectories.
- Round-2 continuation SFT on a larger no-assistant-thinking SFT union.
- Round-2 DPO on cleaned debugger preference pairs.
- Rare-actions SFT continuation to improve underrepresented debugger actions.
- Super-debug v3 main SFT continuation.
- Super-debug v3 rare-actions SFT with final-assistant-message-only loss masking.
- Super-debug v3 DPO continuation from the v3 rare-actions checkpoint.
The goal of these stages is not generic code completion. The target behavior is compact, evidence-grounded debugging: identify the active blocker, avoid stale hypotheses, request the right runtime evidence, and make a local fix that moves the test state forward.
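The "final-assistant-message-only loss masking" named in the lineage can be illustrated as follows. This is a sketch of the general technique, not the training code used for this checkpoint: it assumes the conversation is already tokenized per message, and the function name and input layout are hypothetical. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_all_but_final_assistant(messages: list) -> tuple:
    """Build (input_ids, labels) where only the last assistant message is trained on.

    `messages` is a list of (role, token_ids) pairs in conversation order.
    """
    # Index of the last assistant message, or -1 if there is none.
    last = max(
        (i for i, (role, _) in enumerate(messages) if role == "assistant"),
        default=-1,
    )
    input_ids = []
    labels = []
    for i, (_, ids) in enumerate(messages):
        input_ids.extend(ids)
        # Keep real labels only for the final assistant turn; mask everything else.
        labels.extend(ids if i == last else [IGNORE_INDEX] * len(ids))
    return input_ids, labels
```

With this masking, earlier assistant turns still appear in the context but contribute no gradient, so the model is trained only on the decision taken at the end of the trajectory prefix.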
Data
Public dataset family:
Important caveat: some public dataset views include assistant thinking. This checkpoint was trained primarily on no-assistant-thinking SFT views and cleaned DPO/rare-action derivatives. Exact later-stage training views may differ from the public top-level files.
Example Training Traces
Hayabusa is trained on Hen-style debugging trajectories, where the model learns to operate inside a closed runtime-feedback loop rather than answer one-shot bug reports. The public datasets expose these traces directly on Hugging Face, so users can inspect the training format and build compatible harnesses.
Example from super-debug-v3:
- System/test analysis: system_68_72.txt
- Evidence acquisition step: step_68_70.txt
- Fix proposal step: step_68_71.txt
The model is trained to systematically analyze test execution results, acquire missing runtime/source evidence, and suggest a fix only when the available evidence is sufficient. At runtime, Hen may provide a more verbose context with project workflow, source summaries, logs, traces, and progress reports, but the debugging trajectory shape is similar.
Behavior Notes
- Text-only model; no vision support.
- Tuned for structured debugging traces and action JSON, not conversational polish.
- Best outputs are usually concise and evidence-backed.
- Can still loop on stale hypotheses if the harness does not provide progress feedback or if context exceeds the range seen during training.
- Works best when the system validates actions and re-requests invalid actions with concrete feedback.
- When used outside Hen, provide a strict action schema and fresh test evidence after each proposed fix.
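The validate-and-re-request pattern from the notes above might be sketched like this. The four action names come from this card; the `"action"` field name and the feedback wording are assumptions, and a real harness would also validate per-action arguments.

```python
import json
from typing import Optional

ALLOWED_ACTIONS = {"run_test", "function_info", "debug_function", "fix_function"}

def validate_action(raw: str) -> tuple:
    """Parse model output as an action; return (action, None) or (None, feedback)."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Output is not valid JSON: {exc.msg}. Reply with a single JSON object."
    if not isinstance(action, dict):
        return None, "Output must be a JSON object."
    name: Optional[str] = action.get("action")  # hypothetical field name
    if name not in ALLOWED_ACTIONS:
        allowed = ", ".join(sorted(ALLOWED_ACTIONS))
        return None, f"Unknown action {name!r}. Allowed actions: {allowed}."
    return action, None
```

When the second element is not `None`, the harness can append it to the conversation as a user message and ask the model for the action again.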
Loading With Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "georvn7/hayabusa-9b"

# Load the tokenizer and the merged checkpoint in bfloat16.
# device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Given this test failure and trace, select the next debugger action as JSON."
    }
]

# Render the chat template as text with thinking disabled.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding for stable action output.
output_ids = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,
)

# Note: this prints the prompt together with the generated continuation.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
vLLM Serving Notes
Validated local serving configuration on vLLM:
vllm serve georvn7/hayabusa-9b \
--served-model-name hayabusa-9b \
--max-model-len 65536 \
--gpu-memory-utilization 0.70 \
--max-num-seqs 1 \
--max-num-batched-tokens 32768 \
--dtype bfloat16 \
--default-chat-template-kwargs '{"enable_thinking":false}' \
--enforce-eager \
--disable-frontend-multiprocessing \
--language-model-only
Recommended starting sampling parameters for Hen-style debugging:
{
  "temperature": 0.05,
  "top_p": 0.85,
  "top_k": 20,
  "repetition_penalty": 1.12,
  "max_tokens": 4096
}
Use lower temperature for strict JSON/action stability. If the model becomes too repetitive, prefer harness-level stuck detection and progress feedback before increasing randomness aggressively.
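As an illustration, the sampling settings above can be merged into a chat-completions request for a locally served model. This sketch assumes the server was started with `--served-model-name hayabusa-9b` as in the command above and listens on vLLM's default `http://localhost:8000`. `top_k` and `repetition_penalty` are not standard OpenAI fields; vLLM's OpenAI-compatible server accepts them as extra sampling parameters in the request body, but check the version you run. The helper names are hypothetical.

```python
import json
import urllib.request

SAMPLING = {
    "temperature": 0.05,
    "top_p": 0.85,
    "top_k": 20,
    "repetition_penalty": 1.12,
    "max_tokens": 4096,
}

def build_request(messages: list, model: str = "hayabusa-9b") -> dict:
    """Merge the recommended sampling settings into a chat-completions payload."""
    return {"model": model, "messages": messages, **SAMPLING}

def chat(messages: list, url: str = "http://localhost:8000/v1/chat/completions") -> str:
    """POST the payload to a running server and return the assistant text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat([{"role": "user", "content": "..."}])` requires the server to be running; `build_request` can be inspected on its own to confirm the payload.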
Evaluation Status
This is an experimental research checkpoint. It has been used inside Hen on long-horizon C++ debugging trajectories, including difficult SimpleC compiler tests where many general OSS models struggle. It should not be interpreted as a broadly evaluated frontier coding model.
The most meaningful evaluation setting is not a single-pass benchmark. It is a stateful debugging loop with persisted evidence, action validation, run-test verification, and trajectory progress reports.
Limitations
- Specialized debugger/action model, not a general assistant release.
- Not broadly safety-aligned beyond the upstream base model and task data.
- May over-focus on familiar Hen action patterns outside the Hen harness.
- May repeat information-gathering actions if progress feedback is weak.
- Full reproducibility requires the exact no-thinking SFT views, rare-action shards, and cleaned DPO files used in later stages.
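The repeated-action limitation above is usually handled in the harness rather than in the model. A minimal stuck-detection sketch follows, assuming actions are JSON-serializable dicts; the class name, window size, and repeat threshold are illustrative choices, not values taken from Hen.

```python
import json
from collections import deque

class StuckDetector:
    """Flag when the same action repeats too often within a sliding window."""

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self.recent = deque(maxlen=window)  # canonical strings of recent actions
        self.max_repeats = max_repeats

    def observe(self, action: dict) -> bool:
        """Record an action; return True if it now looks like a loop."""
        key = json.dumps(action, sort_keys=True)  # canonical form for comparison
        self.recent.append(key)
        return self.recent.count(key) >= self.max_repeats
```

When `observe` returns `True`, the harness can inject a progress report or escalate to the Director model instead of raising the sampling temperature.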
License
This model inherits the upstream Qwen/Qwen3.5-9B Apache-2.0 license. See LICENSE.