Bosun-4B (4B)
Launch post: Introducing Bosun
The judge that keeps an agent's memory (its knowledge graph) clean. As an agent accumulates memory as a graph of facts linked by relationships, Bosun-4B decides, edge by edge, which connections are warranted (supported, non-redundant, still true), so the graph stays useful instead of growing into noise that drowns the model reading it back. Nothing else scores that "judge" step; Bosun-4B is a small, fast, calibrated model built for it, and you program it with a sentence.
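The edge-by-edge judging step amounts to a filter over candidate edges. The sketch below is illustrative only: `Edge`, `keep_warranted`, the 0.5 threshold, and `toy_score` are hypothetical names and choices. `toy_score` is a word-overlap stand-in that ignores the rule; a real pipeline would call Bosun-4B with the rule as the instruction.

```python
from typing import Callable, Iterable, NamedTuple


class Edge(NamedTuple):
    """A candidate edge between two findings (hypothetical structure)."""
    a: str
    b: str


def keep_warranted(edges: Iterable[Edge], rule: str,
                   score: Callable[[str, str, str], float],
                   threshold: float = 0.5) -> list:
    """Keep only edges whose judge score under `rule` is at or above `threshold`."""
    return [e for e in edges if score(rule, e.a, e.b) >= threshold]


def toy_score(rule: str, a: str, b: str) -> float:
    """Hypothetical stand-in for Bosun-4B: Jaccard overlap of lowercase words.
    It ignores `rule`; the real model conditions on it."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


edges = [
    Edge("acme acquired beta", "beta was acquired by acme"),
    Edge("acme acquired beta", "it rained in oslo"),
]
rule = "Connected only if the two findings share a specific named entity."
kept = keep_warranted(edges, rule, toy_score)
print(kept)  # only the first edge survives
```

Passing the scorer in as a callable keeps the graph-side logic independent of how the model is served.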
Given two findings and an instruction, Bosun-4B emits P = sigmoid(logit_yes - logit_no) ∈ [0, 1]: how strongly
the pair satisfies the rule you supplied, with no opinion of its own. "Warranted" isn't one fixed rule
(it might mean same-entity, cross-domain bridge, not-a-duplicate, or still-supported-by-evidence), so you define it per graph;
Bosun-4B follows the rule, respects negation, and generalizes to rules it never trained on. That same
capability is exactly what RAG filtering, content moderation, and deduplication need too; knowledge-graph
curation is simply where the need bites first and hardest.
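Because P depends only on the difference of the two logits, it equals the probability of "yes" under a softmax restricted to the yes/no tokens, and adding the same constant to both logits leaves it unchanged. A minimal, numerically stable sketch (the function names are ours, not part of the release):

```python
import math


def bosun_score(logit_yes: float, logit_no: float) -> float:
    """P = sigmoid(logit_yes - logit_no), computed without overflow."""
    d = logit_yes - logit_no
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)  # d < 0, so exp(d) < 1 and cannot overflow
    return z / (1.0 + z)


def yes_prob_two_way_softmax(logit_yes: float, logit_no: float) -> float:
    """P(yes) from a softmax over just the yes/no logits, for comparison."""
    m = max(logit_yes, logit_no)  # shift by the max for numerical stability
    ey, en = math.exp(logit_yes - m), math.exp(logit_no - m)
    return ey / (ey + en)


print(bosun_score(0.0, 0.0))  # equal logits -> 0.5
print(bosun_score(2.3, -1.1), yes_prob_two_way_softmax(2.3, -1.1))
```

Branching on the sign of the difference avoids calling `math.exp` on a large positive argument, which would raise `OverflowError` for very confident scores.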
Bosun-4B is a LoRA fine-tune of Qwen/Qwen3-Reranker-4B, scored on the reranker's native yes/no logits.
Inference contract
Bosun-4B uses the native Qwen3-Reranker template. Fill it in as follows and read the last-token logits:
```
<Instruct>: <your rule, e.g. "Connected only if the two findings share a specific named entity.">
<Query>: These two findings share the specified relationship.
<Document>: FINDING A:\n<text_a>\n\nFINDING B:\n<text_b>
```
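The body between the template's prefix and suffix can be assembled with plain string formatting. This is a sketch: `build_body` is a hypothetical helper, and it assumes the three fields are joined by single newlines as in the upstream Qwen3-Reranker examples. The exact prefix, suffix, and max_len still come from serving.json.

```python
QUERY = "These two findings share the specified relationship."


def build_body(rule: str, text_a: str, text_b: str) -> str:
    """Fill the <Instruct>/<Query>/<Document> part of the reranker prompt.
    The serving.json prefix and suffix must still wrap the returned string."""
    document = f"FINDING A:\n{text_a}\n\nFINDING B:\n{text_b}"
    return f"<Instruct>: {rule}\n<Query>: {QUERY}\n<Document>: {document}"


body = build_body(
    "Connected only if the two findings share a specific named entity.",
    "Acme acquired Beta in 2021.",
    "Beta's founder left Acme in 2023.",
)
print(body)
```

Only the `<Instruct>` line changes from rule to rule; the query sentence stays fixed, which is what lets one rule string reprogram the judge.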
The score is sigmoid(logits[yes_id] - logits[no_id]) at the final position (logits_to_keep=1). The exact
yes_id, no_id, template prefix and suffix, and max_len are in serving.json.
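```python
import json
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

repo = "Hanno-Labs/bosun-4b"
# Inference contract (base_model, yes_id, no_id, template, max_len) ships as serving.json in this repo
with open(hf_hub_download(repo, "serving.json")) as f:
    cfg = json.load(f)

# Left padding keeps each row's last real token at position -1
tok = AutoTokenizer.from_pretrained(repo, subfolder="tokenizer", padding_side="left")
base = AutoModelForCausalLM.from_pretrained(cfg["base_model"], torch_dtype=torch.bfloat16,
                                            attn_implementation="sdpa", trust_remote_code=True)
# Load the LoRA adapter, fold it into the base weights, and move the model to the GPU
model = PeftModel.from_pretrained(base, repo).merge_and_unload().eval().cuda()

# build ids = prefix + <Instruct/Query/Document> + suffix, then:
# with torch.no_grad():
#     lg = model(input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=1).logits[:, -1, :].float()
#     p = torch.sigmoid(lg[:, cfg["yes_id"]] - lg[:, cfg["no_id"]])
```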
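The tokenizer is loaded with padding_side="left" because the score is read from position -1 of every row. Left padding puts pad tokens in front, so the last column is always each sequence's final real token; with right padding, shorter rows would end in pads. The tokenizer does this itself when called with padding=True; the pure-Python `left_pad` below is only a hypothetical illustration of the layout it produces.

```python
def left_pad(batch: list, pad_id: int):
    """Left-pad token-id lists to a common length.
    Returns (input_ids, attention_mask), with 0 in the mask marking padding."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad([[5, 6, 7], [8]], pad_id=0)
print(ids)   # [[5, 6, 7], [0, 0, 8]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
print([row[-1] for row in ids])  # last column holds the last real token: [7, 8]
```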
Results
Bosun-4B is state-of-the-art on FollowIR (public instruction-following retrieval), averaging +17.9 p-MRR across its three tasks on the full pool: it changes its judgments correctly when the instruction changes, where most retrievers move the wrong way. On a capped pool it matches gemini-3.1-flash-lite head-to-head (both score 12.0) at a fraction of the cost.
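p-MRR is FollowIR's pairwise metric: for a document that the altered instruction makes non-relevant, it compares the document's rank under the original instruction (R_og) with its rank under the new one (R_new), giving a value in [-1, 1] that is positive when the document moves down. The sketch below follows our reading of the definition in the FollowIR paper and does not reproduce FollowIR's exact aggregation over documents, queries, and tasks, so treat it as illustrative and use the official FollowIR evaluation code for reported numbers.

```python
def p_mrr_doc(rank_og: int, rank_new: int) -> float:
    """Per-document p-MRR (1-based ranks) for a document the new instruction makes non-relevant.
    Positive when the document moves down, negative when it moves up."""
    mrr_og, mrr_new = 1.0 / rank_og, 1.0 / rank_new
    if rank_og > rank_new:  # moved up: the wrong direction
        return mrr_og / mrr_new - 1.0
    return 1.0 - mrr_new / mrr_og  # moved down (or stayed put)


def p_mrr(rank_pairs: list) -> float:
    """Plain mean over (rank_og, rank_new) pairs, scaled by 100 as in reported tables.
    Simplified aggregation, not FollowIR's exact averaging."""
    return 100.0 * sum(p_mrr_doc(o, n) for o, n in rank_pairs) / len(rank_pairs)


print(p_mrr_doc(1, 4))          # 0.75: moved from rank 1 to rank 4
print(p_mrr_doc(4, 1))          # -0.75: moved up, the wrong way
print(p_mrr([(1, 4), (2, 2)]))  # 37.5
```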
WarrantBench (Hanno-Labs/warrantbench): Bosun-4B follows arbitrary rules and their negations, and its score flips correctly on steerability triples. On the hardest slice, the extra capacity of the 4B model closes the gap to the frontier LLM that the 0.6B model leaves open.
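The card does not spell out how a steerability triple is scored. As a hypothetical reading only, take each triple to be one finding pair, a rule, and that rule's negation, with a gold label for whether the pair satisfies the rule; the model "flips correctly" if, at a 0.5 threshold, it says yes under exactly the rule the pair satisfies. `flips_correctly`, `steerability`, and the threshold are our assumptions, not WarrantBench's definition.

```python
def flips_correctly(score_rule: float, score_negated: float, satisfies_rule: bool,
                    threshold: float = 0.5) -> bool:
    """Hypothetical check: the verdict under the rule matches the gold label,
    and the verdict under the negated rule is the opposite."""
    yes_rule = score_rule >= threshold
    yes_negated = score_negated >= threshold
    return yes_rule == satisfies_rule and yes_negated == (not satisfies_rule)


def steerability(triples: list, threshold: float = 0.5) -> float:
    """Fraction of (score_rule, score_negated, satisfies_rule) triples that flip correctly."""
    hits = sum(flips_correctly(r, n, y, threshold) for r, n, y in triples)
    return hits / len(triples)


toy = [(0.9, 0.1, True), (0.8, 0.7, True), (0.2, 0.95, False)]
print(steerability(toy))  # 2 of the 3 toy triples flip correctly
```

The toy scores are made up; the second triple fails because the model still says yes after the rule is negated.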
Files
| file | what |
|---|---|
| adapter_model.safetensors, adapter_config.json | the LoRA adapter (load with PEFT over the base) |
| serving.json | inference contract: template + yes_id/no_id + max_len |
| tokenizer/ | Qwen tokenizer (left-padding) |
Links
- Launch post: Introducing Bosun
- WarrantBench: github.com/Hanno-Labs/warrantbench (dataset)
From Hanno Labs.
Evaluation results
- Steerability (score flips with the rule) on WarrantBench: 0.885 (self-reported)
- p-MRR (full pool, avg of 3 tasks) on FollowIR: 17.9 (self-reported)
