Instructions to use macmacmacmac/VibeThinker-3B-BugBounty-Triage with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use macmacmacmac/VibeThinker-3B-BugBounty-Triage with MLX:
# Make sure mlx-lm is installed # pip install --upgrade mlx-lm # if on a CUDA device, also pip install mlx[cuda] # Generate text with mlx-lm from mlx_lm import load, generate model, tokenizer = load("macmacmacmac/VibeThinker-3B-BugBounty-Triage") prompt = "Once upon a time in" text = generate(model, tokenizer, prompt=prompt, verbose=True) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- MLX LM
How to use macmacmacmac/VibeThinker-3B-BugBounty-Triage with MLX LM:
Generate or start a chat session
# Install MLX LM uv tool install mlx-lm # Generate some text mlx_lm.generate --model "macmacmacmac/VibeThinker-3B-BugBounty-Triage" --prompt "Once upon a time"
VibeThinker-3B — Bug-Bounty Triage (LoRA adapter)
A LoRA fine-tune of WeiboAI/VibeThinker-3B that triages bug-bounty / vulnerability-disclosure submissions into a structured verdict — disposition, severity, confidence, and a rationale — and is hardened against prompt-injection and AI-generated "slop" reports.
Project name: VibeBounty. This repo hosts the trained LoRA adapter (mlx-lm format); fuse it onto the base model to get a standalone model.
What it does
Given a report (title, asset, description, steps, impact), it emits a JSON verdict over a 9-class disposition taxonomy:
valid_impactful · valid_low · corroborated_surge · likely_duplicate · out_of_scope · theoretical_no_poc · self_inflicted · accepted_risk · slop
plus a severity estimate, a confidence gated by claim-reliability, and questions for the researcher.
Files
| file | purpose |
|---|---|
adapters/adapters.safetensors |
final LoRA adapter (iter 2000, mlx-lm) |
adapters/adapter_config.json |
adapter / training config |
lora_config.yaml |
full mlx-lm LoRA recipe |
Usage (Apple Silicon / MLX)
pip install mlx-lm huggingface_hub
hf download macmacmacmac/vibebounty --local-dir vibebounty
# fuse adapter -> standalone model
mlx_lm.fuse --model WeiboAI/VibeThinker-3B \
--adapter-path vibebounty/adapters --save-path vibethinker-bbtriage
# generate
mlx_lm.generate --model vibethinker-bbtriage \
--prompt "Triage this report: IDOR in invoice download ..."
Or load the base + adapter directly with mlx-lm without fusing
(--adapter-path vibebounty/adapters).
Training
- Base: WeiboAI/VibeThinker-3B (Qwen2.5-3B lineage)
- Method: LoRA (rank 16, scale 20, all 36 layers; q/k/v/o + MLP proj),
mask_prompt - Iters: 2000, batch 8, seq 2048, lr 1e-4, AdamW
- Data: ~18k bug-bounty reports labeled from real disclosure outcomes (substate / severity / bounty / CVE), rendered as chat with reasoning targets
- Train loss 3.4 → <0.7; val loss ~1.06
Sample verdicts
// IDOR: GET /api/v2/invoices/{id} returns other tenants' invoices
{"disposition": "valid_impactful", "severity_estimate": "high",
"reasoning": "IDOR / broken-authz against an authenticated API; incrementing id
walks the table -> crosses a real trust boundary with demonstrated impact.",
"confidence": 0.9}
// Log4Shell report with an EXTERNAL CORROBORATION block (CVE-2021-44228, CISA KEV)
{"disposition": "corroborated_surge", "severity_estimate": "critical",
"reasoning": "Maps to a publicly disclosed advisory confirmed by the live feed
(actively exploited) -> corroborated, not spam.",
"used_external_corroboration": true, "confidence": 0.9}
Evaluation (held-out 300 reports, offline)
| metric | heuristic + defense baseline |
|---|---|
| accept / reject accuracy | 97.3% |
| disposition accuracy (9-class) | 56.3% |
| macro-F1 | 0.191 |
| severity within-1 | 71.0% |
| adversarial defense suite | 6 / 6 pass |
Defense layer (model-independent)
Verdicts are guarded by ground-truth checks the model can't talk past:
prompt-injection isolation, claim-level verification (fabricated code symbols → slop),
and threat-intel corroboration (CVE/KEV/OSV → corroborated_surge, never spam).
Offline adversarial suite: 6/6.
Training data & provenance
~18k bug-bounty / vulnerability-disclosure reports compiled from publicly
disclosed sources — primarily disclosed HackerOne reports plus additional
public bug-bounty and Web3 disclosure corpora. Every example's label is
derived from the real adjudicated outcome recorded in the data (HackerOne
substate, severity, bounty amount, vote count, and any associated CVE) and
mapped onto the 9-class disposition taxonomy — the labels are not synthetic.
Each report is rendered as chat (system + user report → assistant reasoning +
verdict JSON); when a CVE is present, a live threat-intel corroboration block is
rendered exactly as the inference pipeline emits it. ~300 reports are held out as
a test split for evaluation.
Academic grounding
The triage flow and its defenses are grounded in recent literature:
- VibeThinker (arXiv:2606.16140) — small-model verifiable reasoning; the base model + the claim-level-reliability idea behind confidence gating.
- From Reviewers' Lens: Bug Bounty Invalid Reasons with LLMs (arXiv:2511.18608) — predicting why a report is invalid; informs the disposition taxonomy + rationale output.
- Triage in SE: A Systematic Review (arXiv:2511.08607) — metadata + retrieval beats text-only → we blend report metadata and threat-intel corroboration.
- CaSey: Streamlining Vulnerability Triage with LLMs (arXiv:2501.18908) — realistic LLM CWE/severity accuracy; keeps expectations honest.
- JudgeDeceiver (arXiv:2403.17710), Adversarial Attacks on LLM-as-a-Judge (arXiv:2504.18333), CUA/JMA (arXiv:2505.13348), RobustJudge (arXiv:2506.09443) — LLM judges (incl. 3B) are injectable → the prompt-injection guard + model-independent ground-truth overrides.
- Stumbling Blocks (arXiv:2402.11638) + paraphrase-attack results (Krishna et al. 2023; Sadasivan et al.) — AI-text detectors collapse under paraphrase → we ground via retrieval / claim verification (fabricated code symbols →
slop), not detection.
Intended use & limitations
Decision-support "sidecar" for analysts, not an autonomous adjudicator. It reflects the biases of the disclosure outcomes it was trained on; always keep a human in the loop for accept/reject and severity. License inherits from the base model — verify before redistribution.
Quantized