SAM-G-CobraTooling
SAM-G-CobraTooling is a 30.3M-parameter model fine-tuned from
SAM-G-Reasoning on 196k
agentic orchestration traces. It turns a natural-language instruction — or an
observation from a previous step — into an ordered, risk-flagged JSON plan of
tool calls. It is the local orchestration layer of an agentic IDE: it routes,
decomposes, tracks state, reacts to exit codes and HTTP status, and emits
structured tool calls entirely offline. It does not write code; code is
delegated to a larger model via an ask_code_model hand-off. Built by
AMEFORGE for the CobraBub IDE.
- Parameters: 30.3M · Footprint: 121 MB fp32 (~30 MB quantized) · Base: SAM-G-Reasoning
- Fine-tuning: prompt-masked SFT (loss on the plan span only), cosine learning-rate schedule (8e-5), 10k steps, best checkpoint at step 6k
- Aggregate exact plan-match: 78.8% (held-out, disjoint seed)
- Lineage: SAM-G → SAM-G-Reasoning → SAM-G-CobraTooling
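The footprint figures follow from the parameter count. The sketch below is a back-of-envelope check. It assumes 4 bytes per fp32 weight and, for the quantized figure, 8-bit weights at 1 byte each. The card does not name the quantization scheme, and the estimate ignores tokenizer files, container overhead and runtime activations.

```python
PARAMS = 30.3e6  # parameter count stated on the card

def weight_megabytes(params: float, bytes_per_param: float) -> float:
    """Raw weight storage in decimal megabytes (1 MB = 10**6 bytes)."""
    return params * bytes_per_param / 1e6

fp32_mb = weight_megabytes(PARAMS, 4)  # fp32: 4 bytes per weight
int8_mb = weight_megabytes(PARAMS, 1)  # assumed 8-bit quantization: 1 byte per weight
print(f"fp32: {fp32_mb:.1f} MB, 8-bit: {int8_mb:.1f} MB")
# prints "fp32: 121.2 MB, 8-bit: 30.3 MB"
```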
Output format
```
<instruction> [ACTION] {"plan":[{"op":...,"args":{...},"risk":"safe|critical"}, ...]}   # instruction-driven
<intent> | {"last_op":...,"...":...} [ACTION] {"plan":[ ... ]}   # reactive (observation-driven)
```
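The following minimal sketch shows how an app might assemble both prompt forms and pull the plan out of a completion. The helper names `build_prompt` and `extract_plan` are hypothetical. The compact JSON separators are an assumption based on the card's reactive usage example. The plan schema (`op`, `args`, `risk`) is taken from the format above.

```python
import json
from typing import Any, Optional

ACTION_TAG = "[ACTION]"

def build_prompt(text: str, observation: Optional[dict] = None) -> str:
    """Build an instruction-driven prompt, or a reactive one if an observation is given."""
    if observation is None:
        return f"{text} {ACTION_TAG}"
    # Compact separators reproduce the card's style, e.g. {"last_op":"api_call","status":429}
    obs = json.dumps(observation, separators=(",", ":"))
    return f"{text} | {obs} {ACTION_TAG}"

def extract_plan(completion: str) -> list[dict[str, Any]]:
    """Parse the JSON object after the last [ACTION] tag and return its "plan" list."""
    _, sep, payload = completion.rpartition(ACTION_TAG)
    if not sep:
        raise ValueError("no [ACTION] tag in completion")
    # raw_decode parses the first JSON value and ignores any trailing text
    data, _ = json.JSONDecoder().raw_decode(payload.lstrip())
    plan = data["plan"]
    for step in plan:
        if step.get("risk") not in ("safe", "critical"):
            raise ValueError(f"unexpected risk flag in step: {step}")
    return plan
```

For example, `build_prompt("rate limited, back off and retry", {"last_op": "api_call", "status": 429})` reproduces the reactive prompt used in the Usage section.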
Every step carries a risk flag (safe or critical) that drives the IDE
confirmation gate: safe ops run autonomously, critical ops require explicit
user confirmation.
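The following sketch shows one way to wire the flag into a confirmation gate. `run_gated`, `execute` and `confirm` are hypothetical names; the last two would be supplied by the host app. Anything not explicitly flagged `safe`, including a missing or malformed flag, is treated as critical. A declined critical step halts the rest of the plan, since later steps in an ordered plan may depend on it. Both behaviors are design choices for this sketch, not documented behavior.

```python
from typing import Any, Callable

Step = dict[str, Any]

def run_gated(plan: list[Step],
              execute: Callable[[Step], Any],
              confirm: Callable[[Step], bool]) -> list[Any]:
    """Execute plan steps in order, asking for confirmation before any non-safe step."""
    results = []
    for step in plan:
        # Fail closed: only an explicit "safe" flag skips the confirmation prompt.
        if step.get("risk") != "safe" and not confirm(step):
            break  # declined: stop here, because later steps may depend on this one
        results.append(execute(step))
    return results
```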
What it is good at — and what it is not
Stress-tested on thirteen task families, scored by exact plan match per family. The pattern mirrors the rest of the SAM-G line: the model excels at short, procedural routing and reaction, and is limited on long ordered chains that must match exactly, which are hard at 30M parameters.
| Family | Exact plan match (%) | Type |
|---|---|---|
| single_tool (routing) | 100 | routing |
| retry_loop (exit-code state machine) | 100 | reaction |
| feedback_react (stdout/stderr) | 100 | reaction |
| git_workflow (status→add→push, gated) | 100 | procedural |
| scrape_research (fetch→summarize→act) | 100 | procedural |
| db_query (SQL, SELECT vs mutation) | 100 | structured call |
| webhook_wait (async callback) | 92 | async reaction |
| mcp_call (filesystem/github/postgres) | 83 | structured call |
| api_call (REST/GraphQL + HTTP state machine) | 75 | structured call |
| plan_chain (multi-step plans) | 58 | planning |
| risk_gate (mixed safe/critical plans) | 58 | gated planning |
| fs_watch (file-change reaction) | 42 | async reaction |
| build_test_cycle (edit→test→react + hand-off) | 17 | long chain |
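The aggregate exact plan match of 78.8% is consistent with an unweighted mean over the thirteen family scores. The card does not state how the aggregate is computed, so treat the macro-average below as an assumption. The per-family numbers are themselves rounded.

```python
# Per-family exact plan-match scores (%), copied from the table above.
FAMILY_SCORES = {
    "single_tool": 100, "retry_loop": 100, "feedback_react": 100,
    "git_workflow": 100, "scrape_research": 100, "db_query": 100,
    "webhook_wait": 92, "mcp_call": 83, "api_call": 75,
    "plan_chain": 58, "risk_gate": 58, "fs_watch": 42,
    "build_test_cycle": 17,
}

# Assumption: the aggregate is an unweighted (macro) mean over families.
macro_mean = sum(FAMILY_SCORES.values()) / len(FAMILY_SCORES)
print(f"{len(FAMILY_SCORES)} families, macro mean = {macro_mean:.1f}%")
# prints "13 families, macro mean = 78.8%"
```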
Routing, exit-code and stdout/stderr reaction, git, scraping and SQL calls are saturated at 100%.
mcp_call at 83% makes the model a viable local driver for MCP servers — the
core capability of a hosted code agent, here running offline. plan_chain rose
from the v1 plateau (0–42%) to 58% after broadening generator coverage.
build_test_cycle remains the hard family: four-to-five ordered ops ending in a
code-model hand-off, scored by strict exact match — the same long-chain ceiling
seen with arithmetic in SAM-G-Reasoning. For those, decompose app-side into
shorter sub-calls.
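The following sketch illustrates app-side decomposition. The app splits a long task into short sub-instructions itself, plans each one separately, and concatenates the results, so every model call stays in the short-plan regime where exact match is high. `plan_in_chunks`, `planner` and the `max_steps` guard are hypothetical. How the app chooses the split is left open.

```python
from typing import Any, Callable

Plan = list[dict[str, Any]]

def plan_in_chunks(sub_instructions: list[str],
                   planner: Callable[[str], Plan],
                   max_steps: int = 3) -> Plan:
    """Plan each short sub-instruction separately and concatenate the plans in order."""
    full_plan: Plan = []
    for sub in sub_instructions:
        steps = planner(sub)  # one short model call per sub-instruction
        # Guard: a long plan from a single sub-call suggests the split was too coarse.
        if len(steps) > max_steps:
            raise ValueError(f"sub-call {sub!r} produced {len(steps)} steps; split it further")
        full_plan.extend(steps)
    return full_plan
```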
Security: the risk flag is advisory, not a boundary
The model flags critical ops with 94% fidelity across all families — strong
for pre-flagging and good UX. It must not be the sole security boundary. A
30M model will mis-flag a fraction of decisions, and the failure modes are
asymmetric: a false negative (a critical op flagged safe) would auto-run a
destructive command without confirmation. Integrators must add a
deterministic backstop: a hard whitelist/blacklist in the app that forces
critical on known-dangerous operations (rm -rf, git push, DROP/DELETE,
external mutating HTTP, MCP write tools, delete_file) regardless of the
model's flag. Treat the model's risk field as a fast hint that pre-fills the
confirmation gate, with the app's deterministic rules as the enforced boundary.
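The following is a minimal sketch of such a deterministic backstop. It assumes particular argument names (`cmd`, `sql`, `method`, `tool`) that the card does not document, so real deployments would need rules matched to their own tool schema. The sketch fails closed:

- `db_query` stays safe only for a single `SELECT` statement.
- `api_call` stays safe only for read-only HTTP methods. It does not try to tell external from local endpoints, which is stricter than the list above.
- `mcp_call` stays safe only for tools whose names start with a read-style prefix.

The rules can only raise a step to critical, never lower one, so a step ends up critical if either the model or the rules say so.

```python
import re
from typing import Any

# Hypothetical app-side rules; argument names are assumptions, not a documented schema.
ALWAYS_CRITICAL_OPS = {"git_push", "delete_file"}
# "rm" as a command word (not e.g. docker's --rm flag), or "git push"
DANGEROUS_CMD = re.compile(r"(?:^|[\s;&|(])(?:rm\s|git\s+push\b)")
READ_ONLY_HTTP = {"GET", "HEAD", "OPTIONS"}
READ_ONLY_MCP_PREFIXES = ("read", "list", "get", "search")

def is_forced_critical(step: dict[str, Any]) -> bool:
    """Return True if deterministic rules require confirmation, whatever the model said."""
    op, args = step.get("op"), step.get("args") or {}
    if op in ALWAYS_CRITICAL_OPS:
        return True
    if op == "run_command":
        return bool(DANGEROUS_CMD.search(str(args.get("cmd", ""))))
    if op == "db_query":
        # Allow-list: one plain SELECT stays safe; DROP/DELETE and other mutations do not.
        sql = str(args.get("sql", "")).strip().rstrip(";")
        return not sql.upper().startswith("SELECT") or ";" in sql
    if op == "api_call":
        # A missing or non-read-only method is treated as mutating.
        return str(args.get("method", "")).upper() not in READ_ONLY_HTTP
    if op == "mcp_call":
        return not str(args.get("tool", "")).lower().startswith(READ_ONLY_MCP_PREFIXES)
    return False

def enforce_backstop(plan: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of the plan with rule hits forced to critical; model flags are kept otherwise."""
    return [dict(step, risk="critical") if is_forced_critical(step) else dict(step)
            for step in plan]
```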
Op vocabulary
- Routing/IO: open_file, list_dir, run_command, scrape, summarize, capture, open_app
- Hand-off: ask_code_model, write_file
- Control: retry, escalate, backoff, reauth, continue, stop
- Integrations: api_call, mcp_call, db_query, webhook_wait, fs_watch, git_push
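A small model can occasionally emit an op name outside this vocabulary. An app may therefore want to reject any such step, or force it to critical, before execution. The sketch below copies the vocabulary from this section; `unknown_ops` is a hypothetical helper.

```python
from typing import Any, Optional

# Op vocabulary as listed in this section.
OP_VOCAB = {
    "routing_io": {"open_file", "list_dir", "run_command", "scrape", "summarize", "capture", "open_app"},
    "hand_off": {"ask_code_model", "write_file"},
    "control": {"retry", "escalate", "backoff", "reauth", "continue", "stop"},
    "integrations": {"api_call", "mcp_call", "db_query", "webhook_wait", "fs_watch", "git_push"},
}
KNOWN_OPS = set().union(*OP_VOCAB.values())

def unknown_ops(plan: list[dict[str, Any]]) -> list[Optional[str]]:
    """Return op names outside the vocabulary (None marks a step with no op at all)."""
    return [step.get("op") for step in plan if step.get("op") not in KNOWN_OPS]
```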
Intended use
The local planning/routing/reaction layer of an agentic IDE: decompose an
instruction into ordered tool calls, react to observations (exit codes, stderr,
HTTP status, DB row counts, webhook payloads, file-change events), and emit
structured, risk-flagged plans offline and for free. This is meant to cover
roughly the procedural majority of agentic turns; hard code generation and long
exact chains are escalated to a larger model via ask_code_model.
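The following is a minimal sketch of the reactive loop this layer sits in. It plans, executes each step, and then feeds the last observation back using the reactive prompt form. `planner` and `execute` are hypothetical callables: the model call and the host's tool runner. Reusing the original instruction as the `<intent>`, and stopping on a `stop` op or an empty plan, are assumptions. In a real integration each step would also pass through the confirmation gate and a deterministic risk backstop.

```python
import json
from typing import Any, Callable

Step = dict[str, Any]

def react_loop(instruction: str,
               planner: Callable[[str], list[Step]],
               execute: Callable[[Step], dict[str, Any]],
               max_turns: int = 5) -> list[dict[str, Any]]:
    """Plan, execute, and feed the latest observation back until stop or max_turns."""
    prompt = f"{instruction} [ACTION]"
    observations: list[dict[str, Any]] = []
    for _ in range(max_turns):
        plan = planner(prompt)
        if not plan:
            break
        for step in plan:
            if step.get("op") == "stop":
                return observations
            observations.append(execute(step))
        # Reactive form: <intent> | <last observation as compact JSON> [ACTION]
        last = json.dumps(observations[-1], separators=(",", ":"))
        prompt = f"{instruction} | {last} [ACTION]"
    return observations
```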
Usage
```python
import json

import sentencepiece as spm
import torch

sp = spm.SentencePieceProcessor()
sp.Load("samg_tokenizer.model")

# Routing: a plain instruction followed by the [ACTION] tag
routing_prompt = "open src/main.js and run the tests [ACTION]"
# -> {"plan":[{"op":"open_file","args":{"path":"src/main.js"},"risk":"safe"},
#             {"op":"run_command","args":{"cmd":"pytest"},"risk":"safe"}]}

# Reactive: HTTP 429 -> back off and retry (observation as compact JSON)
obs = json.dumps({"last_op": "api_call", "status": 429}, separators=(",", ":"))
reactive_prompt = f"rate limited, back off and retry | {obs} [ACTION]"
# -> {"plan":[{"op":"backoff","args":{"seconds":30},"risk":"safe"},
#             {"op":"retry","args":{"attempt":2},"risk":"safe"}]}

ids = {name: torch.tensor([sp.EncodeAsIds(p)])  # each of shape (1, seq_len)
       for name, p in (("routing", routing_prompt), ("reactive", reactive_prompt))}
# greedy-decode the [ACTION] span of each prompt -> structured plan JSON
```
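The snippet stops before decoding because this card does not show model-loading code. A framework-agnostic greedy loop might look like the sketch below. `next_token_logits` is a hypothetical callable. With a PyTorch model that returns logits of shape (batch, seq_len, vocab), it could be `lambda ids: model(torch.tensor([ids]))[0, -1].tolist()`, though that interface is an assumption. `eos_id` could come from `sp.eos_id()` if the tokenizer defines one. The returned ids can be turned back into text with `sp.DecodeIds` and parsed as plan JSON.

```python
from typing import Callable, Sequence

def greedy_decode(prompt_ids: Sequence[int],
                  next_token_logits: Callable[[list[int]], Sequence[float]],
                  eos_id: int,
                  max_new_tokens: int = 256) -> list[int]:
    """Append the highest-scoring token until EOS or the budget runs out.

    Returns only the newly generated ids (the [ACTION] span), not the prompt.
    """
    ids = list(prompt_ids)
    new_ids: list[int] = []
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # scores for the next position only
        tok = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        if tok == eos_id:
            break
        ids.append(tok)
        new_ids.append(tok)
    return new_ids
```

This recomputes the full sequence at every step (no KV cache), which keeps the sketch simple and is tolerable for short plans.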
Limitations
- build_test_cycle (17%) and the exact-match scores of plan_chain and risk_gate (58%) plateau because long, strictly ordered plans are hard at 30M parameters; decompose long plans app-side into shorter sub-calls.
- The risk flag is advisory (94% fidelity); enforce a deterministic backstop in the app, as described under Security.
- Traces are synthetic, drawn from the training family distribution with a disjoint evaluation seed; coverage reflects the generator, not arbitrary real-world tool APIs.
- Not a general assistant and does not write code; it orchestrates and hands off. Inherits the base model's knowledge limits.
Citation
```bibtex
@misc{samgcobratooling2026,
  title  = {SAM-G-CobraTooling: Risk-Flagged Agentic Tool-Call Orchestration at 30M Parameters},
  author = {AMEFORGE Lab},
  year   = {2026}
}
```
Evaluation results
- Exact plan match, aggregate (%): 78.8 (self-reported)
- Risk-gate fidelity (%): 94.0 (self-reported)