SAM-G-Agent

SAM-G-Agent is the autonomous-agent member of the SAM-G family: a ~30M-parameter, offline, dual-mode model that acts as the per-step tool dispatcher of a long-running agentic loop (Manus / Claude-Code style). Given an instruction or the current state of a task, it emits the next action(s) as a compact, risk-flagged JSON plan that an executor runs against real tools.

It is not a monolithic long-horizon planner. An agent built on SAM-G-Agent can run for hours because a host loop re-invokes the model each turn with the latest observation, and the model returns one short action at a time. This design plays to the model's strength (short, reactive tool emission) and works around its limit (long, exactly ordered chains).
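
The host loop described above might be sketched as follows. This is a minimal illustration, not the released runtime: `run_agent`, the `model` and `execute` callables, the `max_turns` cap, and the way the observation is appended to the goal are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical signatures: `model` maps a prompt string to a parsed plan
# (a list of op dicts); `execute` runs one op and returns an observation string.
Model = Callable[[str], list]
Executor = Callable[[dict], str]


def run_agent(goal: str, model: Model, execute: Executor, max_turns: int = 50) -> list:
    """Re-invoke the model each turn with the latest observation until `finish`."""
    history = []
    observation = ""
    for _ in range(max_turns):
        # Assumed prompt layout, loosely following "intent | {observation}".
        prompt = goal if not observation else f"{goal} | {observation}"
        for step in model(prompt):
            history.append(step)
            if step["op"] == "finish":
                return history  # terminal op: stop the loop
            observation = execute(step)
    return history  # turn budget exhausted without a finish op
```

The turn cap is a design choice for the sketch: it keeps a loop that never emits `finish` from running unbounded.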

What it does

Input: a natural-language instruction (English or French), optionally followed by an observation block (intent | {observation}). Output, after the [ACTION] mode token:

{"plan":[{"op":"web_search","args":{"query":"latest diffusion models"},"risk":"safe"}]}

A terminal {"op":"finish","args":{...}} tells the host loop to stop.
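
A host application will usually want to validate this output before acting on it. The sketch below assumes the raw generation contains the literal `[ACTION]` token followed by the JSON object, and that `finish` appears as a step inside `plan`. The `parse_plan` helper and the choice to treat a missing or unknown risk flag as critical are assumptions, not documented behaviour.

```python
import json

ALLOWED_RISKS = {"safe", "critical"}


def parse_plan(raw: str) -> list:
    """Extract and validate the JSON plan that follows the [ACTION] token."""
    # Take everything after the last [ACTION]; if the token is absent,
    # rpartition leaves the whole string in `payload`.
    _, _, payload = raw.rpartition("[ACTION]")
    plan = json.loads(payload.strip())["plan"]
    if not isinstance(plan, list):
        raise ValueError("'plan' must be a list")
    for step in plan:
        if not isinstance(step, dict) or not isinstance(step.get("op"), str):
            raise ValueError("each step needs a string 'op'")
        step.setdefault("args", {})
        if not isinstance(step["args"], dict):
            raise ValueError("'args' must be an object")
        # Assumption: fail closed when the flag is missing or unrecognised.
        if step.get("risk") not in ALLOWED_RISKS:
            step["risk"] = "critical"
    return plan
```

Malformed JSON surfaces as a `ValueError` as well, since `json.JSONDecodeError` is a subclass of it, so the host loop can handle both cases in one place.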

Tool vocabulary

| op | purpose | default risk |
| --- | --- | --- |
| web_search | query the web | safe |
| scrape_page | fetch a page's content | safe |
| read_arxiv | read an arXiv paper | safe |
| browse | navigate a site (open / click / scroll / extract); submit/download gated | safe / critical |
| execute_python | run Python; gated when it touches os/subprocess/files/network | safe / critical |
| generate_image | text-to-image | safe |
| ffmpeg | video/audio editing (trim, concat, overlay, subtitles, extract audio) | safe |
| download_file | fetch a file to disk | critical |
| transfer_token | move crypto / value | always critical |
| summarize / ask_llm | condense / delegate hard reasoning to a larger model | safe |
| finish | terminate the agent loop | safe |
| inherited dev ops: open_file, list_dir, run_command, write_file, git_push, api_call, db_query | (inherited) | per op |
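
Since the executor supplies the actual tools, one way to wire the vocabulary above to implementations is a plain name-to-callable registry. The sketch below is illustrative only: the `TOOLS` table, the stub tool bodies, and the convention of returning errors as observation strings are assumptions, and only two ops are registered to keep it short.

```python
from typing import Callable, Dict


# Hypothetical stand-ins; a real executor would call actual backends here.
def web_search(query: str) -> str:
    return f"results for {query!r}"


def finish(**kwargs) -> str:
    return "done"


TOOLS: Dict[str, Callable[..., str]] = {"web_search": web_search, "finish": finish}


def execute(step: dict) -> str:
    """Run one op from a plan and return an observation string."""
    op = step["op"]
    if op not in TOOLS:
        # Unknown ops are reported back rather than raised.
        return f"error: unknown op {op!r}"
    try:
        return TOOLS[op](**step.get("args", {}))
    except TypeError as exc:
        # Typically a wrong or missing argument name in the emitted args.
        return f"error: bad args for {op}: {exc}"
```

Returning errors as observations, rather than raising, lets the host loop feed them back to the model on the next turn.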

Behaviour families (training coverage)

  • dispatch: instruction → one tool call (the strongest mode).
  • search_react / code_react / browse_react: react to an observation (results, stdout/error, page state) with the next action: refine, fix-and-retry, extract, finish.
  • research_chain: web_search → scrape_page → summarize.
  • media_pipeline: download_file → ffmpeg → finish (gated).
  • risk_gate_agent: plans mixing safe + critical ops (transfer / download / system code).
  • autonomous_step: goal + state → the single next op (incl. finish): the loop primitive.
  • dev_dispatch: replay of inherited IDE/dev ops (anti-forgetting).

Safety: risk flag + mandatory deterministic backstop

Every op carries a learned risk flag (safe / critical) meant to drive a user-confirmation gate. The flag is advisory, not the safety boundary. The host application MUST enforce a deterministic policy that forces confirmation on known-dangerous operations regardless of the flag, in particular:

  • transfer_token (value movement): always confirm; never auto-execute;
  • download_file, external api_call mutations, and execute_python that touches the filesystem / network / system: confirm;
  • run_command matching dangerous patterns (e.g. rm -rf, git push), git_push, write_file, open_app: confirm.

The flag may only harden the policy's decision (safe → critical); it can never grant permission. Treat a missing critical flag as a false negative to be caught by the backstop.
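
A deterministic backstop along these lines could look like the sketch below. It is an assumption-heavy illustration: the argument keys (`command`, `code`, `method`), the regular expressions, and the `needs_confirmation` name are hypothetical, and pattern matching on source text is a crude stand-in for real sandboxing.

```python
import re

# Ops that always require confirmation, whatever the learned flag says.
ALWAYS_CONFIRM = {"transfer_token", "download_file", "git_push", "write_file", "open_app"}
# Illustrative patterns only; a real policy would be broader.
DANGEROUS_COMMAND = re.compile(r"\brm\s+-rf\b|\bgit\s+push\b")
SENSITIVE_PYTHON = re.compile(r"\b(os|subprocess|socket|shutil|requests|urllib|open)\b")


def needs_confirmation(step: dict) -> bool:
    """Return True when the host must ask the user before executing `step`."""
    op = step.get("op")
    args = step.get("args", {})
    # The learned flag can only harden: anything other than "safe" is gated.
    if step.get("risk") != "safe":
        return True
    if op in ALWAYS_CONFIRM:
        return True
    if op == "run_command":
        return bool(DANGEROUS_COMMAND.search(str(args.get("command", ""))))
    if op == "execute_python":
        return bool(SENSITIVE_PYTHON.search(str(args.get("code", ""))))
    if op == "api_call":
        # Assumed convention: anything that is not a read is a mutation.
        return str(args.get("method", "POST")).upper() not in {"GET", "HEAD"}
    return False
```

Note that a step flagged `safe` is still gated whenever a rule matches, so a false-negative flag on `transfer_token` cannot bypass confirmation.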

Intended use

The structured-action stage of an autonomous agent: research assistants, media-editing pipelines (ffmpeg), browser/YouTube navigation, code-execution loops, on-device automation. Runs fully offline; the executor supplies the actual tools.

Limitations (honest)

  • Short chains, looped, not long monolithic plans. Reactive 1–3-op emission is the model's strength; tasks needing one long exactly-ordered plan must be decomposed by the host loop into short steps. This is by design, not a regression.
  • ~30M scale: limited open-ended reasoning and world knowledge; delegate hard reasoning via ask_llm to a larger model.
  • French covers agentic instructions, not free prose.
  • Tool set is fixed at fine-tune time; new tools require additional fine-tuning.
  • Benchmarks are synthetic (disjoint seed); they validate routing/format/risk-gating, not real-world tool success, which depends on the executor.

Lineage

SAM-G (base, dual-mode) → SAM-G-Reasoning → SAM-G-CobraTooling (IDE tools, robust) → SAM-G-Agent (autonomous tool dispatcher).

Disclosure

Architecture internals, tokenizer construction, data generators, and ablations are proprietary and withheld. This card documents the released artifact only.
