SAM-G

SAM-G is a 30.3M-parameter dual-mode language model for offline structured action generation. Given a natural-language instruction, it emits compact, schema-valid JSON in one of ten domains; given a question, it emits free text. Mode selection is learned, not prompted. It was built by AMEFORGE for robotics, IoT, and embedded deployments where hosted LLM APIs are too costly, too slow, or unavailable.

Parameters: 30.3M · Footprint: 121 MB fp32 (~30 MB int8)
Context: 1024 tokens · Languages: English, French (actions)
Throughput: ~235 tok/s, 16 ms time to first token (single GPU); runs on a Raspberry Pi-class CPU
Released: model weights + inference tokenizer. Training pipeline, data generators and architecture are proprietary.
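The footprint figures follow from parameter count times bytes per parameter: 4 bytes for fp32 and 1 for int8, with 1 MB taken as 10^6 bytes. The sketch below reproduces that arithmetic. It counts weights only, so it ignores tokenizer files, container overhead, and any per-tensor quantization scales; real files may be slightly larger.

```python
def footprint_mb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


SAMG_PARAMS = 30.3e6  # parameter count stated on this card

for dtype, nbytes in [("fp32", 4), ("int8", 1)]:
    print(dtype, round(footprint_mb(SAMG_PARAMS, nbytes), 1), "MB")
```

This prints 121.2 MB for fp32 and 30.3 MB for int8, matching the 121 MB and ~30 MB listed above.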

Two modes

Input	Model emits
`turn on the kitchen lamp`	`[ACTION] {"domain":"home","op":"set_state","params":{"device":"lamp","name":"kitchen","state":"on"}}`
`what is a mutex`	`[CHAT] A mutex is a lock that allows one thread at a time.`

Domains: ros, http, mqtt, db, workflow, ecommerce, vehicle, home, cal, file.
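Because every completion starts with a mode tag, client code can branch on that tag before doing anything else. The sketch below assumes completions begin with `[ACTION]` or `[CHAT]`, as in the examples above. `parse_output` is a hypothetical helper, not part of the release.

```python
import json
from typing import Any, Tuple


def parse_output(text: str) -> Tuple[str, Any]:
    """Split a completion into (mode, payload).

    Assumes the completion starts with "[ACTION]" or "[CHAT]". ACTION payloads
    are parsed as JSON (json.loads tolerates the leading space); CHAT payloads
    are returned as stripped text.
    """
    text = text.strip()
    if text.startswith("[ACTION]"):
        return "action", json.loads(text[len("[ACTION]"):])
    if text.startswith("[CHAT]"):
        return "chat", text[len("[CHAT]"):].strip()
    raise ValueError(f"unrecognised mode tag: {text[:20]!r}")
```

For example, `parse_output("[CHAT] A mutex is a lock that allows one thread at a time.")` returns `("chat", "A mutex is a lock that allows one thread at a time.")`.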

Benchmark

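SAM-G is evaluated zero-shot in its native format; baselines run 3-shot through their chat templates with a system instruction. bpb (bits per byte) normalizes the model's loss by the number of bytes of text rather than by tokens, so it is tokenizer-fair; per-token perplexity is not comparable across vocabularies. exact/M is action exact-match (%) divided by parameter count in millions, the efficiency axis. For SAM-G, 76 / 30.3 ≈ 2.51.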
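Both derived metrics can be written out directly. The card does not say exactly which losses or byte counts were used, so this sketch makes two assumptions: summed natural-log negative log-likelihood over the evaluated text, and its UTF-8 byte length.

```python
import math
from typing import Sequence


def bits_per_byte(token_nll_nats: Sequence[float], text: str) -> float:
    """Total NLL of `text` converted from nats to bits, divided by its UTF-8
    byte count, so the result does not depend on how the text was tokenized."""
    total_bits = sum(token_nll_nats) / math.log(2)
    return total_bits / len(text.encode("utf-8"))


def exact_per_million(exact_pct: float, n_params: float) -> float:
    """Exact-match percentage divided by parameter count in millions."""
    return exact_pct / (n_params / 1e6)


print(round(exact_per_million(76, 30.3e6), 2))  # SAM-G row: 2.51
print(round(exact_per_million(25, 494e6), 2))   # Qwen2.5-0.5B-Instruct row: 0.05
```

Since bpb divides by bytes rather than tokens, a model with a coarser tokenizer cannot look better just by emitting fewer tokens for the same text.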

Model	Params	bpb ↓	JSON valid %	Exact %	Exact (FR) %	Cloze %	Size (MB)	tok/s	exact/M ↑
SAM-G	30.3M	1.179	100	76	77	83	121	235	2.51
Pythia-70M	70M	1.674	2	0	0	75	141	120	0.00
Qwen2.5-0.5B-Instruct	494M	0.814	99	25	7	96	988	27	0.05
SmolLM2-360M-Instruct	362M	0.812	96	14	0	96	724	21	0.04
Qwen2.5-1.5B-Instruct	889M	0.753	98	21	0	96	444*	13	0.02

* Qwen2.5-1.5B was loaded in 4-bit, and its Params and MB entries are as measured on that 4-bit-loaded model (the full-precision model has about 1.5B parameters). The Qwen and SmolLM baselines lead on bits-per-byte and cloze; they are at least 12× bigger and trained for general knowledge. SAM-G leads decisively on structured action, French actions, footprint, speed, and exact-match per parameter. Notably, Qwen2.5-1.5B scores below Qwen2.5-0.5B on action exact-match, which suggests that capability here comes from domain specialization rather than scale.

Per-domain exact match (%)

ros	http	mqtt	db	workflow	ecommerce	vehicle	home	cal	file
0	100	100	100	60	100	100	50	80	60

All general baselines score 0% on most domains and succeed only partially on the most generic ones (home, cal). The ros schema, whose parameters are floating-point fields, is SAM-G's weakest (0%) and benefits most from additional training data.
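The card does not define how exact match is scored. One plausible scorer compares parsed JSON rather than raw strings; the sketch below assumes that approach and groups results by domain. Both helpers are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def exact_match(pred: str, gold: str) -> bool:
    """True if both strings parse to equal JSON values.

    Comparing parsed values ignores key order and whitespace, and treats
    numerically equal literals such as 0.5 and 0.50 as the same.
    Unparseable JSON counts as a miss.
    """
    try:
        return json.loads(pred) == json.loads(gold)
    except json.JSONDecodeError:
        return False


def per_domain_exact(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """rows of (domain, predicted_json, gold_json) -> exact-match % per domain."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for domain, pred, gold in rows:
        totals[domain] += 1
        hits[domain] += exact_match(pred, gold)  # bool adds as 0 or 1
    return {d: 100.0 * hits[d] / totals[d] for d in totals}
```

Parsing first matters most for float-heavy schemas like ros: a string comparison would count `0.50` versus `0.5` as a miss even though the values are equal.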

Usage

import sentencepiece as spm
import torch

# Load the released inference tokenizer; load the released weights with your
# own model code (the architecture is proprietary).
sp = spm.SentencePieceProcessor(model_file="samg_tokenizer.model")

# This prompt ends with the [ACTION] mode tag, so the continuation is the JSON.
prompt = "publish 21.5 on sensors/temp qos 1 [ACTION]"
ids = torch.tensor([sp.encode(prompt)])  # shape (1, prompt_len)
# Greedy-decode with your loaded model until sp.eos_id(), then decode only the
# newly generated ids with sp.decode(...):
# -> {"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}
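The usage snippet leaves decoding to the reader because the architecture is not released. The loop below is a minimal greedy decoder written against a hypothetical `step` callable that returns next-token scores for a list of token ids. Obtaining those scores from the released weights is up to your own model wrapper.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    eos_id: int,
    max_new_tokens: int = 256,
    context_len: int = 1024,  # SAM-G's context length per this card
) -> List[int]:
    """Append the highest-scoring token until `eos_id` or the token budget.

    `step` is a hypothetical model wrapper: given the current ids it returns
    one score (logit) per vocabulary entry for the next token. Returns only
    the newly generated ids, without the EOS token.
    """
    ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        scores = step(ids[-context_len:])  # keep input within the context window
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        ids.append(next_id)
        generated.append(next_id)
    return generated
```

With PyTorch, suppose your wrapper returns logits of shape (batch, seq, vocab). Then `step` could be `lambda ids: model(torch.tensor([ids]))[0, -1].tolist()`, where `model` is your own hypothetical module. Pass `eos_id=sp.eos_id()` and decode the returned ids with `sp.decode(...)`.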

Always parse output as JSON and validate against your schema before execution.
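A minimal validation gate might look like the sketch below. Only the ten domain names come from this card. The `REQUIRED_PARAMS` allowlist is hypothetical and stands in for your real schema. Any operation not listed is rejected, which is safer than trying to block known-bad ones.

```python
import json
from typing import Any, Dict

# The ten domains listed on this card.
DOMAINS = {"ros", "http", "mqtt", "db", "workflow",
           "ecommerce", "vehicle", "home", "cal", "file"}

# Hypothetical allowlist of (domain, op) -> required params; use your own schema.
REQUIRED_PARAMS = {
    ("mqtt", "publish"): {"topic", "payload"},
    ("home", "set_state"): {"device", "state"},
}


def validate_action(raw: str) -> Dict[str, Any]:
    """Parse an action payload and raise ValueError if it is unsafe to execute.

    json.JSONDecodeError is a subclass of ValueError, so callers can catch
    ValueError for every rejection, including malformed JSON.
    """
    action = json.loads(raw)
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    if action.get("domain") not in DOMAINS:
        raise ValueError(f"unknown domain: {action.get('domain')!r}")
    if not isinstance(action.get("op"), str):
        raise ValueError("missing or non-string 'op'")
    params = action.get("params")
    if not isinstance(params, dict):
        raise ValueError("'params' must be an object")
    key = (action["domain"], action["op"])
    if key not in REQUIRED_PARAMS:
        raise ValueError(f"operation not allowed: {key}")
    missing = REQUIRED_PARAMS[key] - set(params)
    if missing:
        raise ValueError(f"missing params: {sorted(missing)}")
    return action
```

For example, the MQTT output from the usage snippet passes, while `{"domain":"mqtt","op":"publish","params":{"topic":"x"}}` is rejected for its missing `payload`.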

Intended use

On-device home automation; NL→ROS robot command layers; MQTT fleet gateways; offline vehicle commands; NL-to-SQL on embedded databases; workflow triggers; and the structured tool-calling stage of agentic pipelines — as a drop-in replacement or a fast router ahead of a larger hosted model.
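One way to use SAM-G as a fast router ahead of a hosted model is to handle a request locally only when the local output is a well-formed action. Everything else goes to the larger model, including [CHAT] replies, since open-ended answers are outside SAM-G's strengths. In the sketch below, `local_generate` and `hosted_generate` are hypothetical callables that return completion text. A production router would still validate the action against your schema before executing it.

```python
import json
from typing import Any, Callable, Tuple


def route(
    instruction: str,
    local_generate: Callable[[str], str],
    hosted_generate: Callable[[str], str],
) -> Tuple[str, Any]:
    """Try the small on-device model first and defer to a hosted model otherwise.

    Returns ("local", action_dict) when the local completion is an [ACTION]
    whose payload is a JSON object, else ("hosted", hosted_text).
    """
    out = local_generate(instruction).strip()
    if out.startswith("[ACTION]"):
        try:
            action = json.loads(out[len("[ACTION]"):])
        except json.JSONDecodeError:
            action = None
        if isinstance(action, dict):
            return "local", action
    # [CHAT] replies and malformed actions fall through to the larger model.
    return "hosted", hosted_generate(instruction)
```

Under this design, the hosted model is only called on the fallback path. The common case, a well-formed action, never leaves the device.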

Limitations

Not a general assistant: factual knowledge and open-ended reasoning are limited at this scale; larger general models lead on bits-per-byte and cloze.
French support covers actions, not extended prose.
Schemas outside the ten domains need fine-tuning. The ros schema (floating-point fields) is the weakest and benefits most from more data.
The action benchmark is synthetic, drawn from the training distribution family with a disjoint evaluation seed (999).

Citation

@misc{samg2026,
  title  = {SAM-G: A 30M-Parameter Dual-Mode Language Model for Offline Structured Action Generation},
  author = {AMEFORGE Lab},
  year   = {2026}
}


Evaluation results (self-reported)

Valid JSON (%): 100.0
Exact match (%): 76.0
Exact match, French (%): 77.0
Bits per byte: 1.179