SAM-G
SAM-G is a 30.3M-parameter dual-mode language model for offline structured action generation. Given a natural-language instruction it emits compact, schema-valid JSON for ten domains; given a question it emits free text. Mode selection is learned, not prompted. Built by AMEFORGE for robotics, IoT and embedded deployment where hosted-LLM APIs are too costly, too slow, or unavailable.
- Parameters: 30.3M · Footprint: 121 MB fp32 (~30 MB int8)
- Context: 1024 tokens · Languages: English, French (actions)
- Throughput: ~235 tok/s, 16 ms first-token (single GPU); runs on a Raspberry-Pi-class CPU
- Released: model weights + inference tokenizer. Training pipeline, data generators and architecture are proprietary.
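The footprint figures follow from the parameter count. A quick check, assuming 4 bytes per fp32 weight, 1 byte per int8 weight, decimal megabytes, and no quantization scales or file overhead (which is why the int8 figure is approximate):

```python
# Back-of-the-envelope weight footprint from the parameter count.
# Assumption: 1 MB = 1e6 bytes and no per-tensor or file-format overhead.
PARAMS = 30.3e6


def footprint_mb(n_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in megabytes."""
    return n_params * bytes_per_param / 1e6


fp32_mb = footprint_mb(PARAMS, 4)
int8_mb = footprint_mb(PARAMS, 1)
print(f"fp32: {fp32_mb:.0f} MB, int8: {int8_mb:.0f} MB")
# prints: fp32: 121 MB, int8: 30 MB
```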
Two modes
| Input | Model emits |
|---|---|
| turn on the kitchen lamp | `[ACTION] {"domain":"home","op":"set_state","params":{"device":"lamp","name":"kitchen","state":"on"}}` |
| what is a mutex | `[CHAT] A mutex is a lock that allows one thread at a time.` |
Domains: ros, http, mqtt, db, workflow, ecommerce, vehicle,
home, cal, file.
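A minimal sketch of how a caller might split a raw generation into its mode and payload. It assumes the decoded text starts with a literal `[ACTION]` or `[CHAT]` tag, as in the table above; `parse_output` and `DOMAINS` are illustrative names, not part of the release.

```python
import json

# The ten action domains listed in this card.
DOMAINS = {"ros", "http", "mqtt", "db", "workflow",
           "ecommerce", "vehicle", "home", "cal", "file"}


def parse_output(text: str):
    """Return ("action", dict) or ("chat", str); raise ValueError otherwise."""
    text = text.strip()
    if text.startswith("[ACTION]"):
        body = text[len("[ACTION]"):].strip()
        try:
            action = json.loads(body)
        except json.JSONDecodeError as exc:
            raise ValueError(f"action is not valid JSON: {exc}") from exc
        if not isinstance(action, dict) or action.get("domain") not in DOMAINS:
            raise ValueError("unknown or missing domain")
        return "action", action
    if text.startswith("[CHAT]"):
        return "chat", text[len("[CHAT]"):].strip()
    raise ValueError("missing [ACTION] or [CHAT] tag")


mode, payload = parse_output(
    '[ACTION] {"domain":"home","op":"set_state",'
    '"params":{"device":"lamp","name":"kitchen","state":"on"}}'
)
```

Here `mode` is `"action"` and `payload` is a plain dict, so downstream code can dispatch on `payload["domain"]` and `payload["op"]`.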
Benchmark
SAM-G is evaluated zero-shot in its native format; baselines run 3-shot
through their chat template with a system instruction. bpb is tokenizer-fair
(per-token perplexity is not comparable across vocabularies). exact/M =
action exact-match per million parameters — the efficiency axis.
| Model | Params | bpb ↓ | JSON valid % | Exact % | Exact FR % | Cloze % | MB | tok/s | exact/M ↑ |
|---|---|---|---|---|---|---|---|---|---|
| SAM-G | 30.3M | 1.179 | 100 | 76 | 77 | 83 | 121 | 235 | 2.51 |
| Pythia-70M | 70M | 1.674 | 2 | 0 | 0 | 75 | 141 | 120 | 0.00 |
| Qwen2.5-0.5B-Instruct | 494M | 0.814 | 99 | 25 | 7 | 96 | 988 | 27 | 0.05 |
| SmolLM2-360M-Instruct | 362M | 0.812 | 96 | 14 | 0 | 96 | 724 | 21 | 0.04 |
| Qwen2.5-1.5B-Instruct | 889M | 0.753 | 98 | 21 | 0 | 96 | 444* | 13 | 0.02 |
*Qwen2.5-1.5B loaded in 4-bit. Larger general models lead on bits-per-byte and cloze (they are 12–30× bigger and trained for general knowledge); SAM-G leads decisively on structured action, French actions, footprint, speed, and exact-match per parameter. Notably Qwen2.5-1.5B scores below Qwen2.5-0.5B on action exact-match — capability here comes from domain specialization, not scale.
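The two derived metrics can be sketched as follows. This assumes the usual definition of bits per byte (summed token negative log-likelihood converted to bits, divided by the UTF-8 byte length of the text); the card does not publish its evaluation code, so the function names are illustrative. The exact/M values are recomputed from the Exact % and Params columns of the table.

```python
import math


def bits_per_byte(token_nll_nats, text: str) -> float:
    """Total NLL (in nats) converted to bits, normalised by UTF-8 bytes.

    Dividing by bytes rather than tokens is what makes the number
    comparable across tokenizers with different vocabularies.
    """
    total_bits = sum(token_nll_nats) / math.log(2)
    return total_bits / len(text.encode("utf-8"))


def exact_per_million(exact_pct: float, params_millions: float) -> float:
    """Action exact-match percentage per million parameters."""
    return exact_pct / params_millions


# (Exact %, parameters in millions) taken from the benchmark table.
rows = {
    "SAM-G": (76, 30.3),
    "Pythia-70M": (0, 70),
    "Qwen2.5-0.5B-Instruct": (25, 494),
    "SmolLM2-360M-Instruct": (14, 362),
    "Qwen2.5-1.5B-Instruct": (21, 889),
}
for name, (exact, params) in rows.items():
    print(f"{name}: {exact_per_million(exact, params):.2f}")
```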
Per-domain exact match (%)
| ros | http | mqtt | db | workflow | ecommerce | vehicle | home | cal | file |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 100 | 100 | 60 | 100 | 100 | 50 | 80 | 60 |
All general baselines score 0 on most domains, succeeding only partially on the
most generic ones (home, cal). ros (floating-point fields) is SAM-G's weakest
schema and benefits most from additional training data.
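A per-domain score of this kind could be computed as sketched below. The card does not state whether exact match compares strings or parsed objects; this sketch assumes parsed-object equality, so key order and whitespace are ignored. The function names and the sample pairs are illustrative only.

```python
import json
from collections import defaultdict


def is_exact(pred: str, gold: str) -> bool:
    """True when both strings parse to the same JSON value."""
    try:
        return json.loads(pred) == json.loads(gold)
    except json.JSONDecodeError:
        return False


def per_domain_exact(pairs):
    """pairs: iterable of (domain, predicted_json, gold_json).

    Returns {domain: exact-match percentage}.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for domain, pred, gold in pairs:
        totals[domain] += 1
        hits[domain] += is_exact(pred, gold)
    return {d: 100.0 * hits[d] / totals[d] for d in totals}


# Made-up predictions, only to exercise the function.
scores = per_domain_exact([
    ("mqtt", '{"a": 1, "b": 2}', '{"b":2,"a":1}'),
    ("mqtt", '{"a": 1}', '{"a": 2}'),
    ("ros", "not json", '{"x": 0.5}'),
])
print(scores)  # {'mqtt': 50.0, 'ros': 0.0}
```

Under this assumption a float written as `0.50` matches `0.5`, whereas a string comparison would count it as a miss; that choice matters most for schemas with floating-point fields such as ros.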
Usage
```python
import sentencepiece as spm
import torch

# Load the released inference tokenizer (samg_tokenizer.model) and weights.
sp = spm.SentencePieceProcessor()
sp.Load("samg_tokenizer.model")

prompt = "publish 21.5 on sensors/temp qos 1 [ACTION]"
ids = torch.tensor([sp.EncodeAsIds(prompt)])

# greedy-decode with your loaded model until EOS, then sp.DecodeIds(...)
# -> {"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}
```
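The greedy-decoding step mentioned in the comment can be sketched independently of the model. Because the architecture and loading code are not part of the release, `next_logits` below is a hypothetical callable that you would wrap around your loaded model: it takes the ids so far and returns one score per vocabulary entry for the next token. The 1024-token cap comes from the context size stated above.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_logits: Callable[[List[int]], Sequence[float]],  # hypothetical model wrapper
    prompt_ids: List[int],
    eos_id: int,
    max_new_tokens: int = 256,
    context_len: int = 1024,
) -> List[int]:
    """Return the newly generated ids (prompt and EOS excluded)."""
    ids = list(prompt_ids)
    generated: List[int] = []
    # Never let prompt + generation exceed the context window.
    budget = min(max_new_tokens, context_len - len(ids))
    for _ in range(budget):
        logits = next_logits(ids)
        # Greedy choice: index of the highest-scoring token.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        ids.append(next_id)
        generated.append(next_id)
    return generated
```

With the tokenizer from the usage snippet, `eos_id` would typically be `sp.eos_id()` and the result would be passed to `sp.DecodeIds(...)`.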
Always parse output as JSON and validate against your schema before execution.
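A minimal validation sketch for the MQTT example above, using only the standard library. The field names come from the example output; the rules themselves (required `payload`, `qos` defaulting to 0, no wildcards in a publish topic) are assumptions for illustration and should be replaced by your own schema.

```python
import json


def validate_mqtt_publish(raw: str) -> dict:
    """Parse and check an mqtt/publish action; raise ValueError if invalid."""
    action = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(action, dict):
        raise ValueError("expected a JSON object")
    if action.get("domain") != "mqtt" or action.get("op") != "publish":
        raise ValueError("not an mqtt publish action")
    params = action.get("params")
    if not isinstance(params, dict):
        raise ValueError("params must be an object")
    topic = params.get("topic")
    # MQTT wildcards (# and +) are only legal in subscriptions.
    if not isinstance(topic, str) or not topic or "#" in topic or "+" in topic:
        raise ValueError("topic must be a non-empty string without wildcards")
    if "payload" not in params:
        raise ValueError("payload is required")
    qos = params.get("qos", 0)  # assumption: a missing qos means 0
    if type(qos) is not int or qos not in (0, 1, 2):
        raise ValueError("qos must be 0, 1 or 2")
    return action


action = validate_mqtt_publish(
    '{"domain":"mqtt","op":"publish",'
    '"params":{"topic":"sensors/temp","payload":21.5,"qos":1}}'
)
```

Only pass `action` to an MQTT client after this call returns; any `ValueError` should be treated as a refusal to execute.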
Intended use
On-device home automation; NL→ROS robot command layers; MQTT fleet gateways; offline vehicle commands; NL-to-SQL on embedded databases; workflow triggers; and the structured tool-calling stage of agentic pipelines — as a drop-in replacement or a fast router ahead of a larger hosted model.
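The router pattern can be sketched as below. All three callables are hypothetical stand-ins: `local_generate` wraps SAM-G, `hosted_generate` wraps whatever larger model you fall back to, and `is_valid` is your own schema check. The design choice in this sketch is that anything other than a valid action, including `[CHAT]` output, is forwarded to the hosted model.

```python
import json
from typing import Callable, Tuple, Union


def route(
    instruction: str,
    local_generate: Callable[[str], str],   # hypothetical SAM-G wrapper
    hosted_generate: Callable[[str], str],  # hypothetical hosted-model wrapper
    is_valid: Callable[[dict], bool],       # caller-supplied schema check
) -> Tuple[str, Union[dict, str]]:
    """Return ("local", action) or ("hosted", text)."""
    raw = local_generate(instruction).strip()
    if raw.startswith("[ACTION]"):
        try:
            action = json.loads(raw[len("[ACTION]"):])
        except json.JSONDecodeError:
            action = None
        if isinstance(action, dict) and is_valid(action):
            return "local", action
    # Fall back whenever the local output is not a usable action.
    return "hosted", hosted_generate(instruction)
```

The hosted call is only made on fallback, so instructions the small model handles correctly never leave the device.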
Limitations
- Not a general assistant: factual knowledge and open-ended reasoning are limited at this scale; larger general models lead on bits-per-byte and cloze.
- French covers actions, not extended prose.
- Schemas outside the ten domains need fine-tuning. The `ros` schema (floating-point fields) is the weakest and benefits most from more data.
- The action benchmark is synthetic, drawn from the training distribution family with a disjoint evaluation seed (999).
Citation
@misc{samg2026,
title = {SAM-G: A 30M-Parameter Dual-Mode Language Model for Offline Structured Action Generation},
author = {AMEFORGE Lab},
year = {2026}
}
Evaluation results
- Valid JSON (%): 100 (self-reported)
- Exact match (%): 76 (self-reported)
- Exact match, French (%): 77 (self-reported)
- Bits per byte: 1.179 (self-reported)