LLM-Tank — Gemma-3 270M → robot JSON

Source code: https://codeberg.org/imperius/llm-tank

Fine-tuned Gemma-3 270M that translates one free-form English instruction for a tracked robot with a gripper arm into a strict JSON command list, executed in a MuJoCo simulation.

Full pipeline: text → this model → valid JSON → controller → robot drives / grasps. Code & sim: see the source repository.


What it outputs

A single JSON object {"commands": [ ... ]}. Actions:

  • move: direction (forward|backward), distance_m, speed?
  • turn: direction (left|right), angle_deg, speed?
  • stop, wait: duration_s
  • grasp / release: optional cell, one of front|front_left|front_right|left|right (discrete, relative to the robot; IK is solved by the controller, not the model)
  • out-of-scope / nonsense → {"commands": []}

The model emits no coordinates — only discrete actions/enums (this keeps generation reliable and schema-checkable).
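Because the output is limited to discrete actions and enums, it can be checked with a few lines of plain Python. The sketch below is a minimal validator written from the action list above. It is not the JSON Schema used in the source repository, and the choice of which fields are mandatory is an assumption. The names `is_valid_command` and `is_valid_output` are illustrative.

```python
from typing import Any

# Assumed enum values, taken from the action list above. The authoritative
# JSON Schema lives in the source repository and may differ.
CELLS = {"front", "front_left", "front_right", "left", "right"}


def _is_number(x: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def is_valid_command(cmd: Any) -> bool:
    """Return True if one command matches the assumed schema."""
    if not isinstance(cmd, dict):
        return False
    action = cmd.get("action")
    if action == "move":
        return (cmd.get("direction") in {"forward", "backward"}
                and _is_number(cmd.get("distance_m")))
    if action == "turn":
        return (cmd.get("direction") in {"left", "right"}
                and _is_number(cmd.get("angle_deg")))
    if action == "wait":
        return _is_number(cmd.get("duration_s"))
    if action == "stop":
        return True
    if action in {"grasp", "release"}:
        # The cell is optional; when present it must be a known enum value.
        return "cell" not in cmd or cmd["cell"] in CELLS
    return False  # unknown action


def is_valid_output(obj: Any) -> bool:
    """Return True for {"commands": [...]} where every command is valid."""
    if not isinstance(obj, dict) or set(obj) != {"commands"}:
        return False
    cmds = obj["commands"]
    return isinstance(cmds, list) and all(is_valid_command(c) for c in cmds)
```

The optional `speed` field is accepted but not type-checked here, since its value type is not stated in this card.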

Required input format (IMPORTANT)

The model was trained with identical formatting at train and inference time (train == infer): a fixed short system prompt is folded together with the instruction into ONE user turn. Inference must reproduce that layout, so use exactly this:

import json
from transformers import AutoModelForCausalLM, AutoTokenizer

SYSTEM = ("You translate ONE English instruction for a tracked robot "
          "with a gripper arm into a single JSON object "
          '{"commands":[...]} using actions: move, turn, stop, wait, '
          "grasp, release. Output ONLY the JSON object, no prose, no "
          'markdown. If the instruction is out of scope or nonsense, '
          'output {"commands": []}.')

tok = AutoTokenizer.from_pretrained("PATH_OR_REPO")
model = AutoModelForCausalLM.from_pretrained("PATH_OR_REPO",
                                             torch_dtype="auto",
                                             device_map="auto")

def translate(instruction: str) -> dict:
    # Fold the system prompt and the instruction into one user turn,
    # exactly as in training.
    user = SYSTEM + "\n\n---\nINSTRUCTION: " + instruction.strip()
    enc = tok.apply_chat_template(
        [{"role": "user", "content": user}],
        tokenize=True, add_generation_prompt=True,
        return_dict=True, return_tensors="pt").to(model.device)
    # Greedy decoding with a budget of 160 new tokens.
    out = model.generate(**enc, max_new_tokens=160, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    txt = tok.decode(out[0][enc["input_ids"].shape[1]:],
                     skip_special_tokens=True)
    # Keep the span from the first "{" to the last "}".
    i, j = txt.find("{"), txt.rfind("}")
    try:
        return json.loads(txt[i:j + 1])
    except json.JSONDecodeError:
        return {"commands": []}  # safe fallback

print(translate("go forward 2 meters then turn left"))
# {"commands": [{"action": "move", "direction": "forward",
#   "distance_m": 2.0}, {"action": "turn", "direction": "left",
#   "angle_deg": 90}]}
print(translate("pick it up"))        # {"commands": [{"action": "grasp"}]}
print(translate("make me a coffee"))  # {"commands": []}

Use greedy decoding (do_sample=False). The model is ~99% schema-valid without constrained decoding, which still leaves an occasional invalid output, so always keep the safe fallback.
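One possible hardening of the fallback, sketched under assumptions: the first-brace to last-brace slice fails when the model appends text containing a stray "}" after a valid object. The standard library's `json.JSONDecoder.raw_decode` parses the first complete JSON value at a given offset and ignores whatever follows. The helper name `parse_commands` is hypothetical and not part of the repository.

```python
import json


def parse_commands(txt: str) -> dict:
    """Parse the first JSON object in txt, else return the empty command list."""
    start = txt.find("{")
    if start == -1:
        return {"commands": []}  # no object at all
    try:
        # raw_decode returns (value, end_index) and tolerates trailing text.
        obj, _end = json.JSONDecoder().raw_decode(txt, start)
    except json.JSONDecodeError:
        return {"commands": []}  # truncated or malformed JSON
    # Accept only the expected top-level shape.
    if isinstance(obj, dict) and isinstance(obj.get("commands"), list):
        return obj
    return {"commands": []}
```

This only checks the top-level shape. Per-command validation would still be a separate step before anything is sent to the controller.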

Metrics (held-out val, 352 examples: locomotion + manipulation + OOD)

| metric                    | value |
|---------------------------|-------|
| schema_valid_rate         | 0.991 |
| exact_match_rate          | 0.943 |
| action_seq_accuracy       | 0.980 |
| ood_f1                    | 0.857 |
| task_success (MuJoCo, 40) | 0.975 |
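This card does not define the metrics, so the sketch below shows one plausible reading of three of them, and the definitions are assumptions: exact match compares the whole parsed object, action-sequence accuracy compares only the ordered action names, and OOD F1 treats "reference is the empty command list" as the positive class. The repository's evaluation code may differ. The name `score` is illustrative.

```python
def _actions(obj: dict) -> list:
    # Ordered action names only; parameters are ignored.
    return [c.get("action") for c in obj.get("commands", [])]


def _is_empty(obj: dict) -> bool:
    return obj.get("commands") == []


def score(preds: list, refs: list) -> dict:
    """Compute assumed metric definitions over parallel lists of parsed outputs."""
    n = len(refs)
    exact = sum(p == r for p, r in zip(preds, refs)) / n
    seq = sum(_actions(p) == _actions(r) for p, r in zip(preds, refs)) / n
    # OOD detection: positive class = reference has no commands.
    tp = sum(_is_empty(p) and _is_empty(r) for p, r in zip(preds, refs))
    fp = sum(_is_empty(p) and not _is_empty(r) for p, r in zip(preds, refs))
    fn = sum(not _is_empty(p) and _is_empty(r) for p, r in zip(preds, refs))
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"exact_match_rate": exact,
            "action_seq_accuracy": seq,
            "ood_f1": f1}
```

Under these definitions exact match can never exceed action-sequence accuracy, which is consistent with the reported 0.943 and 0.980.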

Training

Full fine-tuning (not LoRA) of unsloth/gemma-3-270m-it on ~3.5k synthetic instruction→JSON pairs (generated with 120B models, validated against a JSON Schema). Training ran in fp32 on a Kaggle T4. Two phases: locomotion first, then locomotion + arm (grasp/release). Details in the source repo (docs/).

Demo

demo.mp4 (in this repo) is about 1 minute long and has two panes: the left shows the command and the model's JSON output, the right shows the robot acting in MuJoCo (real model + real physics, not staged).

Limitations

  • No perception: the model can't target objects by name/color, only by discrete relative cell. Object resolution is spatial (controller grabs the nearest graspable body in the chosen cell).
  • English only. Single fixed gripper, minimal custom arm.
  • Designed for the accompanying controller/sim; raw JSON is meaningless without it.
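To make the spatial resolution in the first limitation concrete, here is a minimal sketch of how a controller could pick the nearest graspable body in a chosen cell. Everything geometric in it is an assumption made for illustration: the cells are modelled as 45-degree angular sectors in the robot frame (x forward, y left), and `pick_target`, `CELL_CENTER_DEG`, and `max_reach_m` are hypothetical names and values, not the repository's controller.

```python
import math

# Hypothetical sector centers in degrees, counter-clockwise from the robot's
# forward axis. The real controller's cell geometry is not given in this card.
CELL_CENTER_DEG = {
    "front": 0.0,
    "front_left": 45.0,
    "left": 90.0,
    "front_right": -45.0,
    "right": -90.0,
}
HALF_WIDTH_DEG = 22.5  # assumed half-width of each sector


def pick_target(cell: str, bodies: list, max_reach_m: float = 0.6):
    """Return the name of the nearest body inside the cell, or None.

    bodies is a list of (name, x, y) tuples in the robot frame, in meters.
    """
    center = CELL_CENTER_DEG[cell]
    best_name, best_dist = None, max_reach_m
    for name, x, y in bodies:
        dist = math.hypot(x, y)
        bearing = math.degrees(math.atan2(y, x))
        # Keep the closest body whose bearing falls inside the sector.
        if abs(bearing - center) <= HALF_WIDTH_DEG and dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name
```

With two bodies in the same sector, the closer one wins regardless of its name or color, which is the behavior the limitation describes.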

License

Weights are a derivative of Google Gemma-3 — use is governed by the Gemma Terms of Use. Accompanying code is under its own license (see the source repository).
