guava-05-14

LoRA adapter for Qwen/Qwen3.5-4B, trained as part of the GUAVA project: a hierarchical robot-manipulation framework where a VLM acts as the high-level planner and emits structured tool calls that drive an analytically-controlled motor skill library.

This is the v6-full-r16-lr2e-5 SFT run, best checkpoint (step 105, lowest validation loss).

Intended use

Given a single scene image and a natural-language task instruction, the adapter produces a <think>...</think><tool_call>{...}</tool_call> turn that names exactly one tool call. Running it in a closed loop with a simulator / real robot that executes the tool and returns the next observation lets the model complete pick-and-place, stacking, pushing, and place-near tasks.

The adapter is not standalone β€” it must be loaded on top of its base model (vision tower + LLM both required at inference; only the LLM linear layers carry trained LoRA weights, ViT and aligner stay frozen).

Tool surface

The system prompt exposes 9 tools; all are demonstrated in the training data.

Tool Signature
move target_position: list[float]
grasp target: str
align target_name: str, position: str, clearance: str
get_position object_name: str
get_position_and_size object_name: str
close_gripper ()
release ()
rotate angle_deg: float, axis: str
home_pose ()

position ∈ {top, left, right, front, back}, clearance ∈ {small, medium, large}.

Training data

GUAVA data/version-6: ~445 successful trajectories across 10 manipulation tasks. All trials carry the unified GUAVA v1 system prompt.

Task Trials
bar_on_shelf, can_in_bin, cube_stack, cup_on_plate, tomato_in_bowl 33, 56, 37, 51, 45
push_cereal, stack_bowls, milk_near_cup, spoon_in_mug, hotdog_near_donut 32, 30, 58, 51, 52

Trajectories are in ShareGPT format ({from, value} turns, human β†’ gpt β†’ tool β†’ gpt β†’ ...); the gpt turn emits <think>...</think><tool_call>{"name": "...", "arguments": {...}}</tool_call>.

Training procedure

  • Framework: ms-swift SFT, LoRA
  • Base: Qwen/Qwen3.5-4B (vision tower + aligner frozen, only LLM linear layers trained)
  • LoRA: r=16, alpha=32, dropout=0.05, target_modules=all-linear
  • Optimizer / schedule: AdamW, lr=2e-5, cosine, 5% warmup, weight decay 0.01, grad clip 1.0
  • Batch: per_device_train_batch_size=1, gradient_accumulation_steps=8
  • Sequence: max_length=8192, bfloat16
  • Epochs: 3 (105 optimizer steps; best at step 105, val loss 0.5393)
  • Hardware: 2Γ— GPU (per CUDA_VISIBLE_DEVICES=0,1)

Full training script: scripts/run_train_version_6.sh in the GUAVA repo.

How to load

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

BASE = "Qwen/Qwen3.5-4B"
ADAPTER = "AIcell/guava-05-14"

processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, ADAPTER).eval()

For deployment, merge once with model.merge_and_unload() to fold the adapter into the base weights, then serve via your usual VLM runtime (vLLM, TGI, etc.).

Limitations

  • The training trajectories were collected with an earlier version of the GUAVA perception stack; the model may emit tool sequences calibrated to the old align('top') behaviour. See guava/tools/tools_basefix.py in the project repo for the base-frame fix used at evaluation time.
  • Only the 10 tasks listed above are represented; cross-task generalisation is not yet evaluated.
  • The system prompt the model was trained against is the GUAVA v1 prompt (older / simpler variant); prompts with additional guidance text may be out-of-distribution.

License

Apache-2.0 (matches the base model). The training data and demonstrations are project-internal; redistribute the adapter under the same terms as the base model.

Downloads last month
19
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for AIcell/guava-05-14

Finetuned
Qwen/Qwen3.5-4B
Adapter
(185)
this model