guava-05-14

LoRA adapter for Qwen/Qwen3.5-4B, trained as part of the GUAVA project: a hierarchical robot-manipulation framework where a VLM acts as the high-level planner and emits structured tool calls that drive an analytically-controlled motor skill library.

This is the v6-full-r16-lr2e-5 SFT run, best checkpoint (step 105, lowest validation loss).

Intended use

Given a single scene image and a natural-language task instruction, the adapter produces a <think>...</think><tool_call>{...}</tool_call> turn that names exactly one tool call. Running it in a closed loop with a simulator / real robot that executes the tool and returns the next observation lets the model complete pick-and-place, stacking, pushing, and place-near tasks.

The adapter is not standalone — it must be loaded on top of its base model (vision tower + LLM both required at inference; only the LLM linear layers carry trained LoRA weights, ViT and aligner stay frozen).

Tool surface

The system prompt exposes 9 tools; all are demonstrated in the training data.

Tool	Signature
`move`	`target_position: list[float]`
`grasp`	`target: str`
`align`	`target_name: str, position: str, clearance: str`
`get_position`	`object_name: str`
`get_position_and_size`	`object_name: str`
`close_gripper`	`()`
`release`	`()`
`rotate`	`angle_deg: float, axis: str`
`home_pose`	`()`

position ∈ {top, left, right, front, back}, clearance ∈ {small, medium, large}.

Training data

GUAVA data/version-6: ~445 successful trajectories across 10 manipulation tasks. All trials carry the unified GUAVA v1 system prompt.

Task	Trials
`bar_on_shelf`, `can_in_bin`, `cube_stack`, `cup_on_plate`, `tomato_in_bowl`	33, 56, 37, 51, 45
`push_cereal`, `stack_bowls`, `milk_near_cup`, `spoon_in_mug`, `hotdog_near_donut`	32, 30, 58, 51, 52

Trajectories are in ShareGPT format ({from, value} turns, human → gpt → tool → gpt → ...); the gpt turn emits <think>...</think><tool_call>{"name": "...", "arguments": {...}}</tool_call>.

Training procedure

Framework: ms-swift SFT, LoRA
Base: Qwen/Qwen3.5-4B (vision tower + aligner frozen, only LLM linear layers trained)
LoRA: r=16, alpha=32, dropout=0.05, target_modules=all-linear
Optimizer / schedule: AdamW, lr=2e-5, cosine, 5% warmup, weight decay 0.01, grad clip 1.0
Batch: per_device_train_batch_size=1, gradient_accumulation_steps=8
Sequence: max_length=8192, bfloat16
Epochs: 3 (105 optimizer steps; best at step 105, val loss 0.5393)
Hardware: 2× GPU (per CUDA_VISIBLE_DEVICES=0,1)

Full training script: scripts/run_train_version_6.sh in the GUAVA repo.

How to load

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

BASE = "Qwen/Qwen3.5-4B"
ADAPTER = "AIcell/guava-05-14"

processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, ADAPTER).eval()

For deployment, merge once with model.merge_and_unload() to fold the adapter into the base weights, then serve via your usual VLM runtime (vLLM, TGI, etc.).

Limitations

The training trajectories were collected with an earlier version of the GUAVA perception stack; the model may emit tool sequences calibrated to the old align('top') behaviour. See guava/tools/tools_basefix.py in the project repo for the base-frame fix used at evaluation time.
Only the 10 tasks listed above are represented; cross-task generalisation is not yet evaluated.
The system prompt the model was trained against is the GUAVA v1 prompt (older / simpler variant); prompts with additional guidance text may be out-of-distribution.

License

Apache-2.0 (matches the base model). The training data and demonstrations are project-internal; redistribute the adapter under the same terms as the base model.

Downloads last month: 19

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AIcell/guava-05-14

Base model

Qwen/Qwen3.5-4B-Base

Finetuned

Qwen/Qwen3.5-4B

Adapter

(185)

this model