Instructions to use AIcell/guava-05-14 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use AIcell/guava-05-14 with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("/workspace/.hf_home/hub/models--Qwen--Qwen3.5-4B/snapshots/851bf6e806efd8d0a36b00ddf55e13ccb7b8cd0a") model = PeftModel.from_pretrained(base_model, "AIcell/guava-05-14") - Notebooks
- Google Colab
- Kaggle
guava-05-14
LoRA adapter for Qwen/Qwen3.5-4B,
trained as part of the GUAVA project: a hierarchical robot-manipulation
framework where a VLM acts as the high-level planner and emits structured
tool calls that drive an analytically-controlled motor skill library.
This is the v6-full-r16-lr2e-5 SFT run, best checkpoint (step 105, lowest
validation loss).
Intended use
Given a single scene image and a natural-language task instruction, the
adapter produces a <think>...</think><tool_call>{...}</tool_call> turn
that names exactly one tool call. Running it in a closed loop with a
simulator / real robot that executes the tool and returns the next
observation lets the model complete pick-and-place, stacking, pushing,
and place-near tasks.
The adapter is not standalone β it must be loaded on top of its base model (vision tower + LLM both required at inference; only the LLM linear layers carry trained LoRA weights, ViT and aligner stay frozen).
Tool surface
The system prompt exposes 9 tools; all are demonstrated in the training data.
| Tool | Signature |
|---|---|
move |
target_position: list[float] |
grasp |
target: str |
align |
target_name: str, position: str, clearance: str |
get_position |
object_name: str |
get_position_and_size |
object_name: str |
close_gripper |
() |
release |
() |
rotate |
angle_deg: float, axis: str |
home_pose |
() |
position β {top, left, right, front, back}, clearance β {small, medium, large}.
Training data
GUAVA data/version-6: ~445 successful trajectories across 10 manipulation
tasks. All trials carry the unified GUAVA v1 system prompt.
| Task | Trials |
|---|---|
bar_on_shelf, can_in_bin, cube_stack, cup_on_plate, tomato_in_bowl |
33, 56, 37, 51, 45 |
push_cereal, stack_bowls, milk_near_cup, spoon_in_mug, hotdog_near_donut |
32, 30, 58, 51, 52 |
Trajectories are in ShareGPT format
({from, value} turns, human β gpt β tool β gpt β ...); the gpt turn
emits <think>...</think><tool_call>{"name": "...", "arguments": {...}}</tool_call>.
Training procedure
- Framework: ms-swift SFT, LoRA
- Base:
Qwen/Qwen3.5-4B(vision tower + aligner frozen, only LLM linear layers trained) - LoRA:
r=16,alpha=32,dropout=0.05,target_modules=all-linear - Optimizer / schedule: AdamW,
lr=2e-5, cosine, 5% warmup, weight decay 0.01, grad clip 1.0 - Batch:
per_device_train_batch_size=1,gradient_accumulation_steps=8 - Sequence:
max_length=8192,bfloat16 - Epochs: 3 (105 optimizer steps; best at step 105, val loss
0.5393) - Hardware: 2Γ GPU (per
CUDA_VISIBLE_DEVICES=0,1)
Full training script: scripts/run_train_version_6.sh in the GUAVA repo.
How to load
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
BASE = "Qwen/Qwen3.5-4B"
ADAPTER = "AIcell/guava-05-14"
processor = AutoProcessor.from_pretrained(BASE, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
BASE, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, ADAPTER).eval()
For deployment, merge once with model.merge_and_unload() to fold the
adapter into the base weights, then serve via your usual VLM runtime
(vLLM, TGI, etc.).
Limitations
- The training trajectories were collected with an earlier version of the
GUAVA perception stack; the model may emit tool sequences calibrated to
the old
align('top')behaviour. Seeguava/tools/tools_basefix.pyin the project repo for the base-frame fix used at evaluation time. - Only the 10 tasks listed above are represented; cross-task generalisation is not yet evaluated.
- The system prompt the model was trained against is the GUAVA v1 prompt (older / simpler variant); prompts with additional guidance text may be out-of-distribution.
License
Apache-2.0 (matches the base model). The training data and demonstrations are project-internal; redistribute the adapter under the same terms as the base model.
- Downloads last month
- 19