USAM: usam-full-loss
Canonical USAM loss recipe: action + aux RGB + aux depth + cosine drift + subtask completion. Baseline.
USAM Stage B1-real pretrain checkpoints, variant usam-full-loss. Architecture: Qwen3-VL-4B-Instruct backbone (LoRA r=16) + DINOv3 visual encoder + LDA flow-matching action head + USAM auxiliary heads (drift / subtask / depth-RGB geom).
Provenance
- SLURM job: 60949
- Run directory: runs/usam_real_qwen_b1-60949/
- Source config: configs/train/stage_b1_real_qwen_pretrain.yaml (mirrored as config.yaml in this repo)
Loss configuration
loss_weights:
action: 1.0
rgb: 1.0
depth: 0.3
drift: 0.1
subtask: 0.1
geom_target: 0.0
ramp_steps: 50000 # matches max_steps
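To make the weights concrete, the sketch below combines per-term losses into one scalar using the values listed above. This card does not say how ramp_steps is applied, so the linear ramp on the auxiliary terms (everything except action) is an assumption made purely for illustration. The names `ramp_factor` and `total_loss` are hypothetical and do not come from the USAM codebase.

```python
from typing import Mapping

# Values copied from the loss configuration above.
LOSS_WEIGHTS = {
    "action": 1.0,
    "rgb": 1.0,
    "depth": 0.3,
    "drift": 0.1,
    "subtask": 0.1,
    "geom_target": 0.0,
}
RAMP_STEPS = 50_000


def ramp_factor(step: int, ramp_steps: int = RAMP_STEPS) -> float:
    """Linear ramp from 0 to 1 over ramp_steps (ASSUMED schedule shape)."""
    if ramp_steps <= 0:
        return 1.0
    return min(max(step / ramp_steps, 0.0), 1.0)


def total_loss(losses: Mapping[str, float], step: int) -> float:
    """Weighted sum of loss terms; zero-weight terms are skipped entirely."""
    scale = ramp_factor(step)
    total = 0.0
    for name, weight in LOSS_WEIGHTS.items():
        if weight == 0.0:
            continue  # e.g. geom_target is disabled in this variant
        # ASSUMPTION: only auxiliary terms are ramped; action is always on.
        factor = 1.0 if name == "action" else scale
        total += weight * factor * losses[name]
    return total
```

Under these assumptions, a run stopped before ramp_steps would never see the auxiliary terms at full weight. Whether the real schedule behaves this way should be checked against the source config.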
Saved checkpoints
14 checkpoints, one every 2500 steps:
- checkpoint_step00002500.pt
- checkpoint_step00005000.pt
- checkpoint_step00007500.pt
- checkpoint_step00010000.pt
- checkpoint_step00012500.pt
- checkpoint_step00015000.pt
- checkpoint_step00017500.pt
- checkpoint_step00020000.pt
- checkpoint_step00022500.pt
- checkpoint_step00025000.pt
- checkpoint_step00027500.pt
- checkpoint_step00030000.pt
- checkpoint_step00032500.pt
- checkpoint_step00035000.pt

Latest step: 35000

Checkpoint format: trainable+buffers, i.e. a state_dict (LoRA + trainable adapters + buffers) plus the full AdamW optimizer state, scheduler state, and run metadata. Every saved step can be resumed independently for continued training.
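Because the file names zero-pad the step to eight digits, the step can be recovered from a name and the newest checkpoint picked without opening any file. A minimal sketch follows. The helper names are hypothetical, and the defaults simply mirror the 2500-step interval and the 35000 latest step listed here.

```python
import re
from typing import Iterable, List, Optional

# Matches names such as "checkpoint_step00035000.pt".
CKPT_PATTERN = re.compile(r"^checkpoint_step(\d{8})\.pt$")


def checkpoint_name(step: int) -> str:
    """Build the file name for a given training step."""
    return f"checkpoint_step{step:08d}.pt"


def step_from_name(name: str) -> Optional[int]:
    """Return the step encoded in a checkpoint file name, or None."""
    match = CKPT_PATTERN.match(name)
    return int(match.group(1)) if match else None


def expected_checkpoints(first: int = 2500, last: int = 35000,
                         every: int = 2500) -> List[str]:
    """Names of all checkpoints saved at a fixed interval."""
    return [checkpoint_name(s) for s in range(first, last + 1, every)]


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Pick the name with the highest step; ignore unrelated files."""
    candidates = [(step_from_name(n), n) for n in names]
    valid = [(step, n) for step, n in candidates if step is not None]
    return max(valid)[1] if valid else None
```

For a local copy of the repo, `latest_checkpoint(os.listdir(repo_dir))` would return the file to resume from, where `repo_dir` is whatever directory the checkpoints were downloaded to.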
Ablation context
This repo is one variant in the USAM B1-real loss-ablation matrix. The seven variants isolate single-loss contributions:
| Variant | Repo | What it tests |
|---|---|---|
| Full + geom | usam-full-loss-geom | Upper bound: does depth-RGB geometric consistency help on top of the baseline? |
| Full (baseline) | usam-full-loss | Canonical recipe: action + rgb + depth + drift + subtask |
| Action only | usam-action-only | Lower bound: pure VLA action loss |
| No aux vision | usam-no-aux-vision | Does aux RGB + depth co-training help? |
| No USAM aux | usam-no-usam-aux | Do drift + subtask add lift beyond LDA-style co-training? |
| Drift only | usam-drift-only | Marginal contribution of drift alone |
| 3-source (DROID) | usam-full | Canonical recipe + DROID dataset (3-source full data mix) |
See docs/ABLATION_STUDY.md in the source repo for the full design.
Usage
import torch
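# weights_only=False unpickles arbitrary Python objects (needed here for the
# optimizer/scheduler state), so only load checkpoints from a trusted source.
ckpt = torch.load(
    "checkpoint_step00035000.pt",
    weights_only=False,
    map_location="cpu",
)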
state_dict = ckpt["state_dict"] # trainable + buffers only
step = ckpt["step"] # int
opt_state = ckpt["optimizer"] # for resume
sched = ckpt["scheduler"] # for resume
# Load into a freshly-constructed USAM model:
missing, unexpected = model.load_state_dict(state_dict, strict=False)
# `missing` will contain the frozen base-model keys (Qwen3-VL + DINOv3),
# which load from their respective HF base checkpoints. See
# usam/_train_helpers.py:2437-2453 for the reference loader.
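Since strict=False tolerates any key mismatch without raising, it can be worth checking that the only missing keys really are the frozen base-model weights. The helper below is a minimal sketch of such a check. `audit_load_result` is a hypothetical name, and the prefixes in the usage example ("backbone.", "visual_encoder.") are placeholders; the real parameter prefixes depend on how the USAM model is defined.

```python
from typing import Iterable, List


def audit_load_result(missing: Iterable[str],
                      unexpected: Iterable[str],
                      frozen_prefixes: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    A missing key is acceptable only if it starts with one of the prefixes
    that identify frozen base-model modules. Any unexpected key is reported.
    """
    prefixes = tuple(frozen_prefixes)
    problems = []
    for key in missing:
        if not key.startswith(prefixes):
            problems.append(f"missing trainable key: {key}")
    for key in unexpected:
        problems.append(f"unexpected key: {key}")
    return problems


# Usage with the values returned by model.load_state_dict(..., strict=False).
# The prefixes are HYPOTHETICAL placeholders, not the real USAM module names.
problems = audit_load_result(
    missing=["backbone.layers.0.weight"],
    unexpected=[],
    frozen_prefixes=["backbone.", "visual_encoder."],
)
if problems:
    raise RuntimeError("checkpoint/model mismatch:\n" + "\n".join(problems))
```

If LoRA parameters share a prefix with the frozen backbone, a prefix check alone would not catch a missing adapter weight, so a stricter rule may be needed in that case.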