USAM — usam-full-loss

Canonical USAM loss recipe: action + aux RGB + aux depth + cosine drift + subtask completion. Baseline.

USAM Stage B1-real pretrain checkpoints, variant usam-full-loss. Architecture: Qwen3-VL-4B-Instruct backbone (LoRA r=16) + DINOv3 visual encoder + LDA flow-matching action head + USAM auxiliary heads (drift / subtask / depth-RGB geom).

Provenance

SLURM job: 60949
Run directory: runs/usam_real_qwen_b1-60949/
Source config: configs/train/stage_b1_real_qwen_pretrain.yaml (mirrored as config.yaml in this repo)

Loss configuration

loss_weights:
  action: 1.0
  rgb: 1.0
  depth: 0.3
  drift: 0.1
  subtask: 0.1
  geom_target: 0.0
  ramp_steps: 50000           # matches max_steps
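
The weights above presumably scale each loss term before the terms are summed. The sketch below shows one way such a combination could look. The function name `combine_losses`, the linear ramp, and the choice to ramp only the auxiliary terms are assumptions made for illustration; this card does not document how the USAM training code applies `ramp_steps`.

```python
# Hypothetical sketch: weighted sum of per-term losses with a linear ramp.
# ASSUMPTION: auxiliary weights grow linearly from 0 to their target value
# over `ramp_steps`, and the action weight is never ramped. The real USAM
# schedule may differ.
LOSS_WEIGHTS = {
    "action": 1.0,
    "rgb": 1.0,
    "depth": 0.3,
    "drift": 0.1,
    "subtask": 0.1,
    "geom_target": 0.0,
}
RAMP_STEPS = 50_000


def combine_losses(losses, step, weights=LOSS_WEIGHTS, ramp_steps=RAMP_STEPS):
    """Return the weighted total of `losses` (dict of name -> scalar) at `step`."""
    ramp = min(1.0, step / ramp_steps) if ramp_steps > 0 else 1.0
    total = 0.0
    for name, value in losses.items():
        weight = weights.get(name, 0.0)
        if name != "action":  # assumption: only auxiliary terms are ramped
            weight *= ramp
        total = total + weight * value
    return total
```

Under these assumptions the total equals the action loss alone at step 0, and the auxiliary terms reach their full configured weight at step 50000. With `geom_target` at 0.0 the geometric term contributes nothing in this variant.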

Saved checkpoints

14 checkpoints, every 2500 steps:

checkpoint_step00002500.pt
checkpoint_step00005000.pt
checkpoint_step00007500.pt
checkpoint_step00010000.pt
checkpoint_step00012500.pt
checkpoint_step00015000.pt
checkpoint_step00017500.pt
checkpoint_step00020000.pt
checkpoint_step00022500.pt
checkpoint_step00025000.pt
checkpoint_step00027500.pt
checkpoint_step00030000.pt
checkpoint_step00032500.pt
checkpoint_step00035000.pt
Latest step: 35000
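
The file names above follow an 8-digit zero-padded step pattern. A small helper for generating the names and picking the latest checkpoint from a directory listing might look like the following; `checkpoint_name` and `latest_checkpoint` are hypothetical names and are not part of the USAM codebase.

```python
import re

# Pattern inferred from the file names listed above: 8-digit zero-padded step.
CKPT_RE = re.compile(r"^checkpoint_step(\d{8})\.pt$")


def checkpoint_name(step):
    """Build the file name for a given training step."""
    return f"checkpoint_step{step:08d}.pt"


def latest_checkpoint(filenames):
    """Return (step, filename) for the highest step, or None if none match."""
    found = []
    for name in filenames:
        match = CKPT_RE.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return max(found) if found else None


# The 14 names listed above: every 2500 steps up to and including 35000.
expected = [checkpoint_name(s) for s in range(2500, 35001, 2500)]
```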
Checkpoint format: trainable+buffers — state_dict (LoRA + trainable adapters + buffers) plus full AdamW optimizer state, scheduler state, and run metadata. Every saved step is independently resumable for continued training.
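
Because each file carries optimizer and scheduler state, resuming amounts to restoring three state dicts and reading back the step counter. The sketch below illustrates this with a stand-in `nn.Linear` module and an in-memory checkpoint dict that has the keys `state_dict`, `step`, `optimizer`, and `scheduler`. The `resume` helper, the learning rate, and the `StepLR` scheduler are assumptions for illustration; the real model, optimizer settings, and scheduler come from the training config.

```python
import torch
from torch import nn

# Stand-in module so the sketch runs on its own. The real USAM model, its
# optimizer hyperparameters, and its scheduler are defined by the training
# config, not by this example.
model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000)

# In-memory checkpoint with the four keys this sketch relies on.
ckpt = {
    "state_dict": model.state_dict(),
    "step": 2500,
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
}


def resume(ckpt, model, optimizer, scheduler):
    """Restore weights, optimizer, and scheduler; return the saved step."""
    # strict=False because the checkpoint holds only trainable weights and
    # buffers; frozen base weights are expected to be loaded separately.
    model.load_state_dict(ckpt["state_dict"], strict=False)
    optimizer.load_state_dict(ckpt["optimizer"])
    scheduler.load_state_dict(ckpt["scheduler"])
    return ckpt["step"]


start_step = resume(ckpt, model, optimizer, scheduler)
```

The optimizer and scheduler must be constructed over the same parameter groups as in the original run before their state is loaded; otherwise `optimizer.load_state_dict` raises an error about mismatched parameter groups.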

Ablation context

This repo is one variant in the USAM B1-real loss-ablation matrix. The seven variants isolate single-loss contributions:

Variant	Repo	What it tests
Full + geom	`usam-full-loss-geom`	Upper bound: does depth-RGB geometric consistency help on top of the baseline?
Full (baseline)	`usam-full-loss`	Canonical recipe — `action + rgb + depth + drift + subtask`
Action only	`usam-action-only`	Lower bound: pure VLA action loss
No aux vision	`usam-no-aux-vision`	Does aux RGB + depth co-training help?
No USAM aux	`usam-no-usam-aux`	Do drift + subtask add lift beyond LDA-style co-training?
Drift only	`usam-drift-only`	Marginal contribution of drift alone
3-source (DROID)	`usam-full`	Canonical recipe + DROID dataset (3-source full data mix)

See docs/ABLATION_STUDY.md in the source repo for the full design.

Usage

import torch

ckpt = torch.load(
    "checkpoint_step00035000.pt",
    weights_only=False,  # unpickles arbitrary Python objects; load only files you trust
    map_location="cpu",
)
state_dict = ckpt["state_dict"]    # trainable + buffers only
step       = ckpt["step"]          # int
opt_state  = ckpt["optimizer"]     # for resume
sched      = ckpt["scheduler"]     # for resume

# Load into a freshly constructed USAM model. `model` is not defined in this
# snippet; build it with the USAM codebase first.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
# `missing` will contain the frozen base-model keys (Qwen3-VL + DINOv3),
# which load from their respective HF base checkpoints. See
# usam/_train_helpers.py:2437-2453 for the reference loader.
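
Since `strict=False` silently tolerates absent keys, it can help to confirm that every missing key really belongs to a frozen base model. The helper below is a minimal sketch of such a check. The name `check_load_result` and the prefixes `backbone.` and `visual_encoder.` are hypothetical; substitute the prefixes the USAM model actually uses for its Qwen3-VL and DINOv3 submodules.

```python
# Hypothetical key prefixes for the frozen base models. The real prefixes
# depend on how the USAM model names its submodules.
FROZEN_PREFIXES = ("backbone.", "visual_encoder.")


def check_load_result(missing, unexpected, frozen_prefixes=FROZEN_PREFIXES):
    """Return the missing keys that no frozen-module prefix explains.

    Raises ValueError if the checkpoint contained keys the model lacks.
    """
    if unexpected:
        raise ValueError(f"unexpected keys in checkpoint: {sorted(unexpected)[:5]}")
    prefixes = tuple(frozen_prefixes)
    return [key for key in missing if not key.startswith(prefixes)]
```

An empty return value means every missing key falls under a frozen prefix. A prefix test cannot tell a frozen base weight from a LoRA weight that lives under the same prefix, so a non-empty result is a reliable warning sign while an empty one is only a coarse check.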
