SAM2Act

SAM2Act is a multi-view transformer policy for robotic manipulation. Built on RVT-2, it combines multi-resolution upsampling with visual embeddings from the SAM2 foundation model to improve 3D action prediction, multitask learning, and generalization. SAM2Act+ extends this policy with a memory bank, memory encoder, and memory attention so that the agent can condition on prior observations and actions in tasks that depend on spatial memory.

For full project details, code, training instructions, and videos, see the SAM2Act website and GitHub repository.

Models

This model repository stores released SAM2Act and SAM2Act+ checkpoints together with the config files needed for evaluation.

The repository is organized as follows:

sam2act_rlbench/                       # SAM2Act checkpoint and configs for RLBench
sam2act_memorybench/sam2act_<task>/    # SAM2Act+ checkpoints and configs for MemoryBench

sam2act_rlbench/ contains the RLBench checkpoint model_89.pth and its exp_cfg.yaml and mvt_cfg.yaml files. Each MemoryBench task folder contains the Stage 1 checkpoint model_9.pth, the Stage 2 memory-conditioned checkpoint model_plus_19.pth, and the corresponding exp_cfg*.yaml and mvt_cfg*.yaml files.
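
As a quick sanity check after downloading, the layout described above can be verified with a short script. This is a minimal sketch, not part of the SAM2Act codebase: the helper names `missing_rlbench_files` and `missing_memorybench_files` are hypothetical, and it assumes the repository has been copied to a local directory with the folder structure unchanged. The MemoryBench config files are matched by glob pattern because only the exp_cfg*.yaml and mvt_cfg*.yaml patterns are known here.

```python
from pathlib import Path

# Folder and file names are taken from the repository description above.
# The helper functions themselves are hypothetical, for illustration only.
RLBENCH_DIR = "sam2act_rlbench"
MEMORYBENCH_DIR = "sam2act_memorybench"

RLBENCH_FILES = ["model_89.pth", "exp_cfg.yaml", "mvt_cfg.yaml"]
MEMORYBENCH_CHECKPOINTS = ["model_9.pth", "model_plus_19.pth"]
MEMORYBENCH_CONFIG_PATTERNS = ["exp_cfg*.yaml", "mvt_cfg*.yaml"]


def missing_rlbench_files(root):
    """Return the expected RLBench files that are absent under root."""
    folder = Path(root) / RLBENCH_DIR
    return [name for name in RLBENCH_FILES if not (folder / name).is_file()]


def missing_memorybench_files(root, task):
    """Return the files or config patterns absent for one MemoryBench task.

    `task` is whatever replaces <task> in the folder name sam2act_<task>.
    """
    folder = Path(root) / MEMORYBENCH_DIR / f"sam2act_{task}"
    missing = [
        name for name in MEMORYBENCH_CHECKPOINTS if not (folder / name).is_file()
    ]
    # Exact config file names are not assumed, so match by pattern instead.
    missing += [
        pattern
        for pattern in MEMORYBENCH_CONFIG_PATTERNS
        if not any(folder.glob(pattern))
    ]
    return missing


if __name__ == "__main__":
    # Check the RLBench folder in the current working directory.
    print("Missing RLBench files:", missing_rlbench_files("."))
```

An empty list means every expected file was found. The script only checks that files exist; it does not load or validate the checkpoints.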

For evaluation commands and expected directory placement, see the SAM2Act GitHub README or the SAM2Act project website.

BibTeX

If you use these models, please cite the SAM2Act paper:

@misc{fang2025sam2act,
      title={SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation},
      author={Haoquan Fang and Markus Grotz and Wilbert Pumacay and Yi Ru Wang and Dieter Fox and Ranjay Krishna and Jiafei Duan},
      year={2025},
      eprint={2501.18564},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2501.18564},
}