πŸ“’ Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

Lift3D is a 3D robotic representation method.

Lift3D elevates 2D foundation models to construct a 3D manipulation policy by systematically improving both implicit and explicit 3D robotic representations.

  • For implicit 3D representation, Lift3D introduces a task-aware MAE that masks task-related affordance regions and reconstructs depth geometric information, thereby enhancing the 3D spatial awareness of the 2D foundation model.
  • For explicit 3D representation, Lift3D employs a 2D model-lifting strategy, utilizing the pretrained positional encodings (PEs) of a 2D foundation model to effectively encode 3D point cloud data for manipulation imitation learning (see the sketch after this list).
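
Below is a minimal, hedged sketch of the explicit lifting idea, not the official implementation: 3D point-patch centers are mapped onto a 2D grid so that the pretrained 2D positional embeddings of a foundation model (e.g. a CLIP ViT) can be reused to positionally encode point-cloud tokens. The class name, tensor shapes, and the single xy-plane projection are illustrative assumptions.

```python
# Hedged sketch (not the official Lift3D code): reuse a 2D foundation
# model's pretrained positional embeddings for 3D point-cloud tokens.
import torch
import torch.nn as nn


class Lifted3DPositionalEncoding(nn.Module):
    """Map 3D point-patch centers onto a pretrained 2D PE grid (assumed layout)."""

    def __init__(self, pe_2d: torch.Tensor, grid_size: int = 14):
        # pe_2d: pretrained 2D positional embeddings of shape
        # (grid_size * grid_size, dim), e.g. taken from a ViT/CLIP encoder.
        super().__init__()
        self.register_buffer("pe_2d", pe_2d)
        self.grid_size = grid_size

    def forward(self, centers: torch.Tensor) -> torch.Tensor:
        # centers: (B, N, 3) point-patch centers, normalized to [0, 1].
        # Project each center onto the xy-plane and look up the nearest
        # pretrained 2D positional embedding (further planes such as xz/yz
        # could be added and fused; this sketch keeps a single projection).
        xy = centers[..., :2].clamp(0.0, 1.0)
        idx = (xy * (self.grid_size - 1)).round().long()   # (B, N, 2)
        flat = idx[..., 1] * self.grid_size + idx[..., 0]  # (B, N)
        return self.pe_2d[flat]                            # (B, N, dim)


if __name__ == "__main__":
    dim, grid = 768, 14
    pe_2d = torch.randn(grid * grid, dim)   # stand-in for pretrained 2D PEs
    centers = torch.rand(2, 128, 3)         # 2 point clouds, 128 patch centers
    tokens_pe = Lifted3DPositionalEncoding(pe_2d, grid)(centers)
    print(tokens_pe.shape)                  # torch.Size([2, 128, 768])
```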

Here we provide the Lift3D MAE pretraining checkpoint (lift3d_clip_base.pth) and the CLIP ViT-B/32 checkpoint (ViT-B-32.pt).
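
A hedged loading sketch follows. The key layout of lift3d_clip_base.pth is an assumption (a raw state_dict or one nested under a "model" key), and ViT-B-32.pt is assumed to be the standard OpenAI CLIP ViT-B/32 weight file, which clip.load() accepts as a checkpoint path.

```python
# Hedged sketch: checkpoint key layout is assumed, not documented here.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

# Load the Lift3D MAE pretraining checkpoint (assumed to be a plain torch
# checkpoint, possibly with weights nested under a "model" key).
ckpt = torch.load("lift3d_clip_base.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"Lift3D MAE checkpoint contains {len(state_dict)} entries")

# ViT-B-32.pt: standard OpenAI CLIP ViT-B/32 weights, loaded by path.
model, preprocess = clip.load("ViT-B-32.pt", device="cpu")
print(model.visual.conv1.weight.shape)  # patch-embedding conv of the ViT
```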

🧩 Model Details

πŸ“š BibTeX

@misc{jia2024lift3dfoundationpolicylifting,
      title={Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation}, 
      author={Yueru Jia and Jiaming Liu and Sixiang Chen and Chenyang Gu and Zhilue Wang and Longzan Luo and Lily Lee and Pengwei Wang and Zhongyuan Wang and Renrui Zhang and Shanghang Zhang},
      year={2024},
      eprint={2411.18623},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.18623}, 
}