πŸ“’ Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

Lift3D is a 3D robotic representation method.

Lift3D elevates 2D foundation models to construct a 3D manipulation policy by systematically improving both implicit and explicit 3D robotic representations.

  • For implicit 3D representation, Lift3D introduces a task-aware MAE that masks task-related affordance regions and reconstructs depth geometric information, thereby enhancing the 3D spatial awareness of the 2D foundation model.
  • For explicit 3D representation, Lift3D employs a 2D model-lifting strategy, utilizing the pretrained positional encodings (PEs) of a 2D foundation model to effectively encode 3D point cloud data for manipulation imitation learning (see the sketch after this list).
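
Below is a minimal, hedged sketch of the explicit lifting idea, not the official implementation: 3D point-patch centers are mapped onto a 2D grid so that the pretrained 2D positional embeddings of a foundation model (e.g. a CLIP ViT) can be reused to positionally encode point-cloud tokens. The class name, tensor shapes, and the single xy-plane projection are illustrative assumptions.

```python
# Hedged sketch (not the official Lift3D code): reuse a 2D foundation
# model's pretrained positional embeddings for 3D point-cloud tokens.
import torch
import torch.nn as nn


class Lifted3DPositionalEncoding(nn.Module):
    """Map 3D point-patch centers onto a pretrained 2D PE grid (assumed layout)."""

    def __init__(self, pe_2d: torch.Tensor, grid_size: int = 14):
        # pe_2d: pretrained 2D positional embeddings of shape
        # (grid_size * grid_size, dim), e.g. taken from a ViT/CLIP encoder.
        super().__init__()
        self.register_buffer("pe_2d", pe_2d)
        self.grid_size = grid_size

    def forward(self, centers: torch.Tensor) -> torch.Tensor:
        # centers: (B, N, 3) point-patch centers, normalized to [0, 1].
        # Project each center onto the xy-plane and look up the nearest
        # pretrained 2D positional embedding (further planes such as xz/yz
        # could be added and fused; this sketch keeps a single projection).
        xy = centers[..., :2].clamp(0.0, 1.0)
        idx = (xy * (self.grid_size - 1)).round().long()   # (B, N, 2)
        flat = idx[..., 1] * self.grid_size + idx[..., 0]  # (B, N)
        return self.pe_2d[flat]                            # (B, N, dim)


if __name__ == "__main__":
    dim, grid = 768, 14
    pe_2d = torch.randn(grid * grid, dim)   # stand-in for pretrained 2D PEs
    centers = torch.rand(2, 128, 3)         # 2 point clouds, 128 patch centers
    tokens_pe = Lifted3DPositionalEncoding(pe_2d, grid)(centers)
    print(tokens_pe.shape)                  # torch.Size([2, 128, 768])
```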

Here we provide the Lift3D MAE pretraining checkpoint (lift3d_clip_base.pth) and the CLIP ViT-B/32 checkpoint (ViT-B-32.pt).
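
A hedged loading sketch follows. The key layout of lift3d_clip_base.pth is an assumption (a raw state_dict or one nested under a "model" key), and ViT-B-32.pt is assumed to be the standard OpenAI CLIP ViT-B/32 weight file, which clip.load() accepts as a checkpoint path.

```python
# Hedged sketch: checkpoint key layout is assumed, not documented here.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

# Load the Lift3D MAE pretraining checkpoint (assumed to be a plain torch
# checkpoint, possibly with weights nested under a "model" key).
ckpt = torch.load("lift3d_clip_base.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"Lift3D MAE checkpoint contains {len(state_dict)} entries")

# ViT-B-32.pt: standard OpenAI CLIP ViT-B/32 weights, loaded by path.
model, preprocess = clip.load("ViT-B-32.pt", device="cpu")
print(model.visual.conv1.weight.shape)  # patch-embedding conv of the ViT
```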

🧩 Model Details

πŸ“š BibTeX

@misc{jia2024lift3dfoundationpolicylifting,
      title={Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation}, 
      author={Yueru Jia and Jiaming Liu and Sixiang Chen and Chenyang Gu and Zhilue Wang and Longzan Luo and Lily Lee and Pengwei Wang and Zhongyuan Wang and Renrui Zhang and Shanghang Zhang},
      year={2024},
      eprint={2411.18623},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.18623}, 
}