---
license: gpl
---
This is the official pre-trained model for the paper "VIRT: Vision Instructed Robotic Transformer for Manipulation Learning". It was pre-trained on the DROID dataset using the robotic imagery pre-training technique. If you find this model useful, please cite:

```BibTeX
@article{li2024virt,
  title={VIRT: Vision Instructed Robotic Transformer for Manipulation Learning},
  author={Li, Zhuoling and Ren, Liangliang and Yang, Jinrong and Zhao, Yong and others},
  journal={arXiv preprint arXiv:2410.07169},
  year={2024}
}
```