Model Card
Inspired by Model Cards for Model Reporting (Mitchell et al.) and Lessons from Archives (Jo & Gebru), we’re providing some accompanying information about the VIMA model.
Model Details
VIMA (VisuoMotor Attention) is a novel Transformer agent that ingests multimodal prompts and autoregressively outputs robot arm control actions. VIMA was developed primarily by researchers at Stanford and NVIDIA.
Model Date
October 2022
Model Type
The VIMA model consists of a pretrained T5 model as the prompt encoder, several tokenizers that process multimodal inputs, and a causal decoder that autoregressively predicts actions conditioned on the prompt and the interaction history.
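The autoregressive control loop can be sketched as below. This is a hypothetical toy illustration, not the actual VIMA implementation: `encode_prompt` and `predict_action` are stand-ins for the T5 prompt encoder and the causal decoder, and the feature/action computations are deliberately trivial.

```python
# Hypothetical sketch of VIMA-style autoregressive action prediction.
# `encode_prompt` and `predict_action` are toy stand-ins, not VIMA's real modules.

def encode_prompt(prompt_tokens):
    """Stand-in for the frozen T5 prompt encoder: returns prompt features."""
    return [len(tok) for tok in prompt_tokens]  # toy features

def predict_action(prompt_features, history):
    """Stand-in for the causal decoder: maps prompt + history to one action."""
    return (sum(prompt_features) + len(history)) % 7  # toy discrete action

def rollout(prompt_tokens, num_steps):
    """Autoregressive loop: each predicted action extends the history."""
    prompt_features = encode_prompt(prompt_tokens)
    history, actions = [], []
    for _ in range(num_steps):
        action = predict_action(prompt_features, history)
        actions.append(action)
        history.append(action)  # in the real agent, new observations also enter here
    return actions
```

The key structural point is that each step conditions on everything predicted so far, which is what "autoregressive" means here.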
Model Versions
We release 7 checkpoints covering a spectrum of model capacities from 2M to 200M parameters.
Model Use
Intended Use
The model is intended to be used alongside VIMA-Bench to study general robot manipulation with multimodal prompts.
Primary intended users
The primary intended users of these models are AI researchers in robotics, multimodal learning, embodied agents, foundation models, etc.
Data
The models were trained on data generated by oracles implemented in VIMA-Bench, comprising 650K successful trajectories for behavior cloning. We use 600K trajectories for training and hold out the remaining 50K for validation.
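The 600K/50K split above could be reproduced with a simple deterministic holdout, sketched here with toy trajectory IDs. This is purely illustrative; the actual split logic used for VIMA's training data may differ.

```python
# Hypothetical sketch of the train/validation split described above;
# the IDs and slicing rule are illustrative, not VIMA's actual code.

def split_trajectories(trajectory_ids, num_val):
    """Hold out the last `num_val` trajectories for validation."""
    train = trajectory_ids[:-num_val]
    val = trajectory_ids[-num_val:]
    return train, val

# Toy example with 650 IDs standing in for the 650K trajectories.
ids = list(range(650))
train_ids, val_ids = split_trajectories(ids, num_val=50)
```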
Performance and Limitations
Metrics and Performance
We quantify the performance of trained models using the task success percentage aggregated over multiple tasks. We evaluate models on the task suite from VIMA-Bench, following its proposed evaluation protocol. See our paper for more details.
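The aggregated metric above can be sketched as a mean of per-task success rates, each task weighted equally. The function name and the per-task results below are made up for illustration; consult the paper for the exact aggregation used.

```python
# Hypothetical sketch of the aggregated task-success metric described above;
# task names and outcome lists are illustrative only.

def aggregate_success_rate(results_per_task):
    """Mean success percentage over tasks, weighting each task equally."""
    per_task_rates = [
        100.0 * sum(outcomes) / len(outcomes)
        for outcomes in results_per_task.values()
    ]
    return sum(per_task_rates) / len(per_task_rates)

results = {
    "rearrange": [1, 1, 0, 1],  # 75% success
    "sweep":     [1, 0, 0, 1],  # 50% success
}
# aggregate_success_rate(results) -> 62.5
```

Equal task weighting prevents tasks with many evaluation episodes from dominating the aggregate.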
Limitations
Our provided model checkpoints are pre-trained on VIMA-Bench and may not directly generalize to other simulators or to the real world. Limitations are discussed further in the paper.
Paper and Citation
Our paper is posted on arXiv. If you find our work useful, please consider citing us!
@inproceedings{jiang2023vima,
  title     = {VIMA: General Robot Manipulation with Multimodal Prompts},
  author    = {Yunfan Jiang and Agrim Gupta and Zichen Zhang and Guanzhi Wang and Yongqiang Dou and Yanjun Chen and Li Fei-Fei and Anima Anandkumar and Yuke Zhu and Linxi Fan},
  booktitle = {Fortieth International Conference on Machine Learning},
  year      = {2023}
}