Model

This is the MIT-licensed version of VC1-Base.

Links: EAI-VC Repo, VC-1 Website, VC-1 Blogpost, VC-1 Paper

Sampling every_k (the number at the start of each video-source line is its every_k value):

ImageNet 1,281,167

Ego (3,538,291 frames total)

1 # Ego4D full already subsampled with 2,790,520 frames
1 # 100DOH with 99,899 frames
60 # Epic Kitchens with 332,757 frames
80 # SSV2 with 315,115 frames

INav (779,643 frames total)

1 # RE10K with 779,643 frames

Total: 5,599,101 frames
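
The per-source counts above can be cross-checked against the stated subtotals and the overall total. The following is a minimal sketch, not the authors' data pipeline: the dictionary layout and the helper names `group_totals` and `subsample_every_k` are hypothetical, and reading `every_k` as "keep one frame out of every k" is an assumption based on the name.

```python
# Frame counts copied from the list above, grouped as in the model card.
FRAME_COUNTS = {
    "ImageNet": {"ImageNet": 1_281_167},
    "Ego": {
        "Ego4D": 2_790_520,
        "100DOH": 99_899,
        "Epic Kitchens": 332_757,
        "SSV2": 315_115,
    },
    "INav": {"RE10K": 779_643},
}


def group_totals(counts):
    """Sum the per-source frame counts within each group."""
    return {group: sum(sources.values()) for group, sources in counts.items()}


def subsample_every_k(frames, k):
    """Hypothetical helper: keep every k-th item, starting with the first.

    Assumes every_k means a stride over an ordered frame sequence.
    """
    if k < 1:
        raise ValueError("every_k must be a positive integer")
    return frames[::k]


totals = group_totals(FRAME_COUNTS)
grand_total = sum(totals.values())

for group, total in totals.items():
    print(f"{group}: {total:,} frames")
print(f"Total: {grand_total:,} frames")
```

With `every_k = 1` a source is used as-is, while a larger value thins it out: under the stride assumption, `subsample_every_k(list(range(120)), 60)` keeps 2 of 120 frames.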

dataset_type: path_dataset_with_manifest # choices [dataset_with_txt_files, omnidataset, path_dataset, path_dataset_with_manifest]
dataset_size: 5_6m

Citation

If you use this model, please cite:

@inproceedings{vc2023,
      title={Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?},
      author={Arjun Majumdar and Karmesh Yadav and Sergio Arnaud and Yecheng Jason Ma and Claire Chen and Sneha Silwal and Aryan Jain and Vincent-Pierre Berges and Pieter Abbeel and Jitendra Malik and Dhruv Batra and Yixin Lin and Oleksandr Maksymets and Aravind Rajeswaran and Franziska Meier},
      year={2023},
      eprint={2303.18240},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}