Vamba
This repository contains the model checkpoints for Vamba-Qwen2-VL-7B. Vamba is a hybrid Mamba-Transformer model that combines cross-attention layers and Mamba-2 blocks for efficient hour-long video understanding.
Homepage | arXiv | GitHub | Model
Vamba Model Architecture
Citation
If you find our paper useful, please cite it as:
@misc{ren2025vambaunderstandinghourlongvideos,
  title={Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers},
  author={Weiming Ren and Wentao Ma and Huan Yang and Cong Wei and Ge Zhang and Wenhu Chen},
  year={2025},
  eprint={2503.11579},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.11579},
}