VideoMamba

Model Details

VideoMamba is a purely SSM-based model for video understanding.

Developed by: OpenGVLab
Model type: An efficient backbone based on the bidirectional state space model.
License: Non-commercial license

Model Sources

Repository: https://github.com/OpenGVLab/VideoMamba
Paper: https://arxiv.org/abs/2403.06977

Uses

The primary use of VideoMamba is research on image and video tasks, e.g., image classification, action recognition, long-term video understanding, and video-text retrieval, with an SSM-based backbone. The primary intended users of the model are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.

How to Get Started with the Model

To use VideoMamba for a video task, replace your model's backbone with the VideoMamba implementation: https://github.com/OpenGVLab/VideoMamba/blob/main/videomamba/video_sm/models/videomamba.py
Then load this checkpoint into the backbone and start training.
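
The checkpoint-loading step might look like the minimal sketch below. It assumes PyTorch and a checkpoint saved with `torch.save`. The helper names (`unwrap_state_dict`, `strip_prefix`, `load_backbone_weights`), the candidate wrapper keys (`"model"`, `"state_dict"`), and the `"module."` prefix are illustrative assumptions, not part of the VideoMamba repository. Inspect the actual checkpoint to confirm how its weights are stored.

```python
def unwrap_state_dict(checkpoint, candidate_keys=("model", "state_dict")):
    """Return the parameter dict from a checkpoint that may nest it under a key.

    The candidate keys are assumptions; inspect your checkpoint to confirm.
    """
    if isinstance(checkpoint, dict):
        for key in candidate_keys:
            if isinstance(checkpoint.get(key), dict):
                return checkpoint[key]
    return checkpoint


def strip_prefix(state_dict, prefix="module."):
    """Drop a leading prefix (e.g. one left by DataParallel wrappers) from keys."""
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in state_dict.items()
    }


def load_backbone_weights(model, checkpoint_path):
    """Load weights into `model` non-strictly and report mismatched keys.

    `model` is any torch.nn.Module, e.g. a backbone built from videomamba.py.
    """
    import torch  # imported here so the helpers above work without PyTorch

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    state_dict = strip_prefix(unwrap_state_dict(checkpoint))
    # strict=False tolerates a task head whose shape differs from the checkpoint
    result = model.load_state_dict(state_dict, strict=False)
    return result.missing_keys, result.unexpected_keys
```

Checking the returned missing and unexpected keys before training is a cheap way to confirm that the backbone weights were matched, rather than silently skipped.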

Citation Information

@misc{li2024videomamba,
      title={VideoMamba: State Space Model for Efficient Video Understanding}, 
      author={Kunchang Li and Xinhao Li and Yi Wang and Yinan He and Yali Wang and Limin Wang and Yu Qiao},
      year={2024},
      eprint={2403.06977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Paper: VideoMamba: State Space Model for Efficient Video Understanding (arXiv 2403.06977, published Mar 11, 2024)