---
license: apache-2.0
datasets:
- AlexFierro9/Kinetics400
- imagenet-1k
- HuggingFaceM4/something_something_v2
language:
- en
pipeline_tag: video-classification
extra_gated_fields:
  Name: text
  Company/Organization: text
  Country: text
  E-Mail: text
---
# VideoMamba

## Model Details

VideoMamba is a purely SSM-based model for video understanding.

- **Developed by:** [OpenGVLab](https://github.com/OpenGVLab)
- **Model type:** An efficient backbone based on the bidirectional state space model.
- **License:** Non-commercial license

### Model Sources

- **Repository:** https://github.com/OpenGVLab/VideoMamba
- **Paper:** https://arxiv.org/abs/2403.06977

## Uses

The primary use of VideoMamba is research on image and video tasks, e.g., image classification, action recognition, long-term video understanding, and video-text retrieval, with an SSM-based backbone.
The primary intended users of the model are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.

## How to Get Started with the Model

- You can replace the backbone for video tasks with the proposed VideoMamba: https://github.com/OpenGVLab/VideoMamba/blob/main/videomamba/video_sm/models/videomamba.py
- Then you can load this checkpoint and start training.

### Citation Information

```
@misc{li2024videomamba,
      title={VideoMamba: State Space Model for Efficient Video Understanding},
      author={Kunchang Li and Xinhao Li and Yi Wang and Yinan He and Yali Wang and Limin Wang and Yu Qiao},
      year={2024},
      eprint={2403.06977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
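
### Example: Loading the Checkpoint

The getting-started steps (swap in the VideoMamba backbone, then load this checkpoint) might be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's own loading code: the helper names `unwrap_checkpoint`, `strip_prefix`, and `load_backbone_weights` are hypothetical, and the checkpoint layout (weights possibly nested under a `"model"` or `"state_dict"` key, keys possibly prefixed with `module.`) is assumed rather than documented in this card. `model` stands for whatever `torch.nn.Module` you build from the linked `videomamba.py`.

```python
def unwrap_checkpoint(checkpoint, wrapper_keys=("model", "state_dict")):
    """Return the inner weight mapping if the checkpoint wraps it under a known key.

    The wrapper key names are assumptions; adjust them to match the actual file.
    """
    if isinstance(checkpoint, dict):
        for key in wrapper_keys:
            inner = checkpoint.get(key)
            if isinstance(inner, dict):
                return inner
    return checkpoint


def strip_prefix(state_dict, prefix="module."):
    """Drop a leading prefix (such as the one DistributedDataParallel adds) from each key."""
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in state_dict.items()
    }


def load_backbone_weights(model, checkpoint_path):
    """Load weights into `model` (any torch.nn.Module) and report mismatched keys."""
    import torch  # imported lazily so the helpers above work without PyTorch

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    state_dict = strip_prefix(unwrap_checkpoint(checkpoint))
    # strict=False tolerates a replaced task head; inspect the result to see what was skipped.
    result = model.load_state_dict(state_dict, strict=False)
    return result.missing_keys, result.unexpected_keys
```

Calling `load_backbone_weights(model, "path/to/checkpoint.pth")` returns the lists of missing and unexpected keys, which is worth checking before training: a long list of missing keys usually means the checkpoint layout differs from what this sketch assumes. Depending on your PyTorch version, you may also need to adjust the `weights_only` argument of `torch.load`.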