---
license: apache-2.0
datasets:
- AlexFierro9/Kinetics400
- imagenet-1k
- HuggingFaceM4/something_something_v2
language:
- en
pipeline_tag: video-classification
extra_gated_fields:
  Name: text
  Company/Organization: text
  Country: text
  E-Mail: text
---
<br>
# VideoMamba
## Model Details
VideoMamba is a purely SSM-based model for video understanding.
- **Developed by:** [OpenGVLab](https://github.com/OpenGVLab)
- **Model type:** An efficient backbone based on the bidirectional state space model.
- **License:** Non-commercial license
### Model Sources
- **Repository:** https://github.com/OpenGVLab/VideoMamba
- **Paper:** https://arxiv.org/abs/2403.06977
## Uses
The primary use of VideoMamba is research on image and video tasks, e.g., image classification, action recognition, long-term video understanding, and video-text retrieval, with an SSM-based backbone.
The primary intended users of the model are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.
## How to Get Started with the Model
- To use VideoMamba for a video task, replace your existing backbone with the model defined in https://github.com/OpenGVLab/VideoMamba/blob/main/videomamba/video_sm/models/videomamba.py
- Then load this checkpoint into the model and start training.
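
Before the weights can be loaded, the checkpoint usually has to be reduced to a flat mapping from parameter names to tensors. The sketch below shows one way to do that in plain Python. The helper name `extract_state_dict`, the wrapper keys `model` and `state_dict`, the `module.` prefix, and the `head.` prefix used in the example are all assumptions about how a checkpoint might be laid out, not details confirmed by this repository; inspect the actual keys of the file before relying on them.

```python
from collections import OrderedDict


def extract_state_dict(checkpoint,
                       wrapper_keys=("model", "state_dict"),
                       strip_prefix="module.",
                       drop_prefixes=()):
    """Return a flat name -> value mapping from a loaded checkpoint object.

    All key names and prefixes here are hypothetical defaults.
    """
    state = checkpoint
    # Unwrap one level if the weights sit under a known wrapper key.
    for key in wrapper_keys:
        if isinstance(state, dict) and isinstance(state.get(key), dict):
            state = state[key]
            break

    cleaned = OrderedDict()
    for name, value in state.items():
        # Remove the prefix that data-parallel wrappers tend to add.
        if strip_prefix and name.startswith(strip_prefix):
            name = name[len(strip_prefix):]
        # Skip parameters the caller wants to re-initialise,
        # e.g. a classification head with a different number of classes.
        if any(name.startswith(p) for p in drop_prefixes):
            continue
        cleaned[name] = value
    return cleaned


# Toy checkpoint with placeholder values instead of tensors.
toy_checkpoint = {
    "epoch": 10,
    "model": {
        "module.patch_embed.proj.weight": 1,
        "module.head.weight": 2,
    },
}
backbone_only = extract_state_dict(toy_checkpoint, drop_prefixes=("head.",))
print(list(backbone_only))
```

In a PyTorch workflow, the checkpoint object would typically come from `torch.load(path, map_location="cpu")`, and the cleaned mapping would be passed to `model.load_state_dict(state, strict=False)` so that dropped or newly added layers do not raise an error.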
### Citation Information
```
@misc{li2024videomamba,
title={VideoMamba: State Space Model for Efficient Video Understanding},
author={Kunchang Li and Xinhao Li and Yi Wang and Yinan He and Yali Wang and Limin Wang and Yu Qiao},
year={2024},
eprint={2403.06977},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```