---
license: apache-2.0
datasets:
- AlexFierro9/Kinetics400
- imagenet-1k
- HuggingFaceM4/something_something_v2
language:
- en
pipeline_tag: video-classification
extra_gated_fields:
  Name: text
  Company/Organization: text
  Country: text
  E-Mail: text
---



<br>

# VideoMamba

## Model Details

VideoMamba is a video understanding model built purely on state space models (SSMs).

- **Developed by:** [OpenGVLab](https://github.com/OpenGVLab)
- **Model type:** An efficient backbone based on the bidirectional state space model.
- **License:** Non-commercial license


### Model Sources

- **Repository:** https://github.com/OpenGVLab/VideoMamba
- **Paper:** https://arxiv.org/abs/2403.06977

## Uses

The primary use of VideoMamba is research on image and video tasks, e.g., image classification, action recognition, long-term video understanding, and video-text retrieval, with an SSM-based backbone.
The primary intended users of the model are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.

## How to Get Started with the Model

- Replace the backbone of your video task with the VideoMamba model definition: https://github.com/OpenGVLab/VideoMamba/blob/main/videomamba/video_sm/models/videomamba.py
- Load this checkpoint into that backbone, then start training.
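
When a pretrained checkpoint is loaded into a backbone for a new task, some tensors (typically the classification head) may not match the new model. The sketch below is not part of the VideoMamba repository; it is a minimal, framework-agnostic illustration of filtering a checkpoint down to the entries that fit. The function name `select_compatible_weights` and the `prefix` parameter are hypothetical, and the only assumption about the values is that they expose a `.shape` attribute, as PyTorch tensors do.

```python
from typing import Any, Dict, List, Mapping, Tuple


def select_compatible_weights(
    checkpoint_state: Mapping[str, Any],
    model_state: Mapping[str, Any],
    prefix: str = "",
) -> Tuple[Dict[str, Any], List[str]]:
    """Keep checkpoint entries whose name and shape match the target model.

    `prefix` is a hypothetical wrapper prefix (for example "module.") that is
    stripped from checkpoint keys before matching. Values only need a
    `.shape` attribute.
    """
    kept: Dict[str, Any] = {}
    skipped: List[str] = []
    for name, value in checkpoint_state.items():
        # Strip the optional prefix so keys line up with the model's names.
        key = name[len(prefix):] if prefix and name.startswith(prefix) else name
        target = model_state.get(key)
        if target is not None and tuple(value.shape) == tuple(target.shape):
            kept[key] = value
        else:
            # Unknown name or mismatched shape: leave it for fresh initialization.
            skipped.append(name)
    return kept, skipped
```

With PyTorch, one possible use is to read the file with `torch.load(path, map_location="cpu")`, pass the resulting state dict together with `model.state_dict()` to this helper, and then call `model.load_state_dict(kept, strict=False)`. Whether the weights in this checkpoint sit at the top level or under a nested key is not stated here, so inspect the loaded object first.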


## Citation Information

```
@misc{li2024videomamba,
      title={VideoMamba: State Space Model for Efficient Video Understanding}, 
      author={Kunchang Li and Xinhao Li and Yi Wang and Yinan He and Yali Wang and Limin Wang and Yu Qiao},
      year={2024},
      eprint={2403.06977},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```