LianghuiZhu committed
Commit 4666183
Parent: ce2ac3b

Update README.md


[add] initial commit.

Files changed (1)
  1. README.md +55 -0
README.md CHANGED
@@ -1,3 +1,58 @@
---
license: apache-2.0
---

<br>

# Vim Model Card

## Model Details

Vision Mamba (Vim) is a generic vision backbone trained on the ImageNet-1K dataset.

- **Developed by:** [HUST](https://english.hust.edu.cn/), [Horizon Robotics](https://en.horizon.cc/), [BAAI](https://www.baai.ac.cn/english.html)
- **Model type:** A generic vision backbone based on the bidirectional state space model (SSM) architecture.
- **License:** Non-commercial license

### Model Sources

- **Repository:** https://github.com/hustvl/Vim
- **Paper:** https://arxiv.org/abs/2401.09417

## Uses

The primary use of Vim is research on vision tasks (e.g., classification, segmentation, detection, and instance segmentation) with an SSM-based backbone.
The primary intended users of the model are researchers and hobbyists in computer vision, machine learning, and artificial intelligence.

## How to Get Started with the Model

- You can replace the backbone for your vision task with the proposed Vim: https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py
- Then you can load this checkpoint and start training; see the sketch below.
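
Below is a minimal loading sketch. The registered model name, the checkpoint filename, and the `"model"` key inside the checkpoint are illustrative assumptions; check `models_mamba.py` and the checkpoint file in this repository for the exact names.

```python
# Minimal sketch: build a Vim backbone from models_mamba.py and load this checkpoint.
# Assumes vim/models_mamba.py from https://github.com/hustvl/Vim is on PYTHONPATH.
import torch
import timm

import models_mamba  # noqa: F401  -- importing registers the Vim variants with timm

# Hypothetical registered name; list the real ones with timm.list_models("vim*").
model = timm.create_model("vim_tiny_patch16_224", num_classes=1000)

ckpt = torch.load("vim_tiny.pth", map_location="cpu")  # checkpoint filename is a placeholder
state_dict = ckpt.get("model", ckpt)                   # weights are often nested under "model"
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing, "unexpected keys:", unexpected)

model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000]) for ImageNet-1K classification
```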

## Training Details

Vim is pretrained on ImageNet-1K with classification supervision.
The training data consists of around 1.3M images from the [ImageNet-1K dataset](https://www.image-net.org/challenges/LSVRC/2012/).
See more details in the [paper](https://arxiv.org/abs/2401.09417).

## Evaluation

Vim is evaluated on the ImageNet-1K validation set and achieves 73.1% top-1 accuracy. See more details in the [paper](https://arxiv.org/abs/2401.09417).
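
For reference, a standard ImageNet-1K top-1 evaluation loop is sketched below. It assumes `model` is the Vim backbone loaded as above, that the validation images are in the usual `ImageFolder` layout (the path is a placeholder), and that the common 224x224 resize/center-crop preprocessing is used; this is not the repository's official evaluation script.

```python
# Rough top-1 accuracy sketch on the ImageNet-1K validation set.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=preprocess)  # placeholder path
val_loader = DataLoader(val_set, batch_size=128, num_workers=8)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

correct = total = 0
with torch.no_grad():
    for images, targets in val_loader:
        logits = model(images.to(device))
        correct += (logits.argmax(dim=1).cpu() == targets).sum().item()
        total += targets.numel()

print(f"top-1 accuracy: {100.0 * correct / total:.1f}%")
```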

## Additional Information

### Citation Information

```
@article{vim,
  title={Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model},
  author={Lianghui Zhu and Bencheng Liao and Qian Zhang and Xinlong Wang and Wenyu Liu and Xinggang Wang},
  journal={arXiv preprint arXiv:2401.09417},
  year={2024}
}
```