xinyu1205 committed
Commit c9cb5fa
1 Parent(s): 3ad9e0b

Update README.md

Files changed (1)
  1. README.md +43 -0
README.md CHANGED
@@ -1,3 +1,46 @@
  ---
  license: apache-2.0
+ language:
+ - en
+ pipeline_tag: zero-shot-image-classification
+ tags:
+ - image recognition
  ---
+
+ # Recognize Anything & Tag2Text
+
+ Model card for the <a href="https://recognize-anything.github.io/">Recognize Anything Plus Model (RAM++)</a>.
+
+ RAM++ is the next generation of RAM. It can recognize any category with high accuracy, covering both predefined common categories and diverse open-set categories.
+
+ RAM++ outperforms existing SOTA fundamental image recognition models on common tag categories, uncommon tag categories, and human-object interaction phrases.
+
+ ## TL;DR
+
+ The authors of the [paper](https://arxiv.org/abs/2306.03514) write in the abstract:
+
+ *We present the Recognize Anything Model (RAM): a strong foundation model for image tagging. RAM makes a substantial step for large models in computer vision, demonstrating the zero-shot ability to recognize any common category with high accuracy. By leveraging large-scale image-text pairs for training instead of manual annotations, RAM introduces a new paradigm for image tagging. We evaluate the tagging capability of RAM on numerous benchmarks and observe an impressive zero-shot performance, which significantly outperforms CLIP and BLIP. Remarkably, RAM even surpasses fully supervised models and exhibits a competitive performance compared with the Google tagging API.*
+
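+ ## Usage sketch
+
+ The snippet below is an illustrative sketch, not part of the original card. It assumes the `ram` package from the [recognize-anything](https://github.com/xinyu1205/recognize-anything) repository is installed and that a RAM++ checkpoint has been downloaded locally; `ram_plus`, `get_transform`, and `inference_ram` follow that repository's inference scripts, and the checkpoint path and image file are placeholders.
+
+ ```python
+ # Illustrative RAM++ tagging sketch; checkpoint path and image are placeholders.
+ import torch
+ from PIL import Image
+
+ from ram import get_transform, inference_ram
+ from ram.models import ram_plus
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Preprocess to the 384x384 input used by the released Swin-L weights.
+ transform = get_transform(image_size=384)
+
+ # Placeholder checkpoint path -- point this at the downloaded RAM++ weights.
+ model = ram_plus(pretrained="ram_plus_swin_large_14m.pth",
+                  image_size=384,
+                  vit="swin_l")
+ model.eval()
+ model = model.to(device)
+
+ image = transform(Image.open("demo.jpg")).unsqueeze(0).to(device)
+
+ # inference_ram returns (English tags, Chinese tags) as "tag1 | tag2 | ..." strings.
+ english_tags, chinese_tags = inference_ram(image, model)
+ print("Image tags:", english_tags)
+ ```
+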
+ ## BibTeX and citation info
+
+ ```
+ @article{zhang2023recognize,
+   title={Recognize Anything: A Strong Image Tagging Model},
+   author={Zhang, Youcai and Huang, Xinyu and Ma, Jinyu and Li, Zhaoyang and Luo, Zhaochuan and Xie, Yanchun and Qin, Yuzhuo and Luo, Tong and Li, Yaqian and Liu, Shilong and others},
+   journal={arXiv preprint arXiv:2306.03514},
+   year={2023}
+ }
+
+ @article{huang2023tag2text,
+   title={Tag2Text: Guiding Vision-Language Model via Image Tagging},
+   author={Huang, Xinyu and Zhang, Youcai and Ma, Jinyu and Tian, Weiwei and Feng, Rui and Zhang, Yuejie and Li, Yaqian and Guo, Yandong and Zhang, Lei},
+   journal={arXiv preprint arXiv:2303.05657},
+   year={2023}
+ }
+ ```