---
license: mit
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/cat-dog-music.png
  candidate_labels: playing music, playing sports
  example_title: Cat & Dog
---
# Model Card for CLIP ViT-H/14 frozen xlm roberta large - LAION-5B

# Table of Contents

1. [Model Details](#model-details)
2. [Uses](#uses)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)
5. [Acknowledgements](#acknowledgements)
6. [Citation](#citation)
7. [How To Get Started With the Model](#how-to-get-started-with-the-model)

# Model Details

## Model Description

A CLIP model with a frozen ViT-H/14 vision tower and an XLM-RoBERTa-large text tower, trained on LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).

Model training was done by Romain Beaumont on the [stability.ai](https://stability.ai/) cluster.

# Uses

## Direct Use

Zero-shot image classification, image and text retrieval, among others.
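
For retrieval, images and texts embed into the same space. A minimal text-to-image retrieval sketch (the model name, pretrained tag, and image paths are assumptions for illustration; check `open_clip.list_pretrained()` for the exact identifiers):

```python
import torch
import open_clip
from PIL import Image

# Identifiers below are assumptions; verify with open_clip.list_pretrained().
model, _, preprocess = open_clip.create_model_and_transforms(
    'xlm-roberta-large-ViT-H-14', pretrained='frozen_laion5b_s13b_b90k')
tokenizer = open_clip.get_tokenizer('xlm-roberta-large-ViT-H-14')
model.eval()

paths = ['a.jpg', 'b.jpg', 'c.jpg']  # hypothetical image collection

with torch.no_grad():
    # Encode and L2-normalize the image collection and the text query.
    images = torch.stack([preprocess(Image.open(p).convert('RGB')) for p in paths])
    image_features = model.encode_image(images)
    image_features /= image_features.norm(dim=-1, keepdim=True)

    query = tokenizer(['a dog playing music'])
    text_features = model.encode_text(query)
    text_features /= text_features.norm(dim=-1, keepdim=True)

# Rank images by cosine similarity to the text query.
scores = (image_features @ text_features.T).squeeze(1)
for path, score in sorted(zip(paths, scores.tolist()), key=lambda x: -x[1]):
    print(f'{score:.3f}  {path}')
```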

## Downstream Use

Fine-tuning on image classification and other image tasks, linear-probe image classification, and image generation guiding and conditioning, among others.
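
As one example, a linear probe trains a simple classifier on frozen CLIP image features. A minimal sketch (the scikit-learn probe, file names, and labels are illustrative; model identifiers are assumptions, as above):

```python
import torch
import open_clip
from PIL import Image
from sklearn.linear_model import LogisticRegression

# Identifiers are assumptions; verify with open_clip.list_pretrained().
model, _, preprocess = open_clip.create_model_and_transforms(
    'xlm-roberta-large-ViT-H-14', pretrained='frozen_laion5b_s13b_b90k')
model.eval()

@torch.no_grad()
def embed(paths):
    # Encode a list of image files into L2-normalized CLIP features.
    batch = torch.stack([preprocess(Image.open(p).convert('RGB')) for p in paths])
    feats = model.encode_image(batch)
    return (feats / feats.norm(dim=-1, keepdim=True)).numpy()

# Hypothetical labeled data: fit a logistic-regression probe on frozen features.
train_paths, train_labels = ['cat1.jpg', 'dog1.jpg'], [0, 1]
probe = LogisticRegression(max_iter=1000).fit(embed(train_paths), train_labels)
print(probe.predict(embed(['cat2.jpg'])))
```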

# Training Details

## Training Data

This model was trained on the full LAION-5B dataset (https://laion.ai/blog/laion-5b/).

## Training Procedure

Training used a batch size of 90k for 13B samples of LAION-5B; see the training report at https://wandb.ai/rom1504/open-clip/reports/xlm-roberta-large-unfrozen-vit-h-14-frozen--VmlldzoyOTc3ODY3

The model is a ViT-H/14 on the visual side and an XLM-RoBERTa-large, initialized with pretrained weights, on the text side.

The H/14 visual tower was initialized from https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K and kept frozen during training.
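
A sketch of this frozen-tower setup in OpenCLIP terms (illustrative only, not the actual training script; the model name is an assumption):

```python
import open_clip

# Model name is an assumption; check open_clip.list_models() for the exact identifier.
model = open_clip.create_model('xlm-roberta-large-ViT-H-14')

# Freeze every parameter of the vision tower so only the text tower
# (and the logit scale) receives gradient updates during training.
for p in model.visual.parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f'trainable: {trainable:,} / {total:,} parameters')
```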

# Evaluation

Evaluation was done with the code in the [LAION CLIP Benchmark suite](https://github.com/LAION-AI/CLIP_benchmark).

## Testing Data, Factors & Metrics

### Testing Data

Testing is performed with VTAB+ (a combination of VTAB (https://arxiv.org/abs/1910.04867) with additional robustness datasets) for classification, and with COCO and Flickr for retrieval.

## Results

The model achieves 77.0% on ImageNet-1k (vs. 78% for the English-only H/14).

![results_xlm_roberta_large.png](results_xlm_roberta_large.png)

On zero-shot ImageNet classification with translated prompts, this model reaches:

* 56% in Italian (vs. 21% for https://github.com/clip-italian/clip-italian)
* 53% in Japanese (vs. 54.6% for https://github.com/rinnakk/japanese-clip)
* 55.7% in Chinese (to be compared with https://github.com/OFA-Sys/Chinese-CLIP)

The model reaches strong results in both English and other languages.
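
Since the text tower is XLM-RoBERTa-large, translated captions of the same concept should embed close together. A minimal sketch of this (model identifiers are assumptions; check `open_clip.list_pretrained()`):

```python
import torch
import open_clip

# Identifiers are assumptions; verify with open_clip.list_pretrained().
model, _, _ = open_clip.create_model_and_transforms(
    'xlm-roberta-large-ViT-H-14', pretrained='frozen_laion5b_s13b_b90k')
tokenizer = open_clip.get_tokenizer('xlm-roberta-large-ViT-H-14')
model.eval()

# The same caption in English, Italian, Japanese, and Chinese.
prompts = ['a photo of a cat', 'una foto di un gatto', '猫の写真', '一张猫的照片']

with torch.no_grad():
    features = model.encode_text(tokenizer(prompts))
    features /= features.norm(dim=-1, keepdim=True)

# Translations of the same caption should have high pairwise cosine similarity.
print(features @ features.T)
```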

# Acknowledgements

We acknowledge [stability.ai](https://stability.ai/) for the compute used to train this model.

# Citation

**BibTeX:**

In addition to the forthcoming LAION-5B paper (https://laion.ai/blog/laion-5b/), please cite:

OpenAI CLIP paper:

```
@inproceedings{Radford2021LearningTV,
  title={Learning Transferable Visual Models From Natural Language Supervision},
  author={Alec Radford and Jong Wook Kim and Chris Hallacy and A. Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
  booktitle={ICML},
  year={2021}
}
```

OpenCLIP software:

```
@software{ilharco_gabriel_2021_5143773,
  author       = {Ilharco, Gabriel and
                  Wortsman, Mitchell and
                  Wightman, Ross and
                  Gordon, Cade and
                  Carlini, Nicholas and
                  Taori, Rohan and
                  Dave, Achal and
                  Shankar, Vaishaal and
                  Namkoong, Hongseok and
                  Miller, John and
                  Hajishirzi, Hannaneh and
                  Farhadi, Ali and
                  Schmidt, Ludwig},
  title        = {OpenCLIP},
  month        = jul,
  year         = 2021,
  note         = {If you use this software, please cite it as below.},
  publisher    = {Zenodo},
  version      = {0.1},
  doi          = {10.5281/zenodo.5143773},
  url          = {https://doi.org/10.5281/zenodo.5143773}
}
```

# How To Get Started With the Model

Use this model with OpenCLIP: https://github.com/mlfoundations/open_clip
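
A minimal zero-shot classification sketch, mirroring the widget above (the model name and pretrained tag are assumptions; verify with `open_clip.list_pretrained()`, and the image path is hypothetical):

```python
import torch
import open_clip
from PIL import Image

# Identifiers are assumptions; verify with open_clip.list_pretrained().
model, _, preprocess = open_clip.create_model_and_transforms(
    'xlm-roberta-large-ViT-H-14', pretrained='frozen_laion5b_s13b_b90k')
tokenizer = open_clip.get_tokenizer('xlm-roberta-large-ViT-H-14')
model.eval()

image = preprocess(Image.open('cat-dog-music.png')).unsqueeze(0)  # hypothetical local file
text = tokenizer(['playing music', 'playing sports'])

with torch.no_grad():
    # Encode, L2-normalize, and compare image and text embeddings.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # probability of each candidate label for the image
```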