Commit cc77892 by Ross Wightman
Parent(s): 11e0f39

Update README, add tokenizer/vocab/preprocess cfg

README.md CHANGED
@@ -6,11 +6,12 @@ license: mit
 # Table of Contents
 
 1. [Model Details](#model-details)
-1. [Uses](#uses)
-1. [Training Details](#training-details)
-1. [Evaluation](#evaluation)
-1. [Citation](#citation)
-1. [How To Get Started With the Model](#how-to-get-started-with-the-model)
+2. [Uses](#uses)
+3. [Training Details](#training-details)
+4. [Evaluation](#evaluation)
+5. [Acknowledgements](#acknowledgements)
+6. [Citation](#citation)
+7. [How To Get Started With the Model](#how-to-get-started-with-the-model)
 
 
 # Model Details
@@ -19,9 +20,11 @@ license: mit
 
 A CLIP ViT-B/32 model trained with the LAION-2B English subset of LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).
 
+Model training done by Romain Beaumont on the [stability.ai](https://stability.ai/) cluster.
+
 # Uses
 
-As per the original OpenAI CLIP models, this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such model.
+As per the original [OpenAI CLIP model card](https://github.com/openai/CLIP/blob/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1/model-card.md), this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such model.
 
 The OpenAI CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis. Additionally, the LAION-5B blog (https://laion.ai/blog/laion-5b/) and upcoming paper include additional discussion as it relates specifically to the training dataset.
 
@@ -55,7 +58,7 @@ This model was trained with the 2 Billion sample English subset of LAION-5B (htt
 
 ## Training Procedure
 
-**TODO** - add SLURM script, hparams.
+Please see [training notes](https://docs.google.com/document/d/1EFbMLRWSSV0LUf9Du1pWzWqgeiIRPwEWX2s1C6mAk5c) and [wandb logs](https://wandb.ai/rom1504/eval_openclip/reports/B-32-2B--VmlldzoyNDkwNDMy).
 
 # Evaluation
 
@@ -71,7 +74,15 @@ The testing is performed with VTAB+ (A combination of VTAB (https://arxiv.org/ab
 
 ## Results
 
-**TODO** - full zero-shot and retrieval benchmark results
+The model achieves a 66.6 zero-shot top-1 accuracy on ImageNet-1k.
+
+An initial round of benchmarks have been performed on a wider range of datasets, currently viewable at https://github.com/LAION-AI/CLIP_benchmark/blob/main/benchmark/results.ipynb
+
+**TODO** - create table for just this model's metrics.
+
+# Acknowledgements
+
+Acknowledging [stability.ai](https://stability.ai/) for the compute used to train this model.
 
 # Citation
 
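The README sections touched here point readers to OpenCLIP and zero-shot classification. Below is a minimal usage sketch, not part of this commit: it assumes a recent open_clip release, and the `laion2b_s34b_b79k` pretrained tag and the image path are illustrative assumptions rather than values taken from the commit.

```python
# Hedged sketch only, not part of this commit: zero-shot classification with OpenCLIP.
# The pretrained tag and image path below are assumptions.
import torch
import open_clip
from PIL import Image

# Assumption: 'laion2b_s34b_b79k' is the LAION-2B ViT-B/32 weights tag in open_clip.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalise, then softmax the scaled cosine similarities into class probabilities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # probabilities over the candidate captions
```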
preprocessor_config.json ADDED
@@ -0,0 +1,19 @@
+{
+  "crop_size": 224,
+  "do_center_crop": true,
+  "do_normalize": true,
+  "do_resize": true,
+  "feature_extractor_type": "CLIPFeatureExtractor",
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "resample": 3,
+  "size": 224
+}
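The values above describe the standard CLIP image pipeline: resize to 224 with bicubic interpolation (PIL resample code 3), center-crop to 224, and normalise with the OpenAI CLIP mean/std. A rough torchvision equivalent, for illustration only (the image path is a placeholder):

```python
# Illustrative torchvision counterpart of preprocessor_config.json; not part of the commit.
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    # "do_resize": true, "size": 224, "resample": 3 (PIL bicubic)
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    # "do_center_crop": true, "crop_size": 224
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # "do_normalize": true, with "image_mean" / "image_std" from the config
    transforms.Normalize(
        mean=(0.48145466, 0.4578275, 0.40821073),
        std=(0.26862954, 0.26130258, 0.27577711),
    ),
])

pixel_values = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # placeholder path
print(pixel_values.shape)  # torch.Size([1, 3, 224, 224])
```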
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+{"bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": "<|endoftext|>"}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+{"unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "pad_token": "<|endoftext|>", "add_prefix_space": false, "errors": "replace", "do_lower_case": true, "name_or_path": "./clip_ViT_B_32/"}
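With tokenizer.json, vocab.json, and the configs in this commit in place, the text side can be loaded through Hugging Face transformers. A small sketch, assuming the repository has been cloned to a local directory (the path below is illustrative, not a real identifier):

```python
# Sketch only: load the tokenizer files added in this commit via transformers.
# "./CLIP-ViT-B-32-laion2B" is an assumed local clone path.
from transformers import CLIPTokenizerFast

tokenizer = CLIPTokenizerFast.from_pretrained("./CLIP-ViT-B-32-laion2B")

# special_tokens_map.json above sets <|startoftext|> as bos and <|endoftext|>
# as eos; <|endoftext|> also serves as the unk and pad token.
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token)

enc = tokenizer(
    ["a photo of a cat"],
    padding="max_length",
    max_length=77,          # CLIP's usual context length
    return_tensors="pt",
)
print(enc["input_ids"].shape)  # (1, 77)
```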
vocab.json ADDED
The diff for this file is too large to render. See raw diff