rwightman HF staff commited on
Commit
82d3215
1 Parent(s): 7a39c29

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +9 -0
README.md CHANGED
@@ -18,6 +18,13 @@ license: mit
18
 
19
  A series of CLIP [ConvNeXt-Base](https://arxiv.org/abs/2201.03545) (w/ wide embed dim) models trained on subsets LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).
20
 
 
 
 
 
 
 
 
21
  The models utilize the [timm](https://github.com/rwightman/pytorch-image-models) ConvNeXt-Base model (`convnext_base`) as the image tower, and the same text tower as the RN50x4 (depth 12, embed dim 640) model from OpenAI CLIP. The base models are trained at 256x256 image resolution and roughly match the RN50x4 models on FLOPs and activation counts. The models with `320` in the name are trained at 320x320.
22
 
23
  All models in this series were trained for 13B samples and have ImageNet Zero-Shot top-1 of >= 70.8%. Comparing to ViT-B/16 at 34B SS with zero-shot of 70.2% (68.1% for 13B SS) this suggests the ConvNeXt architecture may be more sample efficient in this range of model scale. More experiments needed to confirm.
@@ -122,6 +129,8 @@ The models achieve between 70.8 and 71.7 zero-shot top-1 accuracy on ImageNet-1k
122
 
123
  An initial round of benchmarks have been performed on a wider range of datasets, to be viewable at https://github.com/LAION-AI/CLIP_benchmark/blob/main/benchmark/results.ipynb
124
 
 
 
125
  # Acknowledgements
126
 
127
  Acknowledging [stability.ai](https://stability.ai/) and the Gauss Centre for Supercomputing e.V. (http://gauss-centre.eu) for funding this part of work by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS Booster at Jülich Supercomputing Centre (JSC).
 
18
 
19
  A series of CLIP [ConvNeXt-Base](https://arxiv.org/abs/2201.03545) (w/ wide embed dim) models trained on subsets LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).
20
 
21
+ Goals:
22
+ * Explore an alternative to ViT and ResNet (w/ AttentionPooling) CLIP models that scales well with model size and image resolution
23
+
24
+ Firsts:
25
+ * First known ConvNeXt CLIP models trained at scale in the range of CLIP ViT-B/16 and RN50x4 models
26
+ * First released model weights exploring increase of augmentation + regularization for image tower via adding (increased resize range of RRC, adding random erasing, adding stochastic depth)
27
+
28
  The models utilize the [timm](https://github.com/rwightman/pytorch-image-models) ConvNeXt-Base model (`convnext_base`) as the image tower, and the same text tower as the RN50x4 (depth 12, embed dim 640) model from OpenAI CLIP. The base models are trained at 256x256 image resolution and roughly match the RN50x4 models on FLOPs and activation counts. The models with `320` in the name are trained at 320x320.
29
 
30
  All models in this series were trained for 13B samples and have ImageNet Zero-Shot top-1 of >= 70.8%. Comparing to ViT-B/16 at 34B SS with zero-shot of 70.2% (68.1% for 13B SS) this suggests the ConvNeXt architecture may be more sample efficient in this range of model scale. More experiments needed to confirm.
 
129
 
130
  An initial round of benchmarks have been performed on a wider range of datasets, to be viewable at https://github.com/LAION-AI/CLIP_benchmark/blob/main/benchmark/results.ipynb
131
 
132
+ As part of exploring increased augmentation + regularization, more analysis is required but early tests indicate the `augreg` models evaluate well over a wider range of resolutions than the non augreg models. Especially the 320x320 LAION-A model, where the augreg disappointed at 320x320 w/ 71.3, but passes the non augreg 71.7 w/ a 72.2 when evaluated at 384x384 (non augreg drops to 71.0 at 384x384).
133
+
134
  # Acknowledgements
135
 
136
  Acknowledging [stability.ai](https://stability.ai/) and the Gauss Centre for Supercomputing e.V. (http://gauss-centre.eu) for funding this part of work by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS Booster at Jülich Supercomputing Centre (JSC).