laion/CLIP-convnext_base_w-laion2B-s13B-b82K-augreg

rwightman (HF staff) committed on Jan 25, 2023

Commit 82d3215 • 1 Parent(s): 7a39c29

Update README.md

Files changed (1)

README.md +9 -0

@@ -18,6 +18,13 @@ license: mit
 A series of CLIP [ConvNeXt-Base](https://arxiv.org/abs/2201.03545) (w/ wide embed dim) models trained on subsets LAION-5B (https://laion.ai/blog/laion-5b/) using OpenCLIP (https://github.com/mlfoundations/open_clip).
+Goals:
+  * Explore an alternative to ViT and ResNet (w/ AttentionPooling) CLIP models that scales well with model size and image resolution
+Firsts:
+  * First known ConvNeXt CLIP models trained at scale in the range of CLIP ViT-B/16 and RN50x4 models
+  * First released model weights exploring increase of augmentation + regularization for image tower via adding (increased resize range of RRC, adding random erasing, adding stochastic depth)
 The models utilize the [timm](https://github.com/rwightman/pytorch-image-models) ConvNeXt-Base model (`convnext_base`) as the image tower, and the same text tower as the RN50x4 (depth 12, embed dim 640) model from OpenAI CLIP. The base models are trained at 256x256 image resolution and roughly match the RN50x4 models on FLOPs and activation counts. The models with `320` in the name are trained at 320x320.
 All models in this series were trained for 13B samples and have ImageNet Zero-Shot top-1 of >= 70.8%. Comparing to ViT-B/16 at 34B SS with zero-shot of 70.2% (68.1% for 13B SS) this suggests the ConvNeXt architecture may be more sample efficient in this range of model scale. More experiments needed to confirm.
@@ -122,6 +129,8 @@ The models achieve between 70.8 and 71.7 zero-shot top-1 accuracy on ImageNet-1k
 An initial round of benchmarks have been performed on a wider range of datasets, to be viewable at https://github.com/LAION-AI/CLIP_benchmark/blob/main/benchmark/results.ipynb
+As part of exploring increased augmentation + regularization, more analysis is required but early tests indicate the `augreg` models evaluate well over a wider range of resolutions than the non augreg models. Especially the 320x320 LAION-A model, where the augreg disappointed at 320x320 w/ 71.3, but passes the non augreg 71.7 w/ a 72.2 when evaluated at 384x384 (non augreg drops to 71.0 at 384x384).
 # Acknowledgements
 Acknowledging [stability.ai](https://stability.ai/) and the Gauss Centre for Supercomputing e.V. (http://gauss-centre.eu) for funding this part of work by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS Booster at Jülich Supercomputing Centre (JSC).
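
The resolution comparison in the added `augreg` paragraph can be tabulated with a short sketch. The numbers are the ImageNet zero-shot top-1 figures quoted in the diff for the 320x320 LAION-A models. The dictionary layout and the names `top1` and `delta` are illustrative assumptions, not part of the repository.

```python
# ImageNet zero-shot top-1 (%) quoted in the README diff for the 320x320 LAION-A models.
# Keys are (variant, evaluation resolution); this layout is a hypothetical convenience.
top1 = {
    ("augreg", 320): 71.3,
    ("augreg", 384): 72.2,
    ("non-augreg", 320): 71.7,
    ("non-augreg", 384): 71.0,
}


def delta(variant: str, from_res: int = 320, to_res: int = 384) -> float:
    """Change in top-1 (percentage points) when evaluation moves from from_res to to_res."""
    return round(top1[(variant, to_res)] - top1[(variant, from_res)], 1)


for variant in ("augreg", "non-augreg"):
    print(f"{variant}: {delta(variant):+.1f} points from 320x320 to 384x384")
```

On these quoted figures the `augreg` model gains when evaluated above its training resolution while the non-augreg model loses, which is the pattern the paragraph describes. The diff itself notes that more analysis is required.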