
A CLIP (Contrastive Language-Image Pre-training) model trained on DFN-2B. Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data. This model was trained on 2B images filtered from the 12.8B uncurated image-text pairs in CommonPool-12.8B.

This model has been converted to PyTorch from the original Axlearn JAX checkpoints (https://github.com/apple/axlearn). The converted weights are directly usable in OpenCLIP for both the image and text encoders.
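As a rough illustration of using the weights in OpenCLIP, the sketch below runs zero-shot classification on one image. The Hub id is an assumption taken from the collection name on this page, the `ViT-L-14` tokenizer name is inferred from that id, and the prompt template, `softmax` helper, and `classify` function are hypothetical names introduced for this example only. It relies on `open_clip.create_model_from_pretrained`, `open_clip.get_tokenizer`, `encode_image`, `encode_text`, and `logit_scale` from the OpenCLIP library.

```python
import math

# Assumption: Hub id inferred from the collection name on this page.
HUB_ID = "hf-hub:apple/DFN2B-CLIP-ViT-L-14-39B"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_path, labels):
    """Return {label: probability} for one image (hypothetical helper).

    Downloads the weights on first call.
    """
    # Imported lazily so softmax() stays usable without these packages.
    import torch
    import open_clip
    from PIL import Image

    model, preprocess = open_clip.create_model_from_pretrained(HUB_ID)
    tokenizer = open_clip.get_tokenizer("ViT-L-14")  # assumed architecture name
    model.eval()

    image = preprocess(Image.open(image_path)).unsqueeze(0)
    # Assumed prompt template; other templates may work better.
    text = tokenizer([f"a photo of a {label}" for label in labels])

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)

    # L2-normalize so the dot product is a cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    # Scale similarities by the learned temperature before the softmax.
    logits = (model.logit_scale.exp() * image_features @ text_features.T)[0]
    return dict(zip(labels, softmax(logits.tolist())))
```

A call such as `classify("cat.jpg", ["cat", "dog"])` (with a hypothetical local image) would return one probability per label. The heavy imports live inside `classify` so the module can be imported without `torch` or `open_clip` installed.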

Model Details

  • Model Type: Contrastive Image-Text, Zero-Shot Image Classification.
  • Dataset: DFN-2B
  • Papers: Data Filtering Networks (https://arxiv.org/abs/2309.17425)
  • Examples Seen: 39B

Model Metrics

| Eval Dataset | Metric |
|---|---|
| ImageNet 1k | 0.8219 |
| Caltech-101 | 0.9500 |
| CIFAR-10 | 0.9864 |
| CIFAR-100 | 0.8934 |
| CLEVR Counts | 0.3403 |
| CLEVR Distance | 0.2321 |
| Country211 | 0.3198 |
| Describable Textures | 0.6681 |
| EuroSAT | 0.6819 |
| FGVC Aircraft | 0.4829 |
| Food-101 | 0.9498 |
| GTSRB | 0.6329 |
| ImageNet Sketch | 0.7043 |
| ImageNet v2 | 0.7570 |
| ImageNet-A | 0.6745 |
| ImageNet-O | 0.3605 |
| ImageNet-R | 0.9184 |
| KITTI Vehicle Distance | 0.2391 |
| MNIST | 0.8745 |
| ObjectNet | 0.7477 |
| Oxford Flowers-102 | 0.8784 |
| Oxford-IIIT Pet | 0.9611 |
| Pascal VOC 2007 | 0.8472 |
| PatchCamelyon | 0.6418 |
| Rendered SST2 | 0.5815 |
| RESISC45 | 0.7300 |
| Stanford Cars | 0.9465 |
| STL-10 | 0.9889 |
| SUN397 | 0.7594 |
| SVHN | 0.6573 |
| Flickr | 0.8467 |
| MSCOCO | 0.5957 |
| WinoGAViL | 0.5551 |
| iWildCam | 0.1857 |
| Camelyon17 | 0.6540 |
| FMoW | 0.1824 |
| Dollar Street | 0.6822 |
| GeoDE | 0.9253 |
| Average | 0.68039 |
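The reported average appears to be the unweighted mean of the 38 per-dataset scores listed above. The short check below recomputes it; the `scores` list is copied from the table in the order shown, and the variable names are illustrative only.

```python
# Per-dataset scores copied from the table above, in the order listed.
scores = [
    0.8219, 0.9500, 0.9864, 0.8934, 0.3403, 0.2321, 0.3198, 0.6681,
    0.6819, 0.4829, 0.9498, 0.6329, 0.7043, 0.7570, 0.6745, 0.3605,
    0.9184, 0.2391, 0.8745, 0.7477, 0.8784, 0.9611, 0.8472, 0.6418,
    0.5815, 0.7300, 0.9465, 0.9889, 0.7594, 0.6573, 0.8467, 0.5957,
    0.5551, 0.1857, 0.6540, 0.1824, 0.6822, 0.9253,
]

# Unweighted mean: every dataset counts equally, regardless of its size.
mean = sum(scores) / len(scores)
print(len(scores), round(mean, 5))  # prints: 38 0.68039
```

Because the mean is unweighted, low-scoring benchmarks such as FMoW and iWildCam pull the average down as much as a large benchmark like ImageNet 1k pulls it up.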

Citation

@article{fang2023data,
  title={Data Filtering Networks},
  author={Fang, Alex and Jose, Albin Madappally and Jain, Amit and Schmidt, Ludwig and Toshev, Alexander and Shankar, Vaishaal},
  journal={arXiv preprint arXiv:2309.17425},
  year={2023}
}