innocent-charles committed d6953ca
Parent: 6c342d5

Update README.md

Files changed (1)
  1. README.md +3 -108
README.md CHANGED
@@ -14,116 +14,11 @@ language:
  - pt
  pipeline_tag: zero-shot-image-classification
  tags:
+ - zero-shot-image-classification
  - clip
- - vision transformer
- - pytorch
- - text-image embedding
- - AvilaBSE
- - sartify
  - OpenCLIP
+ - pytorch
+ - safetensors
  license: apache-2.0
  ---

- A CLIP (Contrastive Language-Image Pre-training) model trained on DFN-5B.
- Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data.
- This model was trained on 5B images that were filtered from a pool of 43B uncurated image-text pairs
- (12.8B image-text pairs from CommonPool-12.8B + 30B additional public image-text pairs).
-
- This model has been converted to PyTorch from the original JAX checkpoints from Axlearn (https://github.com/apple/axlearn).
- These weights are directly usable in OpenCLIP (image + text).
-
-
- ## Model Details
-
- - **Model Type:** Contrastive Image-Text, Zero-Shot Image Classification.
- - **Dataset:** DFN-5b
- - **Papers:**
- - Data Filtering Networks: https://arxiv.org/abs/2309.17425
- - **Samples Seen:** 39B (224 x 224) + 5B (384 x 384)
- ## Model Metrics
- | dataset | metric |
- |:-----------------------|---------:|
- | ImageNet 1k | 0.84218 |
- | Caltech-101 | 0.954479 |
- | CIFAR-10 | 0.9879 |
- | CIFAR-100 | 0.9041 |
- | CLEVR Counts | 0.362467 |
- | CLEVR Distance | 0.206067 |
- | Country211 | 0.37673 |
- | Describable Textures | 0.71383 |
- | EuroSAT | 0.608333 |
- | FGVC Aircraft | 0.719938 |
- | Food-101 | 0.963129 |
- | GTSRB | 0.679018 |
- | ImageNet Sketch | 0.73338 |
- | ImageNet v2 | 0.7837 |
- | ImageNet-A | 0.7992 |
- | ImageNet-O | 0.3785 |
- | ImageNet-R | 0.937633 |
- | KITTI Vehicle Distance | 0.38256 |
- | MNIST | 0.8372 |
- | ObjectNet <sup>1</sup> | 0.796867 |
- | Oxford Flowers-102 | 0.896834 |
- | Oxford-IIIT Pet | 0.966841 |
- | Pascal VOC 2007 | 0.826255 |
- | PatchCamelyon | 0.695953 |
- | Rendered SST2 | 0.566722 |
- | RESISC45 | 0.755079 |
- | Stanford Cars | 0.959955 |
- | STL-10 | 0.991125 |
- | SUN397 | 0.772799 |
- | SVHN | 0.671251 |
- | Flickr | 0.8808 |
- | MSCOCO | 0.636889 |
- | WinoGAViL | 0.571813 |
- | iWildCam | 0.224911 |
- | Camelyon17 | 0.711536 |
- | FMoW | 0.209024 |
- | Dollar Street | 0.71729 |
- | GeoDE | 0.935699 |
- | **Average** | **0.709421** |
-
-
- [1]: Center-crop pre-processing used for ObjectNet (squashing results in lower accuracy of 0.737)
- ## Model Usage
- ### With OpenCLIP
- ```
- import torch
- import torch.nn.functional as F
- from urllib.request import urlopen
- from PIL import Image
- from open_clip import create_model_from_pretrained, get_tokenizer
-
- model, preprocess = create_model_from_pretrained('hf-hub:sartifyllc/avilama')
- tokenizer = get_tokenizer(sartifyllc/avilama)
-
- image = Image.open(urlopen(
- 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
- ))
- image = preprocess(image).unsqueeze(0)
-
- labels_list = ["a dog", "a cat", "a donut", "a beignet"]
- text = tokenizer(labels_list, context_length=model.context_length)
-
- with torch.no_grad(), torch.cuda.amp.autocast():
-     image_features = model.encode_image(image)
-     text_features = model.encode_text(text)
-     image_features = F.normalize(image_features, dim=-1)
-     text_features = F.normalize(text_features, dim=-1)
-
- text_probs = torch.sigmoid(image_features @ text_features.T * model.logit_scale.exp() + model.logit_bias)
-
- zipped_list = list(zip(labels_list, [round(p.item(), 3) for p in text_probs[0]]))
- print("Label probabilities: ", zipped_list)
- ```
-
- ## Citation
- ```bibtex
- @article{fang2023data,
- title={Data Filtering Networks},
- author={Fang, Alex and Jose, Albin Madappally and Jain, Amit and Schmidt, Ludwig and Toshev, Alexander and Shankar, Vaishaal},
- journal={arXiv preprint arXiv:2309.17425},
- year={2023}
- }
-
- ```
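The usage snippet removed in this commit scores labels with `torch.sigmoid(... + model.logit_bias)`. That is a SigLIP-style sigmoid head. CLIP models loaded through OpenCLIP usually have `logit_bias` set to `None` unless they were trained with a sigmoid loss. For a CLIP-style model like the one the README describes, zero-shot label probabilities are conventionally a softmax over scaled cosine similarities.

The dependency-free sketch below shows that computation on toy vectors. The function names, the 3-dimensional embeddings, and the default scale of 100.0 are illustrative assumptions, not values from this model. The 100.0 mirrors the constant in the OpenCLIP README example.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length, like F.normalize(x, dim=-1) on one row."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_probs(
    image_embedding: Sequence[float],
    text_embeddings: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumed placeholder for model.logit_scale.exp()
) -> Dict[str, float]:
    """Return softmax probabilities over labels from scaled cosine similarities."""
    image = l2_normalize(image_embedding)
    logits: Dict[str, float] = {}
    for label, embedding in text_embeddings.items():
        text = l2_normalize(embedding)
        cosine = sum(a * b for a, b in zip(image, text))
        logits[label] = logit_scale * cosine
    # Subtract the largest logit before exponentiating to avoid overflow.
    top = max(logits.values())
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical toy embeddings standing in for encode_image / encode_text outputs.
probs = zero_shot_probs(
    [0.9, 0.1, 0.0],
    {"a dog": [1.0, 0.0, 0.0], "a cat": [0.0, 1.0, 0.0], "a donut": [0.0, 0.0, 1.0]},
)
print(max(probs, key=probs.get))  # prints "a dog"
```

With real OpenCLIP tensors, the same softmax head can be written as `(model.logit_scale.exp() * image_features @ text_features.T).softmax(dim=-1)` on the normalized features. This formulation needs no bias term.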