Add model

Files changed (9)

README.md +68 -0
added_tokens.json +7 -0
open_clip_config.json +42 -0
open_clip_model.safetensors +3 -0
open_clip_pytorch_model.bin +3 -0
special_tokens_map.json +7 -0
tokenizer.json +0 -0
tokenizer_config.json +56 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,68 @@

+---
+tags:
+- clip
+library_name: open_clip
+pipeline_tag: zero-shot-image-classification
+license: apache-2.0
+datasets:
+- mlfoundations/datacomp_1b
+---
+# Model card for ViT-H-14-CLIPA-datacomp1B
+A CLIPA-v2 model...
+## Model Details
+- **Model Type:** Contrastive Image-Text, Zero-Shot Image Classification.
+- **Original:** https://github.com/UCSC-VLAA/CLIPA
+- **Dataset:** mlfoundations/datacomp_1b
+- **Papers:**
+  - CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy: https://arxiv.org/abs/2306.15658
+  - An Inverse Scaling Law for CLIP Training: https://arxiv.org/abs/2305.07017
+## Model Usage
+### With OpenCLIP
+```python
+import torch
+import torch.nn.functional as F
+from urllib.request import urlopen
+from PIL import Image
+from open_clip import create_model_from_pretrained, get_tokenizer
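+# hf-hub: identifiers take the form '<org>/<repo>'; the repo name below follows this card's title.
+model, preprocess = create_model_from_pretrained('hf-hub:UCSC-VLAA/ViT-H-14-CLIPA-datacomp1B')
+tokenizer = get_tokenizer('hf-hub:UCSC-VLAA/ViT-H-14-CLIPA-datacomp1B')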
+image = Image.open(urlopen(
+    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+))
+image = preprocess(image).unsqueeze(0)
+text = tokenizer(["a diagram", "a dog", "a cat", "a beignet"], context_length=model.context_length)
+with torch.no_grad(), torch.cuda.amp.autocast():
+    image_features = model.encode_image(image)
+    text_features = model.encode_text(text)
+    image_features = F.normalize(image_features, dim=-1)
+    text_features = F.normalize(text_features, dim=-1)
+    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
+print("Label probs:", text_probs)  # prints: [[0., 0., 0., 1.0]]
+```
+## Citation
+```bibtex
+@article{li2023clipav2,
+      title={CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy},
+      author={Xianhang Li and Zeyu Wang and Cihang Xie},
+      journal={arXiv preprint arXiv:2306.15658},
+      year={2023},
+}
+```
+```bibtex
+@inproceedings{li2023clipa,
+      title={An Inverse Scaling Law for CLIP Training},
+      author={Xianhang Li and Zeyu Wang and Cihang Xie},
+      booktitle={NeurIPS},
+      year={2023},
+}
+```
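
The last three steps of the usage snippet in the README above (L2-normalize both embeddings, scale the similarities by 100, apply a softmax) can be tried without a checkpoint or GPU. The following is a minimal pure-Python sketch of that arithmetic only. The 3-dimensional vectors are invented toy values, not real model outputs, and the helper names (`l2_normalize`, `softmax`, `zero_shot_probs`) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length, like F.normalize(x, dim=-1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_vec, text_vecs, scale=100.0):
    """Cosine similarity between one image and each prompt, then softmax."""
    img = l2_normalize(image_vec)
    logits = []
    for text_vec in text_vecs:
        txt = l2_normalize(text_vec)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    return softmax(logits)


# Toy embeddings, made up for illustration (real ones have embed_dim entries).
labels = ["a diagram", "a dog", "a cat", "a beignet"]
image_vec = [0.9, 0.1, 0.2]
text_vecs = [
    [0.0, 1.0, 0.0],
    [0.1, 0.2, 1.0],
    [-1.0, 0.0, 0.3],
    [1.0, 0.1, 0.1],
]

probs = zero_shot_probs(image_vec, text_vecs)
best = labels[max(range(len(probs)), key=probs.__getitem__)]
print("Best label:", best)
```

Because the similarities are multiplied by 100 before the softmax, even a modest gap in cosine similarity pushes nearly all probability mass onto one label, which is why the README's example output is close to one-hot.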

added_tokens.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "[CLS]": 101,
+  "[MASK]": 103,
+  "[PAD]": 0,
+  "[SEP]": 102,
+  "[UNK]": 100
+}

open_clip_config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "model_cfg": {
+    "embed_dim": 1024,
+    "vision_cfg": {
+      "image_size": 224,
+      "layers": 32,
+      "width": 1280,
+      "head_width": 80,
+      "patch_size": 14,
+      "no_ln_pre": true,
+      "pool_type": "avg",
+      "final_ln_after_pool": true
+    },
+    "text_cfg": {
+      "context_length": 32,
+      "vocab_size": 32000,
+      "hf_tokenizer_name": "bert-base-uncased",
+      "tokenizer_kwargs": {
+        "strip_sep_token": true
+      },
+      "width": 1024,
+      "heads": 16,
+      "layers": 24,
+      "pool_type": "last",
+      "no_causal_mask": true
+    }
+  },
+  "preprocess_cfg": {
+    "mean": [
+      0.485,
+      0.456,
+      0.406
+    ],
+    "std": [
+      0.229,
+      0.224,
+      0.225
+    ],
+    "interpolation": "bilinear",
+    "resize_mode": "squash"
+  }
+}
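
A few quantities follow directly from the values in this config. The sketch below derives them with plain arithmetic, assuming the usual ViT conventions (a square patch grid, and attention heads obtained as `width // head_width`). The JSON is an abridged copy of the file above; the variable names and the `normalize_pixel` helper are hypothetical and not part of open_clip.

```python
import json

# Abridged copy of open_clip_config.json from the diff above.
CONFIG_JSON = """
{
  "model_cfg": {
    "embed_dim": 1024,
    "vision_cfg": {"image_size": 224, "layers": 32, "width": 1280,
                   "head_width": 80, "patch_size": 14},
    "text_cfg": {"context_length": 32, "width": 1024, "heads": 16, "layers": 24}
  },
  "preprocess_cfg": {
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225]
  }
}
"""

cfg = json.loads(CONFIG_JSON)
vision = cfg["model_cfg"]["vision_cfg"]
text = cfg["model_cfg"]["text_cfg"]

# Patches per side and per image for a square input.
grid = vision["image_size"] // vision["patch_size"]    # 224 // 14 = 16
num_patches = grid * grid                               # 16 * 16 = 256

# Attention heads in the vision tower, and per-head width in the text tower.
vision_heads = vision["width"] // vision["head_width"]  # 1280 // 80 = 16
text_head_width = text["width"] // text["heads"]        # 1024 // 16 = 64


def normalize_pixel(rgb, mean, std):
    """Per-channel (x - mean) / std for one RGB pixel scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]


pre = cfg["preprocess_cfg"]
white = normalize_pixel([1.0, 1.0, 1.0], pre["mean"], pre["std"])

print(num_patches, vision_heads, text_head_width)
```

The `mean` and `std` values are the standard ImageNet statistics, so any preprocessing pipeline for this checkpoint should use them rather than other defaults.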

open_clip_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05fdb3e9071caf865704ba794d4d6f770f0bffb07d82a9a1acff8f381c4d4b27
+size 3873019860
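
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. As a rough sketch, a downloaded copy could be checked against the pointer as follows. `parse_lfs_pointer` and `verify_file` are hypothetical helpers written for this example, and the local path in the usage note is an assumption.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:05fdb3e9071caf865704ba794d4d6f770f0bffb07d82a9a1acff8f381c4d4b27
size 3873019860
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines into a dict with algo, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_file(path, pointer, chunk_size=1 << 20):
    """Stream the file in chunks and compare its size and hash to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    seen = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            seen += len(chunk)
    return seen == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

With the weights downloaded to a local path of your choosing, `verify_file("open_clip_model.safetensors", pointer)` would return `True` only if both the byte count and the digest match.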

open_clip_pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0af7f7d05e15ff450843e049554b27620d7c036dd3f9a268600ffd2c6bf6c6df
+size 3873205914

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [],
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
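
The three tokenizer files in this commit describe the same five special tokens from different angles: `added_tokens.json` maps token strings to ids, `special_tokens_map.json` maps roles to token strings, and `added_tokens_decoder` in `tokenizer_config.json` maps ids back to token strings. A small sketch of a consistency check, with the values copied from the diffs above and hypothetical variable names:

```python
# added_tokens.json: token string -> id
added_tokens = {"[CLS]": 101, "[MASK]": 103, "[PAD]": 0, "[SEP]": 102, "[UNK]": 100}

# special_tokens_map.json: role -> token string
special_tokens_map = {
    "cls_token": "[CLS]",
    "mask_token": "[MASK]",
    "pad_token": "[PAD]",
    "sep_token": "[SEP]",
    "unk_token": "[UNK]",
}

# tokenizer_config.json, added_tokens_decoder: id (as a string key) -> token string
added_tokens_decoder = {
    "0": "[PAD]",
    "100": "[UNK]",
    "101": "[CLS]",
    "102": "[SEP]",
    "103": "[MASK]",
}

# Resolve each role to its numeric id through the token string.
role_to_id = {role: added_tokens[token] for role, token in special_tokens_map.items()}

# The decoder table should be the exact inverse of added_tokens.json.
decoder_as_ids = {token: int(idx) for idx, token in added_tokens_decoder.items()}
consistent = decoder_as_ids == added_tokens

print(role_to_id)
print("Files agree:", consistent)
```

Note that `model_max_length` here is 512, while `context_length` in `open_clip_config.json` is 32; the README snippet passes `context_length=model.context_length` to the tokenizer, so the shorter limit is the one applied at inference.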

vocab.txt ADDED

The diff for this file is too large to render. See raw diff