---
license: cc-by-4.0
---


**Model Details**

VisMin-CLIP is a fine-tuned version of the pretrained CLIP model, designed to enhance fine-grained and compositional abilities beyond the base model. Fine-tuning was conducted using the [OpenCLIP](https://github.com/mlfoundations/open_clip) library, an open-source implementation of OpenAI’s CLIP.


**Model Summary**

- Model Date: July 2024
- Model type: Vision-language Foundation Model (image+text)
- Parent Model: [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)

**Usage**

Like any other OpenCLIP model, VisMin-CLIP can be loaded directly from its checkpoint:

```python
import open_clip
import torch

# Pick a device for inference.
device = "cuda" if torch.cuda.is_available() else "cpu"

model_cls_name = "ViT-L-14"
checkpoint_path = "path/to/checkpoint"
model, _, preprocess = open_clip.create_model_and_transforms(
    model_name=model_cls_name, pretrained=checkpoint_path, device=device
)
tokenizer = open_clip.get_tokenizer(model_cls_name)

model = model.to(device).eval()
```
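
If the fine-tuned weights are also published on the Hugging Face Hub with an OpenCLIP config, OpenCLIP's `hf-hub:` prefix offers an alternative way to load them. The sketch below assumes such a repository exists; `username/vismin-clip` is a hypothetical placeholder id, not a confirmed location for this model:

```python
# A sketch, assuming the weights are hosted on the Hugging Face Hub with an
# OpenCLIP config; "username/vismin-clip" is a hypothetical placeholder repo id.
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:username/vismin-clip")
tokenizer = open_clip.get_tokenizer("hf-hub:username/vismin-clip")
model = model.eval()
```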

Once loaded, you can encode images and text to perform zero-shot image classification:

```python
import requests
import torch
from PIL import Image

# Download and preprocess an example image, then tokenize candidate labels.
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
image = preprocess(image).unsqueeze(0).to(device)
text = tokenizer(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```
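
Because VisMin-CLIP is tuned for fine-grained, minimal-change understanding, a natural use is scoring one image against a pair of nearly identical captions. The snippet below is a minimal sketch of that idea, reusing `model`, `tokenizer`, `image`, and `device` from the blocks above; the captions are illustrative examples, not items from the VisMin benchmark.

```python
# A minimal sketch: rank two minimally different captions for a single image.
# The captions are hypothetical examples, not taken from the VisMin benchmark.
captions = [
    "two cats lying on a couch",
    "two cats standing on a couch",
]
text = tokenizer(captions).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.T).squeeze(0)

print("Caption scores:", scores.tolist())
print("Best match:", captions[scores.argmax().item()])
```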


**BibTeX**

If you use VisMin-CLIP in your work, please cite it as follows:

```
@article{vismin2024,
  title={VisMin: Visual Minimal-Change Understanding},
  author={Awal, Rabiul and Ahmadi, Saba and Zhang, Le and Agrawal, Aishwarya},
  year={2024}
}
```