michelecafagna26 committed on
Commit
ff093cf
1 Parent(s): f34fd3d

Upload 2 files

Files changed (2)
  1. README.md +101 -0
  2. pytorch_model.pt +3 -0
README.md CHANGED
---
license: apache-2.0
tags:
- image-captioning
datasets:
- michelecafagna26/hl
language:
- en
metrics:
- sacrebleu
- rouge
library_name: transformers
---
## ClipCap fine-tuned for Scenes Image Captioning

[ClipCap](https://arxiv.org/abs/2111.09734) base trained on the [HL Dataset](https://huggingface.co/datasets/michelecafagna26/hl) for **high-level scene description generation**.

## Model fine-tuning 🏋️‍

We fine-tune the LM + Mapping Network starting from the model pretrained on COCO, with the following settings (a rough optimizer sketch follows the list):

- Trained for 9 epochs
- lr: 5e-5
- Adam optimizer
- half-precision (fp16)

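For reference, a minimal PyTorch sketch of how these settings could be wired together; `model`, `train_loader`, and `compute_loss` are hypothetical placeholders, not the actual training script used for this checkpoint.

```python
import torch

# Sketch only: `model` (ClipCap LM + mapping network), `train_loader`, and
# `compute_loss` are placeholders and not part of this repository.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scaler = torch.cuda.amp.GradScaler()        # enables fp16 mixed-precision training

for epoch in range(9):                      # trained for 9 epochs
    for prefix, tokens, mask in train_loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():     # half-precision forward pass
            outputs = model(tokens, prefix, mask)
            loss = compute_loss(outputs, tokens)
        scaler.scale(loss).backward()       # scaled backward to avoid fp16 underflow
        scaler.step(optimizer)
        scaler.update()
```
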
## Test set metrics 🧾

| CIDEr  | SacreBLEU | ROUGE-L |
|--------|-----------|---------|
| 145.93 | 36.73     | 42.83   |

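If you want to score your own generations, one way to compute the SacreBLEU and ROUGE-L numbers is the 🤗 `evaluate` package (assumed to be installed); the captions below are placeholders. CIDEr is typically computed separately, e.g. with `pycocoevalcap`.

```python
import evaluate

# Placeholder data: one generated caption and its reference caption(s) per image
predictions = ["the picture is taken in a park"]
references = [["people are relaxing in a public park"]]

sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

bleu_score = sacrebleu.compute(predictions=predictions, references=references)["score"]
rouge_l = rouge.compute(predictions=predictions, references=references)["rougeL"]

print(f"SacreBLEU: {bleu_score:.2f}, ROUGE-L: {rouge_l * 100:.2f}")
```
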
## Demo

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1xcaJOxaAp8TRd8a6x1XnAptVjHQRv3Zj?usp=sharing)

## Installation

```bash
pip install git+https://github.com/michelecafagna26/CLIPCap.git
```

## Download the model

```bash
git lfs install # if not installed
git clone https://huggingface.co/michelecafagna26/clipcap-base-captioning-ft-hl-scenes
```

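Alternatively, the checkpoint can be fetched programmatically with `huggingface_hub` (assuming the package is installed), which avoids cloning the whole repository:

```python
from huggingface_hub import hf_hub_download

# Downloads (and caches) only the checkpoint file from this repository
model_path = hf_hub_download(
    repo_id="michelecafagna26/clipcap-base-captioning-ft-hl-scenes",
    filename="pytorch_model.pt",
)
print(model_path)
```
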
## Model in Action 🚀

```python
import clip
import requests
import torch
from clipcap import ClipCaptionModel
from PIL import Image
from transformers import GPT2Tokenizer

model_path = "clipcap-base-captioning-ft-hl-scenes/pytorch_model.pt"  # change accordingly

# load CLIP and the GPT-2 tokenizer
device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
prefix_length = 10

# load ClipCap
model = ClipCaptionModel(prefix_length, tokenizer=tokenizer)
model.from_pretrained(model_path)
model = model.eval()
model = model.to(device)

# load the image
img_url = 'https://datasets-server.huggingface.co/assets/michelecafagna26/hl/--/default/train/0/image/image.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# extract the CLIP prefix and project it into the LM embedding space
image = preprocess(raw_image).unsqueeze(0).to(device)
with torch.no_grad():
    prefix = clip_model.encode_image(image).to(device, dtype=torch.float32)
    prefix_embed = model.clip_project(prefix).reshape(1, prefix_length, -1)

# generate the caption with beam search
model.generate_beam(embed=prefix_embed)[0]

# >> ""
```

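To caption several images, the prefix-extraction and generation steps above can be wrapped in a small helper. This is only a convenience sketch that reuses the objects already created in the snippet (`preprocess`, `clip_model`, `model`, `device`, `prefix_length`).

```python
def caption_image(raw_image):
    """Generate a caption for a PIL image, reusing the models loaded above."""
    image = preprocess(raw_image).unsqueeze(0).to(device)
    with torch.no_grad():
        prefix = clip_model.encode_image(image).to(device, dtype=torch.float32)
        prefix_embed = model.clip_project(prefix).reshape(1, prefix_length, -1)
    return model.generate_beam(embed=prefix_embed)[0]

urls = [img_url]  # extend with more image URLs as needed
for url in urls:
    img = Image.open(requests.get(url, stream=True).raw).convert("RGB")
    print(caption_image(img))
```
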
## BibTeX and citation info

```BibTeX
```
pytorch_model.pt ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:b25da1e3aca001bc0a922e1f63f2069d3620198fff3030d3656ca974cbe9b2cd
size 636274141