gokaygokay committed on Commit 0e1f234 · verified · 1 Parent(s): 165ef7a

Update README.md

Files changed (1): README.md +41 -3
README.md CHANGED
@@ -1,3 +1,41 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - google/docci
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ ---
+
+ Fine-tuned version of the [PaliGemma](https://huggingface.co/google/paligemma-3b-pt-224-jax) model on the [google/docci](https://huggingface.co/datasets/google/docci) dataset, trained on medium-length captions between 200 and 350 characters.
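The description above says training used only docci captions between 200 and 350 characters. A minimal sketch of such a length filter (the helper name `is_medium_caption` is ours for illustration, not from this repo):

```python
def is_medium_caption(caption: str, lo: int = 200, hi: int = 350) -> bool:
    # Keep only captions whose character count falls in the medium band
    # (200-350 chars, inclusive) that this fine-tune was trained on.
    return lo <= len(caption) <= hi

print(is_medium_caption("x" * 250))  # True
print(is_medium_caption("x" * 100))  # False
```

With `datasets`, the same predicate could be passed to `Dataset.filter` over the caption field before training.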
+
+ ```bash
+ pip install git+https://github.com/huggingface/transformers
+ ```
+
+ ```python
+ from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
+ from PIL import Image
+ import requests
+ import torch
+
+ model_id = "gokaygokay/paligemma-rich-captions"
+
+ url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ # PaliGemma expects a task prefix; "caption en" requests an English caption
+ prompt = "caption en"
+ model_inputs = processor(text=prompt, images=image, return_tensors="pt")
+ input_len = model_inputs["input_ids"].shape[-1]
+
+ with torch.inference_mode():
+     generation = model.generate(**model_inputs, max_new_tokens=256, do_sample=False)
+     generation = generation[0][input_len:]
+     decoded = processor.decode(generation, skip_special_tokens=True)
+     print(decoded)
+ ```
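The `[input_len:]` slice in the example strips the prompt tokens from the model output: `generate()` returns the prompt followed by the newly generated tokens, so slicing at the prompt length keeps only the caption. A stand-alone toy illustration with dummy token ids:

```python
# Toy illustration of the prompt-stripping slice used above.
prompt_ids = [101, 7, 8, 9]          # pretend tokenized prompt
new_ids = [42, 43, 44]               # pretend generated continuation
full_output = prompt_ids + new_ids   # shape of what generate() returns
input_len = len(prompt_ids)
generated_only = full_output[input_len:]
print(generated_only)  # [42, 43, 44]
```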