gokaygokay committed
Commit 26a589f
1 Parent(s): a69d9e1

Update README.md

Files changed (1)
  1. README.md +44 -3
README.md CHANGED
@@ -1,3 +1,44 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - google/docci
+ - google/imageinwords
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ tags:
+ - art
+ ---
+
+ Fine-tuned version of PaliGemma 224x224 on the [google/docci](https://huggingface.co/datasets/google/docci) and [google/imageinwords](https://huggingface.co/datasets/google/imageinwords) datasets.
+
+ ```
+ pip install git+https://github.com/huggingface/transformers
+ ```
+
+ ```python
+ from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
+ from PIL import Image
+ import requests
+ import torch
+
+ model_id = "gokaygokay/sd3-long-captioner-v2"
+
+ url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ # Captioning task prefix expected by PaliGemma
+ prompt = "caption en"
+ model_inputs = processor(text=prompt, images=image, return_tensors="pt")
+ input_len = model_inputs["input_ids"].shape[-1]
+
+ with torch.inference_mode():
+     generation = model.generate(**model_inputs, repetition_penalty=1.10, max_new_tokens=256, do_sample=False)
+     generation = generation[0][input_len:]
+     decoded = processor.decode(generation, skip_special_tokens=True)
+     print(decoded)
+
+ ```
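
For reference, here is a minimal sketch of the same example adapted to run on a GPU in bfloat16. This variant is not part of the committed README; it assumes a CUDA device and that the `accelerate` package is installed (required for `device_map="auto"`).

```python
# Hypothetical GPU variant of the README example (not from the commit itself).
# Assumes a CUDA device and the accelerate package for device_map="auto".
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image
import requests
import torch

model_id = "gokaygokay/sd3-long-captioner-v2"

# Load the weights in bfloat16 and let accelerate place them on the GPU.
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = AutoProcessor.from_pretrained(model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "caption en"  # captioning task prefix, as in the README example
model_inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
input_len = model_inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(
        **model_inputs, repetition_penalty=1.10, max_new_tokens=256, do_sample=False
    )

# Drop the prompt tokens and decode only the generated caption.
print(processor.decode(generation[0][input_len:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) keeps the caption deterministic; the `repetition_penalty` value simply mirrors the one used in the README example.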