dxli and ybelkada (HF staff) committed
Commit f16db55
1 Parent(s): b9797fc

Update README.md (#1)

- Update README.md (3af156a76220a22440ec6b0a3e80a91c64a97a0a)

Co-authored-by: Younes Belkada <ybelkada@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +104 -1
README.md CHANGED
@@ -43,4 +43,107 @@ fine-tuned versions on a task that interests you.

### How to use

- For code examples, we refer to the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2#transformers.Blip2ForConditionalGeneration.forward.example).
+ For code examples, refer to the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2#transformers.Blip2ForConditionalGeneration.forward.example) or to the snippets below, depending on your use case:
+
+ #### Running the model on CPU
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ import requests
+ from PIL import Image
+ from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
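+ # weights load in float32 by default, so this multi-billion-parameter checkpoint needs tens of GB of RAM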
+ model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ question = "how many dogs are in the picture?"
+ inputs = processor(raw_image, question, return_tensors="pt")
+
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+ </details>
+
+ #### Running the model on GPU
+
+ ##### In full precision
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate
+ import requests
+ from PIL import Image
+ from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
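+ # device_map="auto" lets Accelerate spread the weights over the available GPU(s), offloading to CPU if needed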
+ model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", device_map="auto")
+
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ question = "how many dogs are in the picture?"
+ inputs = processor(raw_image, question, return_tensors="pt").to("cuda")
+
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+ </details>
+
+ ##### In half precision (`float16`)
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate
+ import torch
+ import requests
+ from PIL import Image
+ from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+ model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", torch_dtype=torch.float16, device_map="auto")
+
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ question = "how many dogs are in the picture?"
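+ # .to("cuda", torch.float16) moves the inputs to the GPU and casts the floating-point pixel values to float16 to match the model weights; integer token ids are unaffected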
+ inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
+
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+ </details>
+
+ ##### In 8-bit precision (`int8`)
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate bitsandbytes
+ import torch
+ import requests
+ from PIL import Image
+ from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
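+ # load_in_8bit=True quantizes the weights to 8-bit with bitsandbytes, roughly halving memory use compared to float16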
+ model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", load_in_8bit=True, device_map="auto")
+
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ question = "how many dogs are in the picture?"
+ inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
+
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+ </details>
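+
+ ##### In 4-bit precision (`int4`)
+
+ The snippet below is a sketch rather than one of the tested examples above: it assumes a `transformers` release with `bitsandbytes` 4-bit support (the `load_in_4bit` argument to `from_pretrained`), which may be newer than the version the snippets above target.
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate bitsandbytes
+ import torch
+ import requests
+ from PIL import Image
+ from transformers import Blip2Processor, Blip2ForConditionalGeneration
+
+ processor = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
+ # load_in_4bit=True quantizes the weights to 4-bit, shrinking memory further at some cost in accuracy
+ model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", load_in_4bit=True, device_map="auto")
+
+ img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ question = "how many dogs are in the picture?"
+ inputs = processor(raw_image, question, return_tensors="pt").to("cuda", torch.float16)
+
+ out = model.generate(**inputs)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+ </details>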