ybelkada (HF staff) committed on
Commit 0d78262 · 1 Parent(s): ce7a82c

Update README.md

  1. README.md +92 -7
README.md CHANGED
@@ -68,19 +68,104 @@ processor.push_to_hub("USERNAME/MODEL_NAME")
 
  ## Running the model
 
- TODO
 
- # Results
 
- TODO
 
- # Introduction to UL2
 
- TODO
 
- # Fine-tuning
 
- TODO
 
  # Contribution
 
 
  ## Running the model
 
+ ### In full precision, on CPU:
 
+ You can run the model in full precision on CPU:
+ ```python
+ import requests
+ from PIL import Image
+ from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor
+
+ url = "https://www.ilankelman.org/stopsigns/australia.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ model = Pix2StructForConditionalGeneration.from_pretrained("ybelkada/pix2struct-textcaps-base")
+ processor = Pix2StructProcessor.from_pretrained("ybelkada/pix2struct-textcaps-base")
+
+ # image only
+ inputs = processor(images=image, return_tensors="pt")
+
+ predictions = model.generate(**inputs)
+ print(processor.decode(predictions[0], skip_special_tokens=True))
+ # >>> A stop sign is on a street corner.
+ ```
+
+ ### In full precision, on GPU:
+
+ You can run the model in full precision on GPU:
+ ```python
+ import requests
+ from PIL import Image
+ from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor
+
+ url = "https://www.ilankelman.org/stopsigns/australia.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
 
+ model = Pix2StructForConditionalGeneration.from_pretrained("ybelkada/pix2struct-textcaps-base").to("cuda")
+ processor = Pix2StructProcessor.from_pretrained("ybelkada/pix2struct-textcaps-base")
 
+ # image only
+ inputs = processor(images=image, return_tensors="pt").to("cuda")
+
+ predictions = model.generate(**inputs)
+ print(processor.decode(predictions[0], skip_special_tokens=True))
+ # >>> A stop sign is on a street corner.
+ ```
 
+ ### In half precision, on GPU:
+
+ You can run the model in half precision (`bfloat16`) on GPU:
+ ```python
+ import requests
+ import torch
+
+ from PIL import Image
+ from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor
 
+ url = "https://www.ilankelman.org/stopsigns/australia.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
 
+ model = Pix2StructForConditionalGeneration.from_pretrained("ybelkada/pix2struct-textcaps-base", torch_dtype=torch.bfloat16).to("cuda")
+ processor = Pix2StructProcessor.from_pretrained("ybelkada/pix2struct-textcaps-base")
+
+ # image only
+ inputs = processor(images=image, return_tensors="pt").to("cuda", torch.bfloat16)
+
+ predictions = model.generate(**inputs)
+ print(processor.decode(predictions[0], skip_special_tokens=True))
+ # >>> A stop sign is on a street corner.
+ ```
+
+ ### Use different sequence length
+
+ This model was trained with a sequence length of `2048`. You can reduce the sequence length for more memory-efficient inference, but you may observe some performance degradation at small sequence lengths (<512). To do so, pass `max_patches` when calling the processor:
+ ```python
+ inputs = processor(images=image, return_tensors="pt", max_patches=512)
+ ```
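
To get a feel for what a smaller `max_patches` budget means, here is a rough sketch of how a patch budget could map to a patch grid. It assumes 16x16 patches and an aspect-ratio-preserving resize of the kind Pix2Struct uses; `patch_grid` is a hypothetical helper written for illustration, not part of `transformers`, and the 480x640 image size is only an example.

```python
import math


def patch_grid(image_height, image_width, max_patches, patch_size=16):
    """Return (rows, cols) of the patch grid for an aspect-ratio-preserving resize.

    Hypothetical helper: assumes square patches of `patch_size` pixels.
    """
    # Largest uniform scale at which rows * cols still fits within max_patches.
    scale = math.sqrt(
        max_patches * (patch_size / image_height) * (patch_size / image_width)
    )
    # Clamp each dimension to at least 1 patch and at most max_patches.
    rows = max(min(math.floor(scale * image_height / patch_size), max_patches), 1)
    cols = max(min(math.floor(scale * image_width / patch_size), max_patches), 1)
    return rows, cols


for budget in (2048, 512):
    rows, cols = patch_grid(480, 640, budget)
    print(budget, rows, cols, rows * cols)
# 2048 39 52 2028
# 512 19 26 494
```

Under these assumptions, going from 2048 to 512 patches cuts the number of image tokens the encoder sees to roughly a quarter, which is where the memory savings (and the possible loss of fine detail) would come from.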
+
+ ### Conditional generation
+
+ You can also prepend some input text to perform conditional generation:
+
+ ```python
+ import requests
+ from PIL import Image
+ from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor
+
+ url = "https://www.ilankelman.org/stopsigns/australia.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+ text = "A picture of"
+
+ model = Pix2StructForConditionalGeneration.from_pretrained("ybelkada/pix2struct-textcaps-base")
+ processor = Pix2StructProcessor.from_pretrained("ybelkada/pix2struct-textcaps-base")
+
+ # image and text prompt
+ inputs = processor(images=image, text=text, return_tensors="pt")
+
+ predictions = model.generate(**inputs)
+ print(processor.decode(predictions[0], skip_special_tokens=True))
+ # >>> A picture of a stop sign that says yes.
+ ```
 
  # Contribution