VictorSanh committed
Commit aaddfc9 • 1 Parent(s): 54c6ef4

some fixes

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -149,7 +149,7 @@ prompts = [
  "In which city is that bridge located?<image>",
  ]
  images = [[image1, image2], [image3]]
- inputs = processor(text=prompts, padding=True, return_tensors="pt")
+ inputs = processor(text=prompts, images=images, padding=True, return_tensors="pt")
  inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
 
 
@@ -158,6 +158,7 @@ generated_ids = model.generate(**inputs, max_new_tokens=500)
  generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
 
  print(generated_texts)
+ # ['In this image, we can see the city of New York, and more specifically the Statue of Liberty. In this image, we can see the city of Chicago, and more specifically the skyscrapers of the city.', 'In which city is that bridge located? The Golden Gate Bridge is a suspension bridge spanning the Golden Gate, the one-mile-wide (1.6 km) strait connecting San Francisco Bay and the Pacific Ocean. The structure links the American city of San Francisco, California — the northern tip of the San Francisco Peninsula — to Marin County, carrying both U.S. Route 101 and California State Route 1 across the strait. The bridge is one of the most internationally recognized symbols of San Francisco, California, and the United States. It has been declared one of the Wonders of the Modern World by the American Society of Civil Engineers.\n\nThe Golden Gate Bridge is a suspension bridge spanning the Golden Gate, the one-mile-wide (1.6 km) strait connecting San Francisco Bay and the Pacific Ocean. The structure links the American city of San Francisco, California — the northern tip of the San Francisco Peninsula — to Marin County, carrying both U.S. Route 101 and California State Route 1 across the strait. The bridge is one of the most internationally recognized symbols of San Francisco, California, and the United States. It has been declared one of the Wonders of the Modern World by the American Society of Civil Engineers.\n\nThe Golden Gate Bridge is a suspension bridge spanning the Golden Gate, the one-mile-wide (1.6 km) strait connecting San Francisco Bay and the Pacific Ocean. The structure links the American city of San Francisco, California — the northern tip of the San Francisco Peninsula — to Marin County, carrying both U.S. Route 101 and California State Route 1 across the strait. The bridge is one of the most internationally recognized symbols of San Francisco, California, and the United States. It has been declared one of the Wonders of the Modern World by the American Society of Civil Engineers.\n\nThe Golden Gate Bridge is a suspension bridge spanning the Golden Gate, the one-mile-wide (1.6 km) strait connecting San Francisco Bay and the Pacific Ocean. The structure links the American city of San Francisco, California — the northern tip of the San Francisco Peninsula — to Marin County, carrying both U.S. Route 101 and California State Route 1 across the strait. The bridge is one of the most internationally recognized symbols of San Francisco, California, and']
  ```
 
  </details>
@@ -205,6 +206,7 @@ generated_ids = model.generate(**inputs, max_new_tokens=500)
  generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
 
  print(generated_texts)
+ # ['User: What do we see in this image? \nAssistant: In this image, we can see the city of New York, and more specifically the Statue of Liberty. \nUser: And how about this image? \nAssistant: In this image we can see buildings, trees, lights, water and sky.']
  ```
 
  </details>
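
Review note: the first hunk restores the `images=images` argument, so the nested `images` list is actually passed alongside `prompts`. A quick way to sanity-check that pairing before calling the processor is sketched below. It assumes each prompt carries one `<image>` placeholder per image in the matching sub-list, which is how the visible prompt and `[image3]` line up in the diff. `check_image_alignment` is a hypothetical helper and not part of Transformers, the first prompt is invented because the real one lies outside the hunk, and plain strings stand in for the image objects.

```python
# Hypothetical pre-flight check, not part of the README or of Transformers.
IMAGE_TOKEN = "<image>"  # placeholder string as it appears in the diff's prompt


def check_image_alignment(prompts, images):
    """Return the number of images per prompt, or raise ValueError on a mismatch."""
    if len(prompts) != len(images):
        raise ValueError(
            f"{len(prompts)} prompts but {len(images)} image lists were given"
        )
    for index, (prompt, prompt_images) in enumerate(zip(prompts, images)):
        placeholders = prompt.count(IMAGE_TOKEN)
        if placeholders != len(prompt_images):
            raise ValueError(
                f"prompt {index} has {placeholders} placeholder(s) "
                f"but {len(prompt_images)} image(s)"
            )
    return [len(prompt_images) for prompt_images in images]


prompts = [
    # Assumed first prompt: the real one is above the hunk and not shown.
    "<image>First picture.<image>Second picture.",
    "In which city is that bridge located?<image>",
]
# Strings stand in for the image objects named image1..image3 in the diff.
images = [["image1", "image2"], ["image3"]]

counts = check_image_alignment(prompts, images)
print(counts)  # [2, 1]
```

Running a check like this just before the `processor(...)` call would flag a placeholder/image mismatch with a readable message, though it would not have caught the original bug, where `images` was never passed at all.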