nielsr committed
Commit 4355f59
1 Parent(s): 1d05211

Add code example

Files changed (1)
  1. README.md +24 -5
README.md CHANGED
@@ -9,17 +9,36 @@ Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by
 
  Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.
 
- ## Model description
-
- (to do)
-
  ## Intended uses & limitations
 
  You can use the raw model for visual question answering.
 
  ### How to use
 
- (to do)
+ Here is how to use this model in PyTorch:
+
+ ```python
+ from transformers import ViltProcessor, ViltForQuestionAnswering
+ import requests
+ from PIL import Image
+
+ # prepare image + question
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+ text = "How many cats are there?"
+
+ processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+ model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+
+ # prepare inputs
+ encoding = processor(image, text, return_tensors="pt")
+
+ # forward pass
+ outputs = model(**encoding)
+ logits = outputs.logits
+ idx = logits.argmax(-1).item()
+ print("Predicted answer:", model.config.id2label[idx])
+ ```
 
  ## Training data
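The committed snippet keeps only the single highest-logit answer. Since this checkpoint's VQA head is trained as multi-label classification (binary cross-entropy over the answer vocabulary), applying a sigmoid to the logits is a reasonable way to turn them into per-answer scores. A minimal sketch of inspecting the top candidates, continuing from the `model` and `logits` variables in the example above; the sigmoid/top-k step is an illustration, not part of the committed example:

```python
import torch

# per-answer scores; the head is trained with a multi-label
# objective, so sigmoid (not softmax) matches the training setup
probs = torch.sigmoid(logits)

# show the five highest-scoring answers
top = probs[0].topk(5)
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{model.config.id2label[idx]}: {score:.3f}")
```

On the sample COCO image (two cats on a couch), the top answer should be "2".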