language:
- en

---

# QuickStart

## Installation
```
pip install promptcap
```

## Captioning Pipeline

Generate a prompt-guided caption as follows:
```
import torch
from promptcap import PromptCap

model = PromptCap("vqascore/promptcap-coco-vqa")  # also supports OFA checkpoints, e.g. "OFA-Sys/ofa-base"

if torch.cuda.is_available():
    model.cuda()

prompt = "please describe this image according to the given question: what piece of clothing is this boy putting on?"
image = "glove_boy.jpeg"

print(model.caption(prompt, image))
```
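
As the comment above notes, the same interface should also load plain OFA checkpoints. A minimal sketch, reusing the prompt and image from the previous snippet (the checkpoint name is the example from that comment; caption quality from an unfinetuned OFA model is not guaranteed):

```
# Sketch: load a vanilla OFA checkpoint instead of the finetuned PromptCap one
# ("OFA-Sys/ofa-base" is the example name from the comment above)
ofa_model = PromptCap("OFA-Sys/ofa-base")
if torch.cuda.is_available():
    ofa_model.cuda()
print(ofa_model.caption(prompt, image))
```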

To try generic captioning, just use the prompt "please describe this image according to the given question: what does the image describe?", as in the sketch below.
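
A minimal sketch, reusing the `model` and example image from the captioning snippet above:

```
# Sketch: generic captioning via the catch-all question from the text above
prompt = "please describe this image according to the given question: what does the image describe?"
print(model.caption(prompt, "glove_boy.jpeg"))
```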

PromptCap also supports OCR inputs:

```
prompt = "please describe this image according to the given question: what year was this taken?"
image = "dvds.jpg"
ocr = "yip AE Mht juor 02/14/2012"

print(model.caption(prompt, image, ocr))
```

## Visual Question Answering Pipeline

Unlike typical VQA models, which treat VQAv2 as classification over a fixed answer set, PromptCap is open-domain and can be paired with arbitrary text QA models.
Here we provide a pipeline that combines PromptCap with UnifiedQA.

```
import torch
from promptcap import PromptCap_VQA

# the QA model supports all UnifiedQA variants, e.g. "allenai/unifiedqa-v2-t5-large-1251000"
vqa_model = PromptCap_VQA(promptcap_model="vqascore/promptcap-coco-vqa", vqa_model="allenai/unifiedqa-t5-base")

if torch.cuda.is_available():
    vqa_model.cuda()

question = "what piece of clothing is this boy putting on?"
image = "glove_boy.jpeg"

print(vqa_model.vqa(question, image))
```
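
For example, a sketch that swaps in the larger UnifiedQA v2 checkpoint named in the comment above (same constructor, only the `vqa_model` string changes):

```
# Sketch: pair PromptCap with a larger UnifiedQA v2 variant
vqa_model_large = PromptCap_VQA(
    promptcap_model="vqascore/promptcap-coco-vqa",
    vqa_model="allenai/unifiedqa-v2-t5-large-1251000",
)
print(vqa_model_large.vqa(question, image))
```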

Similarly, the VQA pipeline supports OCR inputs:

```
question = "what year was this taken?"
image = "dvds.jpg"
ocr = "yip AE Mht juor 02/14/2012"

print(vqa_model.vqa(question, image, ocr=ocr))
```

Because of the flexibility of UnifiedQA, PromptCap also supports multiple-choice VQA:

```
question = "what piece of clothing is this boy putting on?"
image = "glove_boy.jpeg"
choices = ["gloves", "socks", "shoes", "coats"]
print(vqa_model.vqa_multiple_choice(question, image, choices))
```