---
license: openrail
inference: false
pipeline_tag: image-to-text
tags:
- image-to-text
- visual-question-answering
- image-captioning
datasets:
- coco
- textvqa
- VQAv2
- OK-VQA
- A-OKVQA
language:
- en
---
# QuickStart
## Installation
```bash
pip install promptcap
```
## Captioning Pipeline
Generate a prompt-guided caption as follows:
```python
import torch
from promptcap import PromptCap
model = PromptCap("vqascore/promptcap-coco-vqa")  # also supports OFA checkpoints, e.g. "OFA-Sys/ofa-base"
if torch.cuda.is_available():
    model.cuda()
prompt = "please describe this image according to the given question: what piece of clothing is this boy putting on?"
image = "glove_boy.jpeg"
print(model.caption(prompt, image))
```
To try generic captioning, just use the prompt "please describe this image according to the given question: what does the image describe?"
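For example, reusing the `model` and `image` from the captioning example above:
```python
# Generic captioning: the question simply asks for an overall description
prompt = "please describe this image according to the given question: what does the image describe?"
print(model.caption(prompt, image))
```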
PromptCap also supports OCR inputs:
```python
prompt = "please describe this image according to the given question: what year was this taken?"
image = "dvds.jpg"
ocr = "yip AE Mht juor 02/14/2012"
print(model.caption(prompt, image, ocr))
```
## Visual Question Answering Pipeline
Unlike typical VQA models, which perform classification on VQAv2, PromptCap is open-domain and can be paired with arbitrary text-QA models.
Here we provide a pipeline for combining PromptCap with UnifiedQA.
```python
import torch
from promptcap import PromptCap_VQA
# The QA model supports all UnifiedQA variants, e.g. "allenai/unifiedqa-v2-t5-large-1251000"
vqa_model = PromptCap_VQA(promptcap_model="vqascore/promptcap-coco-vqa", qa_model="allenai/unifiedqa-t5-base")
if torch.cuda.is_available():
    vqa_model.cuda()
question = "what piece of clothing is this boy putting on?"
image = "glove_boy.jpeg"
print(vqa_model.vqa(question, image))
```
Similarly, PromptCap supports OCR inputs:
```python
question = "what year was this taken?"
image = "dvds.jpg"
ocr = "yip AE Mht juor 02/14/2012"
print(vqa_model.vqa(question, image, ocr=ocr))
```
Thanks to the flexibility of UnifiedQA, PromptCap also supports multiple-choice VQA:
```python
question = "what piece of clothing is this boy putting on?"
image = "glove_boy.jpeg"
choices = ["gloves", "socks", "shoes", "coats"]
print(vqa_model.vqa_multiple_choice(question, image, choices))
```