yushihu committed on
Commit 4101b50
1 Parent(s): 44aa161

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -16,12 +16,13 @@ language:
 - en
 
 ---
-This is the repo for the paper [PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3](https://arxiv.org/abs/2211.09699)
+This is the repo for the paper [PromptCap: Prompt-Guided Task-Aware Image Captioning](https://arxiv.org/abs/2211.09699)
 
 We introduce PromptCap, a captioning model that can be controlled by natural language instruction. The instruction may contain a question that the user is interested in.
 For example, "what is the boy putting on?". PromptCap also supports generic caption, using the question "what does the image describe?"
 
-PromptCap can be served as a light-weight visual plug-in for LLM like GPT-3 and ChatGPT. It achieves SOTA performance on COCO captioning (150 CIDEr).
+PromptCap can serve as a light-weight visual plug-in (much faster than BLIP-2) for LLM like GPT-3, ChatGPT, and other foundation models like Segment Anything and DINO.
+It achieves SOTA performance on COCO captioning (150 CIDEr).
 When paired with GPT-3, and conditioned on user question, PromptCap get SOTA performance on knowledge-based VQA tasks (60.4% on OK-VQA and 59.6% on A-OKVQA)
 
 # QuickStart
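The QuickStart body itself is not touched by this commit. For context, the sketch below shows how the prompt-guided captioning described in the edited paragraphs is typically invoked; it assumes the companion `promptcap` pip package (a `PromptCap` class with a `caption(prompt, image)` method, loading this repo's `tifa-benchmark/promptcap-coco-vqa` checkpoint) and a hypothetical local image path.

```python
# Minimal usage sketch (assumptions: the `promptcap` package API described above
# and a local image file "example.jpg").
import torch
from promptcap import PromptCap

model = PromptCap("tifa-benchmark/promptcap-coco-vqa")
if torch.cuda.is_available():
    model.cuda()

# The prompt embeds the user's question, so the caption focuses on answering it.
prompt = "please describe this image according to the given question: what is the boy putting on?"
print(model.caption(prompt, "example.jpg"))

# A generic caption can be requested with the question "what does the image describe?"
generic = "please describe this image according to the given question: what does the image describe?"
print(model.caption(generic, "example.jpg"))
```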