Question about BLIP2 on spaces
Hey @hysts
Sorry, this is only slightly related to your space.
I'm trying to build a Gradio app for VQA using BLIP-2, and I'm getting errors trying to build it on an A10G. Your Space also loads Salesforce/blip2-opt-2.7b via Transformers on an A10G.
Any guidance you have would be much appreciated! Thanks!
I'm getting a runtime error code 137, which seems to be an OOM error. Were you able to make this work?
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/main/app.py
Thanks!
Trying an A10G-Large now
Hi
@iamrobotbear
IIRC, the error code 137 means CPU OOM, so I guess using A10G-large will fix your problem as you mentioned.
Woah, I didn't expect a response so quickly (or at all), so thank you so much! You might have looked right when I had changed my file, as I'm trying to use Salesforce/blip2-opt-6.7b.
I'm returning my file to the state that I hope to make work; I'm just stuck and a bit over my head rn. Appreciate any help ya can give. Thanks!
(I may also have a Gradio error; I'm trying to figure that out as well.) I should know momentarily; the build is just about ready to fail.
@iamrobotbear
I think you need to remove these lines in the case you are using load_in_8bit=True (there's a rough sketch of what I mean below the links):
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L15
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L19
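For reference, roughly what the loading part would look like (an untested sketch; the variable names just mirror your app):

import torch
from transformers import AutoProcessor, Blip2ForConditionalGeneration

processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")

# With load_in_8bit=True and device_map="auto", bitsandbytes/accelerate already
# place the weights on the GPU, and calling model.to(device) on an 8-bit model
# is not supported, so those .to(device) lines can simply be dropped.
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    load_in_8bit=True,
    device_map="auto",
)

# The processor outputs still need to be moved/cast manually, e.g.
# inputs = processor(image, return_tensors="pt").to(device, torch.float16)
device = "cuda" if torch.cuda.is_available() else "cpu"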
Also, you are using the same component multiple times here, but I think you can't reuse components that way.
https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L57-L58
Also, this is a minor point, but gr.inputs.* and gr.outputs.* are deprecated, and you can just use gr.Image etc.
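For example, something like this (an untested sketch; the labels are just placeholders) should avoid both issues:

import gradio as gr

# One component instance per input/output; Gradio does not allow the same
# component object to be rendered twice in one Interface.
image_input = gr.Image(type="numpy")
prompted_caption_input = gr.Textbox(label="Prompted caption text")
vqa_question_input = gr.Textbox(label="VQA question")
chat_context_input = gr.Textbox(label="Chat context")

caption_output = gr.Textbox(label="Image caption")
prompted_caption_output = gr.Textbox(label="Prompted caption")
vqa_answer_output = gr.Textbox(label="VQA answer")
chat_response_output = gr.Textbox(label="Chat response")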
@hysts -- OK, I think I'm still stuck and, frankly, do not know where to go next. As I said, I'm a bit over my head.
The origin of my app.py file (seen here: https://gist.github.com/brianjking/e67bb7473d29e968aa23a6f791484298) is based on this Jupyter notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BLIP-2/Chat_with_BLIP_2_%5Bint8_bitsandbytes%5D.ipynb.
Here's my current error when I try to build the above-linked Gist on an A10G-Large:
Space failed to start. Exit code: 1. Reason:
ython3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//iamrobotbear-blip-vqa-gradio.hf.space'), PosixPath('https')}
  warn(msg)
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/lib/pyenv/hooks'), PosixPath('/etc/pyenv.d'), PosixPath('/usr/local/etc/pyenv.d')}
  warn(msg)
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('tcp'), PosixPath('443'), PosixPath('//172.20.0.1')}
  warn(msg)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:05<00:05, 5.32s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.73s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.97s/it]
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/outputs.py:22: UserWarning: Usage of gradio.outputs is deprecated, and will not be supported in the future, please import your components from gradio.components
  warnings.warn(
Traceback (most recent call last):
  File "app.py", line 47, in <module>
    iface = gr.Interface(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/interface.py", line 444, in __init__
    ) = self.render_input_column()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/interface.py", line 511, in render_input_column
    component.render()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 85, in render
    raise DuplicateBlockError(
gradio.exceptions.DuplicateBlockError: A block with id: 1 has already been rendered in the current Blocks.
Essentially, I want to be able to do the following:
• User can load an image or, ideally, image(s) from a directory or a series of images
• User can use image captioning, prompted image captioning, VQA, or chat-based prompting
• Ideally, I'll be able to take that and generate Image Text Matching scores using BLIP-2 as I do here: https://huggingface.co/spaces/iamrobotbear/test.
The end result should be some way to test images against a series of statements and see whether they match, with a percentage confidence score for each match.
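For reference, this is roughly the kind of ITM scoring I mean (an untested sketch based on the LAVIS examples, not the actual code from my test Space; the model name, model type, and call pattern here are assumptions):

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumed LAVIS BLIP-2 image-text matching model (pip install salesforce-lavis).
model, vis_processors, text_processors = load_model_and_preprocess(
    name="blip2_image_text_matching", model_type="pretrain", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")  # placeholder image path
statement = "a photo of a dog on a beach"             # placeholder statement

img = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
txt = text_processors["eval"](statement)

# The "itm" head returns 2-way logits (no match / match); softmax gives a match probability.
itm_logits = model({"image": img, "text_input": txt}, match_head="itm")
match_prob = torch.nn.functional.softmax(itm_logits, dim=1)[:, 1].item()
print(f"Match confidence: {match_prob:.1%}")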
Thanks for your help, I really appreciate it. I think I'm soooo close.
@iamrobotbear
I think you can fix the current error by applying the following patch to your latest code.
diff --git a/app.py b/app.py
index d34ec0a..92edbcb 100644
--- a/app.py
+++ b/app.py
@@ -9,12 +9,13 @@ processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
"Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map='auto'
)
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
# Prepare image input
image_input = Image.fromarray(image).convert('RGB')
inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
-
+
# Image Captioning
generated_ids = model.generate(**inputs, max_new_tokens=20)
image_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
@@ -23,13 +24,13 @@ def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
inputs = processor(image_input, text=prompted_caption_text, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=20)
prompted_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
-
+
# Visual Question Answering (VQA)
prompt = f"Question: {vqa_question} Answer:"
inputs = processor(image_input, text=prompt, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=10)
vqa_answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
-
+
# Chat-based Prompting
prompt = chat_context + " Answer:"
inputs = processor(image_input, text=prompt, return_tensors="pt").to(device, torch.float16)
@@ -40,14 +41,19 @@ def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
# Define Gradio input and output components
image_input = gr.Image(type="numpy")
-text_input = gr.Text()
-output_text = gr.outputs.Textbox()
+prompted_caption_input = gr.Textbox()
+vqa_question_input = gr.Textbox()
+chat_context = gr.Textbox()
+image_caption_result = gr.Textbox()
+prompted_caption_result = gr.Textbox()
+vqa_answer = gr.Textbox()
+chat_response = gr.Textbox()
# Create Gradio interface
iface = gr.Interface(
fn=blip2_interface,
- inputs=[image_input, text_input, text_input, text_input],
- outputs=[output_text, output_text, output_text, output_text],
+ inputs=[image_input, prompted_caption_input, vqa_question_input, chat_context],
+ outputs=[image_caption_result, prompted_caption_result, vqa_answer, chat_response],
title="BLIP-2 Image Captioning and VQA",
description="Interact with the BLIP-2 model for image captioning, prompted image captioning, visual question answering, and chat-based prompting.",
)
Ooh, thank you @hysts! This ALMOST works; for the first time, it actually builds!
I'm getting 3 input boxes and then 4 outputs for some reason; when I submit questions/prompts I get errors in the Gradio output boxes, but nothing in the logs. I imagine it's likely due to somehow having only 3 inputs and 4 outputs...?
For what you're currently running on your Space (in this repo):
• Is this A10G small or large?
• Is it currently using BLIP-2 / Salesforce/blip2-opt-2.7b or FLAN-T5?
Thank you so much, really appreciate you!
Looks like I might have another error too, possibly related?
9gsj8 2023-04-02T14:09:12.901Z Traceback (most recent call last):
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/routes.py", line 394, in run_predict
9gsj8 2023-04-02T14:09:12.902Z output = await app.get_blocks().process_api(
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1075, in process_api
9gsj8 2023-04-02T14:09:12.902Z result = await self.call_function(
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 884, in call_function
9gsj8 2023-04-02T14:09:12.902Z prediction = await anyio.to_thread.run_sync(
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
9gsj8 2023-04-02T14:09:12.902Z return await get_asynclib().run_sync_in_worker_thread(
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
9gsj8 2023-04-02T14:09:12.902Z return await future
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
9gsj8 2023-04-02T14:09:12.902Z result = context.run(func, *args)
9gsj8 2023-04-02T14:09:12.902Z File "app.py", line 16, in blip2_interface
9gsj8 2023-04-02T14:09:12.902Z inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
9gsj8 2023-04-02T14:09:12.902Z NameError: name 'device' is not defined
9gsj8 2023-04-02T14:14:39.869Z Traceback (most recent call last):
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/routes.py", line 394, in run_predict
9gsj8 2023-04-02T14:14:39.869Z output = await app.get_blocks().process_api(
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1075, in process_api
9gsj8 2023-04-02T14:14:39.869Z result = await self.call_function(
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 884, in call_function
9gsj8 2023-04-02T14:14:39.869Z prediction = await anyio.to_thread.run_sync(
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
9gsj8 2023-04-02T14:14:39.869Z return await get_asynclib().run_sync_in_worker_thread(
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
9gsj8 2023-04-02T14:14:39.869Z return await future
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
9gsj8 2023-04-02T14:14:39.869Z result = context.run(func, *args)
9gsj8 2023-04-02T14:14:39.869Z File "app.py", line 16, in blip2_interface
9gsj8 2023-04-02T14:14:39.869Z inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
9gsj8 2023-04-02T14:14:39.869Z NameError: name 'device' is not defined
@iamrobotbear
It seems you forgot to add this line from my comment:
@@ -9,12 +9,13 @@ processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
"Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map='auto'
)
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Your function returns 4 values, so it shows 4 outputs; I just suggested a change that makes your code work as written. If that's not what you want to do, maybe you can change that part.
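To make that concrete, a stripped-down sketch (illustrative names and dummy values, not your actual code) of how the number of returned values has to line up with the outputs list:

import gradio as gr

def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
    # ...model calls elided; dummy strings stand in for the real results...
    return "caption", "prompted caption", "vqa answer", "chat response"

# Four values returned above, so the Interface needs exactly four output
# components, and each component object is used only once.
iface = gr.Interface(
    fn=blip2_interface,
    inputs=[gr.Image(type="numpy"), gr.Textbox(), gr.Textbox(), gr.Textbox()],
    outputs=[gr.Textbox(), gr.Textbox(), gr.Textbox(), gr.Textbox()],
)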
@iamrobotbear
Sorry, forgot to answer these:
• Is this A10G small or large?
• Is it currently using BLIP-2 / Salesforce/blip2-opt-2.7b or FLAN-T5?
I've tested your code on my GCP environment, which has a T4 and 30 GB of CPU RAM, and it doesn't seem to use much CPU RAM, so I guess you can run your code on a T4 small. The model I tested was Salesforce/blip2-opt-2.7b.
Regarding the second question, I'm asking about YOUR space, in this repo.
Is it using FlanT5 or blip2-opt-2.7b?
@iamrobotbear Ah, sorry, I misunderstood. This Space is using FLAN-T5 XXL and running on an A10G small.