Question about BLIP2 on spaces

#2
by iamrobotbear - opened

Hey @hysts

Sorry, this is only slightly related to your space.

I'm trying to build a Gradio app for VQA using BLIP-2, and I'm getting errors trying to build it on an A10G. Your space also loads Salesforce/blip2-opt-2.7b via Transformers on an A10G.

Any guidance you have would be much appreciated! Thanks!

I'm getting a runtime error code 137, which seems to be an OOM error. Were you able to make this work?

https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/main/app.py

Thanks!

Trying an A10G-Large now

Hi @iamrobotbear
IIRC, error code 137 means CPU OOM, so I guess using an A10G-large will fix your problem, as you mentioned.

Whoa, I didn't expect a response so quickly (or at all), so thank you so much! You might have looked while I had changed my file, as I'm trying to use Salesforce/blip2-opt-6.7b.

I'm returning my file to the state I hope to get working; I'm just stuck and a bit over my head right now. I appreciate any help you can give. Thanks!

@hysts

(I may also have a Gradio error; I'm trying to figure that out as well.) I should know momentarily; the build is just about ready to fail.

Also, this is a minor point, but gr.inputs and gr.outputs are deprecated; you can just use gr.Image, gr.Textbox, etc.

https://huggingface.co/spaces/iamrobotbear/blip-vqa-gradio/blob/58bb7b343390800e9778c3a1387db3edcf9d071d/app.py#L50-L52
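
For example, something like this should work with current Gradio versions (just a quick sketch; the variable names and labels are placeholders):

import gradio as gr

# Deprecated style:
#   image_input = gr.inputs.Image()
#   output_text = gr.outputs.Textbox()
# Current style: use the components from the top-level gradio namespace.
image_input = gr.Image(type="numpy", label="Input image")
output_text = gr.Textbox(label="Answer")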

@hysts -- OK, I think I'm still stuck and, frankly, don't know where to go next. As I said, I'm a bit over my head.

My app.py file (seen here: https://gist.github.com/brianjking/e67bb7473d29e968aa23a6f791484298) is based on this Jupyter notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BLIP-2/Chat_with_BLIP_2_%5Bint8_bitsandbytes%5D.ipynb.

Here's my current error when I try to build the above-linked Gist on an A10G-Large:

Space failed to start. Exit code: 1. Reason:

ython3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//iamrobotbear-blip-vqa-gradio.hf.space'), PosixPath('https')}
  warn(msg)
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/lib/pyenv/hooks'), PosixPath('/etc/pyenv.d'), PosixPath('/usr/local/etc/pyenv.d')}
  warn(msg)
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('tcp'), PosixPath('443'), PosixPath('//172.20.0.1')}
  warn(msg)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:05<00:05, 5.32s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.73s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.97s/it]
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/outputs.py:22: UserWarning: Usage of gradio.outputs is deprecated, and will not be supported in the future, please import your components from gradio.components
  warnings.warn(
Traceback (most recent call last):
  File "app.py", line 47, in <module>
    iface = gr.Interface(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/interface.py", line 444, in __init__
    ) = self.render_input_column()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/interface.py", line 511, in render_input_column
    component.render()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 85, in render
    raise DuplicateBlockError(
gradio.exceptions.DuplicateBlockError: A block with id: 1 has already been rendered in the current Blocks.

Essentially, I want to be able to do the following:

β€’ User can load an image or ideally image(s) from a directory or a series of images
β€’ User can use Image captioning, prompted image captioning, VQA, or chat based prompting.
β€’ Ideally, I'll be able to take that and generate Image Text Matching scores using Blip2 as I do here: https://huggingface.co/spaces/iamrobotbear/test.

The end result should be some way to test images against a series of statements to see whether they match, with a percentage confidence score for each match.
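
Roughly, the matching step I have in mind looks something like this (just a sketch, not the exact code from my test Space; it uses the original BLIP ITM checkpoint from Transformers as a stand-in, and the function name is made up):

import torch
from PIL import Image
from transformers import AutoProcessor, BlipForImageTextRetrieval

# Stand-in ITM model; the test Space linked above uses BLIP-2 for this.
processor = AutoProcessor.from_pretrained("Salesforce/blip-itm-base-coco")
model = BlipForImageTextRetrieval.from_pretrained("Salesforce/blip-itm-base-coco")

def match_confidence(image: Image.Image, statement: str) -> float:
    # Probability (as a percentage) that the statement matches the image.
    inputs = processor(images=image, text=statement, return_tensors="pt")
    with torch.no_grad():
        itm_logits = model(**inputs).itm_score  # shape (1, 2): [no match, match]
    return torch.softmax(itm_logits, dim=1)[0, 1].item() * 100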

Thanks for your help, I really appreciate it. I think I'm soooo close.

@iamrobotbear
I think you can fix the current error by applying the following patch to your latest code.

diff --git a/app.py b/app.py
index d34ec0a..92edbcb 100644
--- a/app.py
+++ b/app.py
@@ -9,12 +9,13 @@ processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
 model = Blip2ForConditionalGeneration.from_pretrained(
     "Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map='auto'
 )
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

 def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
     # Prepare image input
     image_input = Image.fromarray(image).convert('RGB')
     inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
-
+
     # Image Captioning
     generated_ids = model.generate(**inputs, max_new_tokens=20)
     image_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
@@ -23,13 +24,13 @@ def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):
     inputs = processor(image_input, text=prompted_caption_text, return_tensors="pt").to(device, torch.float16)
     generated_ids = model.generate(**inputs, max_new_tokens=20)
     prompted_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
-
+
     # Visual Question Answering (VQA)
     prompt = f"Question: {vqa_question} Answer:"
     inputs = processor(image_input, text=prompt, return_tensors="pt").to(device, torch.float16)
     generated_ids = model.generate(**inputs, max_new_tokens=10)
     vqa_answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
-
+
     # Chat-based Prompting
     prompt = chat_context + " Answer:"
     inputs = processor(image_input, text=prompt, return_tensors="pt").to(device, torch.float16)
@@ -40,14 +41,19 @@ def blip2_interface(image, prompted_caption_text, vqa_question, chat_context):

 # Define Gradio input and output components
 image_input = gr.Image(type="numpy")
-text_input = gr.Text()
-output_text = gr.outputs.Textbox()
+prompted_caption_input = gr.Textbox()
+vqa_question_input = gr.Textbox()
+chat_context = gr.Textbox()
+image_caption_result = gr.Textbox()
+prompted_caption_result = gr.Textbox()
+vqa_answer = gr.Textbox()
+chat_response = gr.Textbox()

 # Create Gradio interface
 iface = gr.Interface(
     fn=blip2_interface,
-    inputs=[image_input, text_input, text_input, text_input],
-    outputs=[output_text, output_text, output_text, output_text],
+    inputs=[image_input, prompted_caption_input, vqa_question_input, chat_context],
+    outputs=[image_caption_result, prompted_caption_result, vqa_answer, chat_response],
     title="BLIP-2 Image Captioning and VQA",
     description="Interact with the BLIP-2 model for image captioning, prompted image captioning, visual question answering, and chat-based prompting.",
 )

Ooh, thank you @hysts! This ALMOST works; for the first time, it actually builds!

I'm getting 3 input boxes and then 4 output boxes for some reason; when I submit questions/prompts, I get errors in the Gradio output boxes, but nothing in the logs. I imagine it's likely because I somehow have only 3 inputs and 4 outputs...?

CleanShot 2023-04-02 at 07.10.56@2x.png

In what you're currently running on your space (in this repo):

β€’ Is this A10G small or Large?
β€’ Is it currently using BLIP2 / Salesforce/blip2-opt-2.7b or FLANt5?

Thank you so much, really appreciate you!

Looks like I might have another error too, possibly related?

9gsj8 2023-04-02T14:09:12.901Z Traceback (most recent call last):
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/routes.py", line 394, in run_predict
9gsj8 2023-04-02T14:09:12.902Z output = await app.get_blocks().process_api(
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1075, in process_api
9gsj8 2023-04-02T14:09:12.902Z result = await self.call_function(
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 884, in call_function
9gsj8 2023-04-02T14:09:12.902Z prediction = await anyio.to_thread.run_sync(
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
9gsj8 2023-04-02T14:09:12.902Z return await get_asynclib().run_sync_in_worker_thread(
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
9gsj8 2023-04-02T14:09:12.902Z return await future
9gsj8 2023-04-02T14:09:12.902Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
9gsj8 2023-04-02T14:09:12.902Z result = context.run(func, *args)
9gsj8 2023-04-02T14:09:12.902Z File "app.py", line 16, in blip2_interface
9gsj8 2023-04-02T14:09:12.902Z inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
9gsj8 2023-04-02T14:09:12.902Z NameError: name 'device' is not defined
9gsj8 2023-04-02T14:14:39.869Z Traceback (most recent call last):
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/routes.py", line 394, in run_predict
9gsj8 2023-04-02T14:14:39.869Z output = await app.get_blocks().process_api(
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 1075, in process_api
9gsj8 2023-04-02T14:14:39.869Z result = await self.call_function(
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/gradio/blocks.py", line 884, in call_function
9gsj8 2023-04-02T14:14:39.869Z prediction = await anyio.to_thread.run_sync(
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
9gsj8 2023-04-02T14:14:39.869Z return await get_asynclib().run_sync_in_worker_thread(
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
9gsj8 2023-04-02T14:14:39.869Z return await future
9gsj8 2023-04-02T14:14:39.869Z File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
9gsj8 2023-04-02T14:14:39.869Z result = context.run(func, *args)
9gsj8 2023-04-02T14:14:39.869Z File "app.py", line 16, in blip2_interface
9gsj8 2023-04-02T14:14:39.869Z inputs = processor(image_input, return_tensors="pt").to(device, torch.float16)
9gsj8 2023-04-02T14:14:39.869Z NameError: name 'device' is not defined

@iamrobotbear
It seems you forgot to add this line from my comment:

@@ -9,12 +9,13 @@ processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
 model = Blip2ForConditionalGeneration.from_pretrained(
     "Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map='auto'
 )
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Your function returns 4 values, so there are 4 output boxes; I just suggested a change that makes your code work. If that's not what you want, you can change that part.
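
In general, the number of components just has to line up with the function: one input component per parameter and one output component per returned value. A toy sketch (unrelated to BLIP-2):

import gradio as gr

def toy_fn(a, b):
    # 2 parameters -> 2 input components; 2 return values -> 2 output components
    return a.upper(), b.lower()

demo = gr.Interface(
    fn=toy_fn,
    inputs=[gr.Textbox(label="First"), gr.Textbox(label="Second")],
    outputs=[gr.Textbox(label="Upper"), gr.Textbox(label="Lower")],
)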

@iamrobotbear
Sorry, forgot to answer these:

β€’ Is this A10G small or Large?
β€’ Is it currently using BLIP2 / Salesforce/blip2-opt-2.7b or FLANt5?

I've tested your code on my GCP environment, which has a T4 and 30GB of CPU RAM, but it doesn't seem to use much CPU RAM, so I guess you can run your code on a T4 small. The model I tested was Salesforce/blip2-opt-2.7b.

Regarding the second question, I'm asking about YOUR space, in this repo.

Is it using FlanT5 or blip2-opt-2.7b?

@iamrobotbear Ah, sorry, I misunderstood. This Space is using FLAN-T5 XXL and running on an A10G small.

Hmm, now to figure out how/why I have no labels on the output boxes. Also, when I have nothing in the input boxes and click "Submit", I still get 4 answers.

CleanShot 2023-04-02 at 08.20.07@2x.png
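
Maybe passing label= to each output component from the patch above is the fix for the missing labels? Something like this (the label text is just illustrative):

image_caption_result = gr.Textbox(label="Image caption")
prompted_caption_result = gr.Textbox(label="Prompted caption")
vqa_answer = gr.Textbox(label="VQA answer")
chat_response = gr.Textbox(label="Chat response")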

hysts changed discussion status to closed
