Getting errors even for the example input

#3
by joonyeongs - opened


I recently had the opportunity to read your paper, which I found to be quite fascinating and insightful. Motivated by the intriguing content, I was eager to experiment with the open source demo you've provided. Unfortunately, I encountered some issues while using it. Specifically, I ran into errors even when using the example inputs provided.

I am genuinely interested in exploring the capabilities of your work further. Therefore, I would greatly appreciate any assistance you could provide in resolving these errors. Your guidance will enable me to fully engage with the examples and better understand the practical applications of your research.

Thank you for your time and consideration.

Owner

Hi, the Space had hit a CUDA out-of-memory (OOM) error, and I have restarted it. I will try to find the cause of this occasional CUDA OOM error.

Thanks, the examples work properly now. However, when I upload a custom image, the model still gives a wrong answer. Does this model have limitations for certain questions or images?

Owner

Yes, this is just a failure case of our model. In this case, the visual search process is not activated because the turtle is prominent in the image, so the VQA LLM gives the answer directly, as shown in the Direct Answer box. The VQA LLM (LLaVA) fails on this problem partly because it saw little data of this kind during training, and spatial and orientation reasoning remains a challenge for general multimodal LLMs.

OK, I get it. Thank you for your wonderful work.
