Example code to run Python inference with image and text prompt input?

#6
by ibadami - opened

Did anyone try this model with an image and text prompt for Python inference?
Can you share your script showing what the function call should look like?
Thank you.

Not sure about a standalone script, but text-generation-webui has a way to load the model and chat with images. You could take its LLaVA part, adapt it slightly into a separate repository, and then call it from Python.

I see, thank you. I had a quick look at the web UI earlier; from the screenshots I could not find any button for uploading images. I will check once again though.

You have to enable the multimodal extension when loading text-generation-webui.

Yes, I just saw their multimodal extension. I will try to track down the code that generates the image tokens to prepend to the text prompt, so that I can use this model outside of the web UI in my own project.
I will post my results once I put all the pieces together. :) Thank you once again!

Yeah, it would be awesome if you did that. Also, np!

Hey, any progress? I'm trying to load the model for inference and can't figure out how to pass an image in the prompt. Thanks!

Sorry for the late reply. No progress on that front! Still using the original code from the LLaVA 1.5 authors.
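
For anyone who needs a starting point, the inference path in that original repo looks roughly like this. This is only a minimal sketch, assuming the `llava` package from the official LLaVA repo is installed; the checkpoint ID, image path, and question are placeholders.

```python
# Minimal sketch using the original LLaVA 1.5 code (https://github.com/haotian-liu/LLaVA).
# Assumes the `llava` package from that repo is installed; paths below are placeholders.
import torch
from PIL import Image

from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token
from llava.model.builder import load_pretrained_model

model_path = "liuhaotian/llava-v1.5-7b"  # placeholder checkpoint
tokenizer, model, image_processor, _ = load_pretrained_model(
    model_path, None, get_model_name_from_path(model_path)
)

# Build a llava_v1 conversation; the <image> token marks where image features are spliced in.
conv = conv_templates["llava_v1"].copy()
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nWhat is shown in this image?")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Preprocess the image with the model's CLIP image processor.
image = Image.open("example.jpg").convert("RGB")  # placeholder image path
image_tensor = process_images([image], image_processor, model.config).to(
    model.device, dtype=torch.float16
)

# tokenizer_image_token replaces <image> with IMAGE_TOKEN_INDEX so the model knows
# where to insert the projected vision features.
input_ids = (
    tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    .unsqueeze(0)
    .to(model.device)
)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids, images=image_tensor, max_new_tokens=256, do_sample=False
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True).strip())
```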

@ysdj Transformers has native support for LLaVA now, but I'm not sure whether this model will still work with it.
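
If you want to try that route, the native Transformers path looks roughly like this. It's a sketch against the llava-hf conversion of LLaVA 1.5 rather than the GPTQ weights in this repo; the checkpoint ID, image path, and prompt are placeholders.

```python
# Rough sketch of image + text inference with native LLaVA support in recent Transformers.
# Uses the llava-hf conversion of LLaVA 1.5 (not this GPTQ repo); adjust IDs/paths as needed.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint ID
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA 1.5 prompt format: the <image> placeholder marks where image features go.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
image = Image.open("example.jpg")  # placeholder image path

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```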

Either way, I don't recommend using GPTQ, since the quality and speed are much lower than with llama.cpp; ExLlama doesn't support multimodal GPTQ.

You can use the GGUF variant with llama-cpp-python instead, and that will work.
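
Something along these lines should do it. This is a sketch using llama-cpp-python's LLaVA 1.5 chat handler; the GGUF model path, mmproj (CLIP) path, and image path are placeholders for whatever files you downloaded.

```python
# Sketch of image + text inference on a GGUF LLaVA model via llama-cpp-python.
# Paths are placeholders: you need both the model GGUF and the CLIP mmproj GGUF.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="llava-v1.5-13b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # larger context to leave room for the image embedding
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant that describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/example.jpg"}},
                {"type": "text", "text": "What is shown in this image?"},
            ],
        },
    ]
)
print(response["choices"][0]["message"]["content"])
```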
