Example code to run Python inference with image and text prompt input?

#6
by ibadami - opened

Did anyone try this model with an image and text prompt for Python inference?
Can you share your script showing what the function call should look like?
Thank you.

Not sure about a standalone script, but text-generation-webui has a way to load the model and chat with images. You could take its LLaVA part, adapt it slightly into a separate repository, and then call it from Python.

I see, thank you. I had a quick look at the web UI earlier; from the screenshots I could not find any button for uploading images. I will check once again though.

You have to enable the multimodal extension when loading text-generation-webui.

Yes, I just saw their multimodal extension. I will try to track down the code that generates the image tokens to prepend to the text prompt, so that I can use this model outside of the web UI in my own project.
I will post my results once I put all the pieces together. :) Thank you once again!

Yeah, it would be awesome if you did that. Also, np!

Hey, any progress? I'm trying to load the model for inference and can't figure out how to pass an image in the prompt. Thanks!

Sorry for the late reply. No progress on that front! Still using the original code from the LLaVA 1.5 authors.
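
For anyone who needs a starting point, the inference path in that original repo looks roughly like this. This is only a minimal sketch, assuming the `llava` package from the official LLaVA repo is installed; the checkpoint ID, image path, and question are placeholders.

```python
# Minimal sketch using the original LLaVA 1.5 code (https://github.com/haotian-liu/LLaVA).
# Assumes the `llava` package from that repo is installed; paths below are placeholders.
import torch
from PIL import Image

from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token
from llava.model.builder import load_pretrained_model

model_path = "liuhaotian/llava-v1.5-7b"  # placeholder checkpoint
tokenizer, model, image_processor, _ = load_pretrained_model(
    model_path, None, get_model_name_from_path(model_path)
)

# Build a llava_v1 conversation; the <image> token marks where image features are spliced in.
conv = conv_templates["llava_v1"].copy()
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nWhat is shown in this image?")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Preprocess the image with the model's CLIP image processor.
image = Image.open("example.jpg").convert("RGB")  # placeholder image path
image_tensor = process_images([image], image_processor, model.config).to(
    model.device, dtype=torch.float16
)

# tokenizer_image_token replaces <image> with IMAGE_TOKEN_INDEX so the model knows
# where to insert the projected vision features.
input_ids = (
    tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    .unsqueeze(0)
    .to(model.device)
)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids, images=image_tensor, max_new_tokens=256, do_sample=False
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True).strip())
```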

@ysdj Transformers has native support for LLaVA now, but I'm not sure whether this model will still work with it.
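
If you want to try that route, the native Transformers path looks roughly like this. It's a sketch against the llava-hf conversion of LLaVA 1.5 rather than the GPTQ weights in this repo; the checkpoint ID, image path, and prompt are placeholders.

```python
# Rough sketch of image + text inference with native LLaVA support in recent Transformers.
# Uses the llava-hf conversion of LLaVA 1.5 (not this GPTQ repo); adjust IDs/paths as needed.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint ID
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA 1.5 prompt format: the <image> placeholder marks where image features go.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
image = Image.open("example.jpg")  # placeholder image path

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```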

Either way, I don't recommend using GPTQ, since the quality and speed are much lower than with llama.cpp; ExLlama doesn't support multimodal GPTQ.

You can use the GGUF variant with llama-cpp-python instead, and that will work.
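
Something along these lines should do it. This is a sketch using llama-cpp-python's LLaVA 1.5 chat handler; the GGUF model path, mmproj (CLIP) path, and image path are placeholders for whatever files you downloaded.

```python
# Sketch of image + text inference on a GGUF LLaVA model via llama-cpp-python.
# Paths are placeholders: you need both the model GGUF and the CLIP mmproj GGUF.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="llava-v1.5-13b.Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # larger context to leave room for the image embedding
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant that describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/example.jpg"}},
                {"type": "text", "text": "What is shown in this image?"},
            ],
        },
    ]
)
print(response["choices"][0]["message"]["content"])
```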
