Is There an API For BLIP or similar?

by tedd321 - opened Nov 7, 2022

tedd321

Nov 7, 2022

I need an api to caption images on the web.

Does this exist via BLIP?

JunnanLi

Nov 7, 2022

Hi, thanks for your interest! You may want to take a look at our LAVIS library, which supports easy off-the-shelf inference:
https://github.com/salesforce/LAVIS

jojofan

Nov 11, 2022

Hi! I don't know how to use an API. Could you update the web demo so that it can caption several images at the same time?

dxli1

Nov 12, 2022

Hi, @jojofan , please take a look at https://github.com/salesforce/LAVIS/#image-captioning.

To infer on multiple images, you just need to concatenate the processed images along the batch dimension.
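A minimal sketch of what that concatenation might look like, assuming the `blip_caption` / `base_coco` model names from the LAVIS README captioning example. The function name `caption_batch` and the file names in the closing comment are hypothetical, and the heavy imports sit inside the function only so that the module itself stays cheap to import.

```python
def caption_batch(image_paths, device="cpu"):
    """Caption a small list of image files in a single forward pass (sketch)."""
    # Assumption: torch, Pillow and salesforce-lavis are installed.
    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    # Model name and type follow the LAVIS README captioning example.
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_caption", model_type="base_coco", is_eval=True, device=device
    )

    # Each processed image is a (3, H, W) tensor; unsqueeze(0) adds a batch axis.
    processed = [
        vis_processors["eval"](Image.open(path).convert("RGB")).unsqueeze(0)
        for path in image_paths
    ]

    # Concatenate along dim 0 to obtain one (N, 3, H, W) batch.
    batch = torch.cat(processed, dim=0).to(device)

    with torch.no_grad():
        captions = model.generate({"image": batch})

    # One caption per input image, in the same order as image_paths.
    return dict(zip(image_paths, captions))


# Example call (hypothetical file names):
# caption_batch(["cat.jpg", "dog.jpg"])
```

The batch has to fit in memory, so for a large image set you would call something like this on a few images at a time rather than on everything at once.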

Thanks.

yvblake

Jan 22, 2023

Can you describe a few steps that I might need to take to caption a whole image set?

Or give an example of what such concatenation would look like?

The example at https://github.com/salesforce/LAVIS/blob/main/examples/blip_image_captioning.ipynb is a nice proof of concept, but it doesn't help much for captioning thousands of images.

Thanks for your help

ybelkada

Salesforce org Jan 22, 2023

Hi @yvblake ,
You can either use LAVIS for that (https://github.com/salesforce/LAVIS/#image-captioning, as stated by @dxli1) and build your own API on top of the library, or do it via transformers, since the BLIP architecture was recently added to transformers.
I guess what @dxli1 tried to explain is that you can process multiple images at once using batched generation (concatenate multiple images/inputs and pass them to the model), and run the model over the whole image set by getting the predictions batch by batch.
There is also an article on how to build an image captioning API using transformers, BLIP and Gradio; I think you can do the same with LAVIS as well: https://medium.com/@younes_belkada/how-to-write-a-image-captioning-api-using-gradio-and-blip-with-few-lines-of-code-9dfb88254b0
If you run into any issue, the easiest would be to share with us the piece of code that reproduces it so we can discuss further.
Thanks!
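
For a large image set, the batch-by-batch idea could be sketched roughly as follows with transformers. This is an illustration under assumptions, not a tested pipeline: the checkpoint name `Salesforce/blip-image-captioning-base`, the helper names `chunks` and `caption_image_set`, the batch size and the `max_new_tokens` value are all choices made here for the example.

```python
from typing import Dict, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def caption_image_set(
    image_paths: Sequence[str], batch_size: int = 16, device: str = "cpu"
) -> Dict[str, str]:
    """Caption every image in `image_paths`, one batch at a time (sketch)."""
    # Assumption: torch, Pillow and transformers are installed.
    import torch
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    checkpoint = "Salesforce/blip-image-captioning-base"  # assumed checkpoint
    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint).to(device)
    model.eval()

    results: Dict[str, str] = {}
    for batch_paths in chunks(image_paths, batch_size):
        images = [Image.open(path).convert("RGB") for path in batch_paths]
        # The processor resizes, normalizes and stacks the images into one batch.
        inputs = processor(images=images, return_tensors="pt").to(device)
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=30)
        captions = processor.batch_decode(output_ids, skip_special_tokens=True)
        results.update(zip(batch_paths, (c.strip() for c in captions)))
    return results
```

Only one batch of images is held in memory at a time, so the batch size can be tuned to the available GPU or CPU memory; the resulting dictionary maps each path to its caption and can be written to a CSV or JSON file afterwards.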
