Is There an API For BLIP or similar?

#1
by tedd321 - opened

I need an api to caption images on the web.

Does this exist via BLIP?

Hi, thanks for your interest! You may want to take a look at our LAVIS library, which supports easy off-the-shelf inference:
https://github.com/salesforce/LAVIS

Hi! I don't know how to use the API. Could you update the web demo so that it can caption several images at the same time?

Hi, @jojofan , please take a look at https://github.com/salesforce/LAVIS/#image-captioning.

To infer on multiple images, you just need to concatenate the processed images along the batch dimension.
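For anyone wondering what that concatenation might look like, here is a minimal sketch, not an official recipe. It assumes the `blip_caption` / `base_coco` model from the LAVIS README and that every processed image has the same shape; the helper names `load_lavis_captioner` and `caption_batch` are made up for illustration. `torch.stack` here is equivalent to calling `unsqueeze(0)` on each image and then concatenating along dimension 0.

```python
def load_lavis_captioner(device="cpu"):
    """Load BLIP captioning weights through LAVIS (downloads on first use)."""
    # Imported lazily so this file can be imported without LAVIS installed.
    from lavis.models import load_model_and_preprocess

    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_caption", model_type="base_coco", is_eval=True, device=device
    )
    return model, vis_processors["eval"]


def caption_batch(model, vis_processor, images, device="cpu"):
    """Caption a list of PIL images in one call (hypothetical helper)."""
    import torch

    # Each processed image is a (3, H, W) tensor; stacking gives (N, 3, H, W).
    batch = torch.stack([vis_processor(img) for img in images]).to(device)
    # LAVIS captioning models take a dict and return one string per image.
    return model.generate({"image": batch})


# Usage (file names are placeholders):
#   from PIL import Image
#   model, proc = load_lavis_captioner("cuda")
#   images = [Image.open(p).convert("RGB") for p in ["a.jpg", "b.jpg"]]
#   print(caption_batch(model, proc, images, "cuda"))
```

The batch size you can afford depends on GPU memory, so for a large image set you would call `caption_batch` on one chunk of images at a time.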

Thanks.

Can you describe a few steps that I might need to take to caption a whole image set?

Or give an example of what such concatenation would look like?

The example at https://github.com/salesforce/LAVIS/blob/main/examples/blip_image_captioning.ipynb is a nice proof of concept, but it doesn't help much for captioning thousands of images.

Thanks for your help

Salesforce org

Hi @yvblake ,
You can either use LAVIS for that (https://github.com/salesforce/LAVIS/#image-captioning), as suggested by @dxli1, and build your own API on top of the library, or do it via transformers, since the architecture was recently added to transformers.
I guess what @dxli1 tried to explain is that you can process multiple images at once using batched generation (concatenate multiple images/inputs and pass them to the model), and run the model over the whole image set by getting the predictions batch by batch.
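As a rough illustration of the batch-by-batch idea with transformers, here is a sketch under a few assumptions: the `Salesforce/blip-image-captioning-base` checkpoint, a batch size of 16, and a `max_new_tokens` of 30 are arbitrary choices, and `iter_batches`, `caption_batch`, and `caption_image_set` are hypothetical helper names, not part of any library.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def load_blip(device: str = "cpu"):
    """Load the BLIP processor and model (downloads weights on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    name = "Salesforce/blip-image-captioning-base"  # assumed checkpoint
    processor = BlipProcessor.from_pretrained(name)
    model = BlipForConditionalGeneration.from_pretrained(name).to(device).eval()
    return processor, model


def caption_batch(processor, model, images: List, device: str = "cpu",
                  max_new_tokens: int = 30) -> List[str]:
    """Caption one batch of PIL images with a single generate call."""
    # The processor resizes and stacks the images into one batched tensor.
    inputs = processor(images=images, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = processor.batch_decode(output_ids, skip_special_tokens=True)
    return [text.strip() for text in decoded]


def caption_image_set(paths: Sequence[str], processor, model,
                      load_image: Callable, batch_size: int = 16,
                      device: str = "cpu") -> Dict[str, str]:
    """Caption every path, loading only one batch of images at a time."""
    captions: Dict[str, str] = {}
    for batch_paths in iter_batches(paths, batch_size):
        images = [load_image(path) for path in batch_paths]
        batch_captions = caption_batch(processor, model, images, device)
        captions.update(zip(batch_paths, batch_captions))
    return captions


# Usage (the directory name is a placeholder):
#   from pathlib import Path
#   from PIL import Image
#   processor, model = load_blip("cuda")
#   paths = sorted(str(p) for p in Path("images").glob("*.jpg"))
#   load = lambda p: Image.open(p).convert("RGB")
#   captions = caption_image_set(paths, processor, model, load, device="cuda")
```

Loading images inside the loop keeps memory use bounded by the batch size, which matters once the set grows to thousands of files.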
There is also an article on how to build an image captioning API using transformers, BLIP and Gradio; I think you can do the same with LAVIS as well: https://medium.com/@younes_belkada/how-to-write-a-image-captioning-api-using-gradio-and-blip-with-few-lines-of-code-9dfb88254b0
If you run into any issues, the easiest thing would be to share the piece of code you used to reproduce the issue, and we can discuss further.
Thanks!
