Spaces: nielsr/donut-docvqa


Multiple Question Input

by rhachriy - opened May 1, 2023

May 1, 2023

Hi, nielsr previously mentioned that it was possible to do multiple question inputs by "sending a batch of images + questions through the model... provide a batch of pixel_values + decoder_input_ids to the generate method, and use the batch_decode method of the tokenizer to turn the generated IDs into text."

Does anyone have an example of this or a similar notebook that details more about how to do this? Thank you.
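A minimal sketch of the batch described in the quote above, assuming the DocVQA task prompt format documented for Donut (`<s_docvqa><s_question>{question}</s_question><s_answer>`). The helper names `build_prompts` and `extract_answer` are hypothetical, not part of transformers. The point it illustrates: one prompt is built per example, in the same order as the images, and each string returned by `batch_decode` is parsed on its own.

```python
import re

# Task prompt format used by Donut fine-tuned on DocVQA
# (assumed here, as documented for naver-clova-ix/donut-base-finetuned-docvqa).
TASK_PROMPT = "<s_docvqa><s_question>{question}</s_question><s_answer>"


def build_prompts(questions):
    """Return one decoder prompt per example; prompts[i] belongs to image i."""
    return [TASK_PROMPT.format(question=q) for q in questions]


def extract_answer(decoded):
    """Extract the answer from one string produced by batch_decode.

    Assumes special tokens were kept when decoding, so the <s_answer> marker
    is still present. Returns the text after <s_answer> up to </s_answer>
    (or up to the end, if generation stopped early), with leftover tags
    such as </s> or <pad> removed.
    """
    match = re.search(r"<s_answer>(.*?)(?:</s_answer>|$)", decoded, flags=re.DOTALL)
    if match is None:
        return ""
    # Strip any remaining special-token tags and surrounding whitespace.
    return re.sub(r"<[^>]+>", "", match.group(1)).strip()
```

In practice the prompts would be tokenized (for example with `add_special_tokens=False`) to get `decoder_input_ids`, the images run through the processor to get `pixel_values`, and both passed to `generate`; the exact calls are an assumption here, so check them against the notebook linked below.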

nielsr

Owner May 3, 2023

Hi,

I created a notebook to illustrate that: https://colab.research.google.com/drive/1oOgGwT-I51rTcA9f2CTB3I7nX1FxEsQj?usp=sharing.

rhachriy

May 3, 2023

Thank you for the response! I am running into a few peculiar items though. When running this code, I get this output.

It works just fine though if I translate it to only do one question.

Is this because of the difference between these types of images?

Thanks.

nielsr

Owner May 3, 2023

Currently you're sending the same prompt (decoder_input_ids) twice through the model. For VQA, the prompt needs to be different per example. I'll check this tomorrow.
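Once each example gets its own question, the tokenized prompts usually have different lengths and must be padded before they can be stacked into one `decoder_input_ids` tensor. The sketch below left-pads token ID lists and builds a matching attention mask, a common convention for batched generation so that every prompt ends where generation begins. Whether Donut's `generate` handles left-padded decoder prompts correctly is an assumption not confirmed in this thread; `left_pad` is a hypothetical helper, and `pad_id` would come from the tokenizer's pad token ID.

```python
def left_pad(sequences, pad_id):
    """Left-pad token ID lists to a common length.

    Returns (input_ids, attention_mask) as nested lists. Padding goes on the
    left so each prompt ends at the same position; the mask is 0 on padding
    and 1 on real prompt tokens.
    """
    if not sequences:
        return [], []
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask
```

The token IDs in any quick check of this function are arbitrary; real IDs come from tokenizing the per-example prompts.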

rhachriy

May 3, 2023

I see, sounds good! If there is any way to input multiple questions for a single image, that would be awesome as well.

nielsr

Owner May 4, 2023

I've updated the notebook to reflect this. To ask multiple questions about a single image, you can duplicate the image once per question in the notebook, rather than using different images.
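The duplication trick above can be sketched as a small bookkeeping step: flatten each image and its questions into aligned per-example lists, then regroup the answers afterwards. The helpers `flatten_batch` and `regroup` are hypothetical; images are assumed to be whatever objects the processor accepts (for example PIL images), and plain strings stand in for them when testing.

```python
def flatten_batch(image_questions):
    """Flatten [(image, [q1, q2, ...]), ...] into aligned per-example lists.

    The same image object is repeated once per question, so images[i] pairs
    with questions[i]; owners[i] records which input image it came from.
    """
    images, questions, owners = [], [], []
    for idx, (image, qs) in enumerate(image_questions):
        for q in qs:
            images.append(image)
            questions.append(q)
            owners.append(idx)
    return images, questions, owners


def regroup(answers, owners, n_images):
    """Collect the flat list of answers back into one list per input image."""
    grouped = [[] for _ in range(n_images)]
    for owner, answer in zip(owners, answers):
        grouped[owner].append(answer)
    return grouped
```

Keeping `owners` alongside the batch means the answers decoded from `generate` can be mapped back to their source image without relying on question text being unique.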

rhachriy

May 4, 2023

Gotcha, thank you so much for your help!

rhachriy changed discussion status to closed May 4, 2023
