Text location in screenshots

#12
by pcuenq HF staff - opened
pcuenq changed pull request status to open

This is what it looks like:
[Image: Screenshot 2023-10-20 at 21.55.40.png]

If you have suggestions for nice screenshots to use as examples, please do let us know!

Adept AI Labs org

Is it possible to add a resize to 1080x1920 for this demo specifically? I think that will lead to better performance.
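A minimal sketch of the kind of resize suggested above. It assumes "1080x1920" means a box 1080 pixels high and 1920 pixels wide, and that the aspect ratio should be preserved; neither is stated in the comment. The function name `fit_within` and its parameters are hypothetical and are not part of the demo code.

```python
from typing import Tuple


def fit_within(width: int, height: int,
               max_width: int = 1920, max_height: int = 1080) -> Tuple[int, int]:
    """Return (width, height) scaled to fit inside the box, keeping aspect ratio.

    Assumption: the target box is 1920 wide by 1080 high. Smaller images
    are scaled up to fill the box; larger ones are scaled down.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The tighter of the two constraints decides the scale factor.
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(3840, 2160))
    print(fit_within(1000, 1000))
```

The resulting tuple could then be passed to an image library's resize call, for example Pillow's `Image.resize((width, height))`, before the screenshot is handed to the model.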

Adept AI Labs org

In our experience, a newline at the end of the prompt typically seems to help :)

@somaniarushi thanks! For VQA, though, I found worse results with the newline, at least in some of the examples. For example, the Walmart receipt example returns "There are four items sold." with the newline, whereas the version without the newline generates "5". It's true that the version without the newline is a bit too concise in this case, but the answer is right.

I would suggest we use the existing prompts for now, keep testing on more examples, and update if appropriate. What do you think?

Note also that the "coco style" caption does include a newline, as recommended in your original code and tests.
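To make the comparison above concrete, here is a small sketch of how the trailing newline could be toggled per prompt when testing more examples. The helper `with_trailing_newline` and the prompt strings are hypothetical placeholders, not the prompts used in the demo.

```python
def with_trailing_newline(prompt: str, enabled: bool = True) -> str:
    """Return the prompt with exactly one trailing newline, or with none."""
    base = prompt.rstrip("\n")
    return base + "\n" if enabled else base


# Hypothetical prompts: the caption prompt keeps its newline, the VQA one does not.
caption_prompt = with_trailing_newline("Describe this image.", enabled=True)
vqa_prompt = with_trailing_newline("How many items were sold?\n", enabled=False)

if __name__ == "__main__":
    print(repr(caption_prompt))
    print(repr(vqa_prompt))
```

Running both variants of each prompt over the same set of screenshots would show whether the newline helps or hurts per task.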

Adept AI Labs org

@pcuenq For sure, let's go with evidence over my intuition!

@somaniarushi oh, but your intuition is better informed than mine! :) We'll continue to test on more examples and share our findings!

mtensor changed pull request status to merged
