What is the significance of the parameters input_size and max_num?

by mstachow - opened Jul 30, 2024

Jul 30, 2024

How does changing them affect the output or accuracy? I read the report for VL1.5, but it only mentioned that they can be changed and that max_num was 12 in training; it did not explain what difference they make.

czczup

OpenGVLab org Aug 1, 2024

Hello, this parameter will change the resolution of the input image, so it will change the accuracy.

mstachow

Aug 1, 2024

Right, but how? I've tried changing input_size, and everything other than 448 crashed the code. Changing max_num seemed to do nothing. Does a bigger max_num mean higher resolution, or does a smaller one?

I found that loading in 8-bit actually gives higher performance on my VQA task, so I think something is wrong with what I'm doing...

czczup

OpenGVLab org Aug 9, 2024

input_size represents the size of each image tile, which is 448 and cannot be modified. You can control the resolution of the input image by adjusting max_num; the larger the max_num, the larger the input image will be.

mstachow

Aug 9, 2024

So to make sure I understand it, if max_num is 1 my image gets resized to 448x448 and is only a single tile. If my max_num is like 16 it gets turned into at most 16 448x448 images, depending on the original size?

zwgao

OpenGVLab org Aug 20, 2024

So to make sure I understand it, if max_num is 1 my image gets resized to 448x448 and is only a single tile. If my max_num is like 16 it gets turned into at most 16 448x448 images, depending on the original size?

yes
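For anyone who wants to see the effect of max_num on a given image size, here is a minimal sketch of the tiling idea described above. It is not the model's actual preprocessing code: the function name `pick_tile_grid`, the `min_num` parameter, and the tie-break rule (prefer a larger grid only when the image has enough pixels to fill it) are assumptions for illustration. It picks the grid of 448x448 tiles whose aspect ratio is closest to the image's, using at most max_num tiles.

```python
def pick_tile_grid(width, height, max_num=12, min_num=1, tile=448):
    """Return (cols, rows): a grid of at most max_num tiles (hypothetical helper)."""
    aspect = width / height
    area = width * height

    # Every (cols, rows) grid whose tile count lies in [min_num, max_num],
    # ordered from fewest tiles to most.
    candidates = sorted(
        [(c, r)
         for c in range(1, max_num + 1)
         for r in range(1, max_num + 1)
         if min_num <= c * r <= max_num],
        key=lambda g: g[0] * g[1],
    )

    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidates:
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and area > 0.5 * tile * tile * cols * rows:
            # Assumed tie-break: take the larger grid only if the image
            # is big enough to make use of the extra tiles.
            best = (cols, rows)
    return best


for size in [(448, 448), (500, 500), (2000, 2000), (1792, 448)]:
    for max_num in (1, 16):
        cols, rows = pick_tile_grid(*size, max_num=max_num)
        print(size, "max_num =", max_num, "->", cols * rows, "tile(s),",
              "resized to", (cols * tile_w, rows * tile_w) if (tile_w := 448) else None)
```

Under these assumptions, max_num = 1 always yields a single 448x448 tile, while a larger max_num only adds tiles when the original image is large or elongated enough, which would explain why changing max_num can appear to do nothing on small images.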

czczup changed discussion status to closed Aug 22, 2024
