Memory optimization for long sequences with many small images: reduce resampler_n_latents

#18
by ch272h - opened

Hi there,

Congrats on releasing this amazing model! I am fine-tuning it for a VQA task involving 10 or 20 low-res images (224x224 pixels). I have followed https://huggingface.co/HuggingFaceM4/idefics2-8b#model-optimizations and set do_image_splitting=False and size for the processor. Still, I seem bottlenecked by sequence length (and thus GPU memory), because most of the tokens are the 64 <image> tokens per image.
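Concretely, my processor setup looks roughly like this (the size values follow the model card's optimization section; adjust for your resolution):

```python
from transformers import AutoProcessor

# Setup from the model-optimizations section: no image splitting, and a
# smaller `size` so low-res images are not upscaled. The size values below
# are the ones suggested in the model card.
processor = AutoProcessor.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    do_image_splitting=False,
    size={"longest_edge": 448, "shortest_edge": 378},
)
```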

I am fiddling with the idea of reducing config.perceiver_config.resampler_n_latents from the default 64 to 32 or even 16. Is it possible at all to reuse the existing weights while using fewer than 64 latents in Idefics2-8B?

Thanks!

HuggingFaceM4 org

Thanks for your comment.

Yes, with do_image_splitting=False you will use 64 tokens per image.

Do you mean that you have 10 to 20 images per example for your task?
Idefics2-base has a maximum sequence length of 2048, while we used a maximum of 1024 for the SFT that led to Idefics2, so exceeding this number might give unexpected results (but since you are fine-tuning it, the model can of course also learn to go beyond that).

It's not recommended to change config.perceiver_config.resampler_n_latents.
However, if you really want to encode your images with a very low number of tokens, you could average-pool the 64 tokens down to 32 or 16 and fine-tune the model this way; see the sketch below.
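A minimal sketch of that pooling idea, assuming it is applied to the resampler output of shape (batch, 64, hidden) before the image features are merged into the text sequence (the helper name and shapes are hypothetical):

```python
import torch
import torch.nn.functional as F

# Hypothetical helper for the average-pooling idea above: merge groups of
# adjacent image tokens so each image costs 32 (or 16) tokens instead of 64.
def pool_image_tokens(image_hidden_states: torch.Tensor, target_len: int = 32) -> torch.Tensor:
    batch, seq_len, hidden = image_hidden_states.shape  # e.g. (B, 64, H)
    assert seq_len % target_len == 0, "target_len must divide the token count"
    # avg_pool1d pools over the last axis, so move the token axis there first.
    pooled = F.avg_pool1d(
        image_hidden_states.transpose(1, 2),  # (B, H, seq_len)
        kernel_size=seq_len // target_len,    # e.g. 2 for 64 -> 32
    )
    return pooled.transpose(1, 2)  # (B, target_len, H)

# Example: (2, 64, 4096) -> (2, 32, 4096)
x = torch.randn(2, 64, 4096)
print(pool_image_tokens(x, target_len=32).shape)
```

Note that the processor would then also need to insert correspondingly fewer <image> placeholder tokens per image so the text sequence stays aligned with the pooled image features.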

HuggingFaceM4 org

Note @ch272h that technically, the modeling supports much longer sequences out of the box through Mistral's sliding-window attention; we only tuned up to 2048. So if you are open to fine-tuning on long sequences, you can do that out of the box without even doing additional pooling. That would hopefully close the gap you might otherwise see when exceeding the sequence lengths we trained on.
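A quick way to check this (the attribute names assume the Mistral-style text backbone config that Idefics2 uses):

```python
from transformers import AutoConfig

# Sketch: inspect the text backbone's config; attribute names assume the
# Mistral-style text_config used by Idefics2.
config = AutoConfig.from_pretrained("HuggingFaceM4/idefics2-8b")
print(config.text_config.sliding_window)           # attention window size
print(config.text_config.max_position_embeddings)  # positional capacity
```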

Thank you so much for the recommendations. @HugoLaurencon I will look into your suggestion.

HuggingFaceM4 org

Feel free to reopen this discussion if you run into any problems.

HugoLaurencon changed discussion status to closed
