About the tokens per image in the report

#13

by lucasjin - opened Apr 17

Apr 17

NaViT actually doesn't reduce the number of tokens. If you use 980x980 images, or a maximum side of 980, the tokens per image should still be (980 // 16)**2

which is far more than 600.
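For reference, the arithmetic behind that figure can be sketched as below. The patch size of 16 comes from the expression above, and `patch_token_count` is a hypothetical helper name used only for illustration.

```python
def patch_token_count(height, width, patch_size=16):
    """Count non-overlapping patches in an image, before any pooling."""
    # Integer division drops the partial patch at each edge.
    return (height // patch_size) * (width // patch_size)


# 980 // 16 = 61 patches per side, so 61 * 61 patches in total.
print(patch_token_count(980, 980))  # 3721
```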

HuggingFaceM4 org Apr 17

We used a perceiver resampler to pool the vision encoder's output, which reduces the number of hidden states used to encode an image.
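To illustrate why this kind of pooling fixes the output length, here is a minimal sketch of the idea, not the model's actual implementation. It assumes a single attention head, no learned key/value projections, random values in place of trained weights, a toy hidden size of 4, and 64 latent queries; all of those numbers and the names `resample`, `DIM`, and `NUM_LATENTS` are illustrative assumptions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def resample(patch_states, latents):
    """Single-head cross-attention: each latent query attends over all patches.

    Returns one output vector per latent, regardless of how many patch
    states come in. Keys and values are the raw patch states (a
    simplification; a real resampler would use learned projections).
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in latents:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in patch_states]
        weights = softmax(scores)
        out = [sum(w * v[d] for w, v in zip(weights, patch_states)) for d in range(dim)]
        outputs.append(out)
    return outputs


random.seed(0)
DIM = 4            # toy hidden size (assumption)
NUM_LATENTS = 64   # illustrative number of latent queries (assumption)
num_patches = (980 // 16) ** 2  # patch count from the question above

patch_states = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(num_patches)]
latents = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LATENTS)]

pooled = resample(patch_states, latents)
print(len(patch_states), "->", len(pooled))
```

The output length depends only on the number of latent queries, so the patch count from the vision encoder and the number of hidden states passed on for the image are decoupled.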

Apr 19

Closing this discussion, feel free to re-open if necessary!

VictorSanh changed discussion status to closed Apr 19
