About the tokens per image in the report

#13
by lucasjin - opened

NaViT didn't actually reduce the number of tokens. If you use 980x980 images, or a maximum side length of 980, the number of tokens per image should still be (980 // 16) ** 2,

which is far more than 600.
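For reference, the arithmetic behind that figure can be checked with a quick sketch. The helper name `patch_tokens` is made up for illustration, and the patch size of 16 is taken from the formula above rather than from the report:

```python
def patch_tokens(height: int, width: int, patch_size: int = 16) -> int:
    """Number of patch tokens a ViT-style encoder produces before any pooling.

    patch_size=16 follows the formula in this post; it is an assumption
    about the vision encoder, not a value confirmed by the report.
    """
    return (height // patch_size) * (width // patch_size)


tokens = patch_tokens(980, 980)
print(tokens)  # 61 * 61 = 3721
```

So under these assumptions a 980x980 image yields 3721 patch tokens before any pooling step is applied.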

HuggingFaceM4 org

We used a perceiver resampler to pool the vision encoder's output, which reduces the number of hidden states used to encode an image.
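As a rough illustration of the idea (not the model's actual implementation), a perceiver-style resampler can be thought of as a fixed set of latent queries that cross-attend over all patch hidden states, so the output length equals the number of latents regardless of how many patches go in. The sketch below uses a single attention head with no learned projections; the function name `cross_attention_pool`, the latent count of 64, and the hidden size of 8 are all assumptions chosen for the example:

```python
import math
import random


def cross_attention_pool(patches, latents):
    """Pool many patch vectors into len(latents) vectors.

    Each latent query attends over every patch (softmax of scaled dot
    products) and returns the weighted average of the patch vectors.
    Simplified: one head, no key/value/output projections, no MLP.
    """
    dim = len(latents[0])
    pooled = []
    for query in latents:
        scores = [
            sum(q * p for q, p in zip(query, patch)) / math.sqrt(dim)
            for patch in patches
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        pooled.append(
            [sum(w * patch[i] for w, patch in zip(weights, patches)) / total
             for i in range(dim)]
        )
    return pooled


rng = random.Random(0)
dim = 8                          # toy hidden size (assumption)
num_patches = (980 // 16) ** 2   # 3721, from the formula in the question
num_latents = 64                 # hypothetical latent count (assumption)

patches = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
latents = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]

pooled = cross_attention_pool(patches, latents)
print(len(patches), "->", len(pooled))
```

In a real resampler the latents are learned parameters and the attention uses learned projections, but the shape argument is the same: the sequence length after pooling is set by the number of latents, not by the image resolution.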

HuggingFaceM4 org

Closing this discussion, feel free to re-open if necessary!

VictorSanh changed discussion status to closed
