About the tokens per image in the report
#13, opened by lucasjin
NaViT actually didn't reduce the number of tokens. If you use 980x980 images, or a maximum side of 980, the number of tokens per image should still be (980 // 16)**2, which is far more than 600.
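For reference, the arithmetic in the question can be checked with a short sketch. The patch size of 16 and the 980-pixel side are the values quoted in the post; the helper name `patch_token_count` is hypothetical and not part of any library.

```python
def patch_token_count(height: int, width: int, patch_size: int = 16) -> int:
    """Count the non-overlapping square patches a ViT-style encoder would emit."""
    # Integer division drops any partial patch at the image border.
    return (height // patch_size) * (width // patch_size)


tokens_980 = patch_token_count(980, 980)
print(tokens_980)  # 61 * 61 = 3721
```

So under these assumptions the vision encoder alone produces 3721 hidden states for a 980x980 image, before any pooling step.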
We used a perceiver resampler to do pooling and reduce the number of hidden states to encode an image
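To illustrate the idea, here is a minimal sketch of perceiver-resampler-style pooling: a fixed set of latent queries cross-attends over all patch hidden states, so the output length equals the number of latents regardless of how many patches went in. This is a simplified, single-head version with no learned projections, written in plain Python; it is not the actual implementation. The values `num_latents = 64` and `dim = 8` are illustrative assumptions, not numbers taken from this thread.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def resample(patch_states, latents):
    """Cross-attend each latent query over all patch states.

    Simplified on purpose: single head, no query/key/value projections,
    no feed-forward layers. Returns one pooled vector per latent.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    pooled = []
    for query in latents:
        # Scaled dot-product score between this latent and every patch.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in patch_states]
        weights = softmax(scores)
        # Weighted average of the patch states.
        pooled.append(
            [sum(w * state[j] for w, state in zip(weights, patch_states)) for j in range(dim)]
        )
    return pooled


random.seed(0)
dim = 8                          # assumed toy hidden size
num_patches = (980 // 16) ** 2   # 3721, from the question above
num_latents = 64                 # assumed number of latent queries

patch_states = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
latents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]

pooled = resample(patch_states, latents)
print(len(patch_states), "->", len(pooled))
```

The point of the sketch is only the shape change: the encoder still sees every patch, but the sequence handed downstream has as many entries as there are latents.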
Closing this discussion, feel free to re-open if necessary!
VictorSanh changed discussion status to closed