xiaotianhan
posted an update Mar 28
🎉 🎉 🎉 Happy to share our recent work. We noticed that image resolution plays an important role, whether in improving the performance of multi-modal large language models (MLLMs) or in Sora-style any-resolution encoder-decoders. We hope this work can help lift the 224x224 resolution limit in ViT.

ViTAR: Vision Transformer with Any Resolution (2403.18361)

Hiya, are you planning to open-source the models?

Thanks for your interest. Yes, we will open-source our code and pretrained weights soon.