Just landed on the Hugging Face Hub: a community-led computer vision course. Learn everything from the fundamentals to the details of bleeding-edge vision transformers!
reacted to xiaotianhan's post 8 months ago
Happy to share our recent work. We noticed that image resolution plays an important role, both in improving multimodal large language model (MLLM) performance and in Sora-style any-resolution encoder-decoders. We hope this work can help lift the 224x224 resolution limit in ViT.