Designing Scalable Vision Models in the Vision-Language Era. The best-performing model is 'jienengchen/ViTamin-XL-384px'.
jienengchen/ViTamin-XL-384px
Feature Extraction
jienengchen/ViTamin-L-336px
Feature Extraction
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Paper • 2404.02132
jienengchen/ViTamin-XL-336px
Feature Extraction