Designing Scalable Vision Models in the Vision-Language Era. The best-performing model is 'jienengchen/ViTamin-XL-384px'.
jienengchen/ViTamin-XL-384px
Feature Extraction • Updated • 262 • 20
jienengchen/ViTamin-L-336px
Feature Extraction • Updated • 1 • 4
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Paper • 2404.02132 • Published • 2
jienengchen/ViTamin-XL-336px
Feature Extraction • Updated • 2 • 1