jienengchen's Collections

ViTamin Family

Designing Scalable Vision Models in the Vision-language Era. The best-performing model is 'jienengchen/ViTamin-XL-384px'.