This is an unofficial implementation of ALIGN trained on COYO-700M. The official ALIGN model was trained on a dataset of 1.8B samples that has not been released to the public, so we trained our implementation on COYO-700M instead.
It was developed by Kakao Brain to validate the performance of the COYO-700M dataset on a large-scale model.
Training took about 8 days on a TPU v3-512.
This is a dual-encoder model in which:
- the image encoder uses the EfficientNet-B7 architecture
- the text encoder uses the BERT-base architecture
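To illustrate how a dual encoder is typically used, here is a minimal sketch of matching images to texts by comparing their embeddings. It is not the code of this model: the toy vectors stand in for the outputs of the image and text encoders, and the helper names (`l2_normalize`, `similarity_matrix`, `softmax`) and the `temperature` value are assumptions made for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(image_embs, text_embs):
    """Cosine similarity between every image and every text embedding."""
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in texts]
            for img in images]


def softmax(row, temperature=1.0):
    """Turn one row of similarities into probabilities over the texts."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical embeddings: in a real dual encoder these would come from
# the image encoder and the text encoder, projected to the same dimension.
image_embs = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
text_embs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2]]

sims = similarity_matrix(image_embs, text_embs)

# Image-to-text matching: for each image, pick the most similar text.
best_text = [row.index(max(row)) for row in sims]

# Assumed temperature; only used here to sharpen the distribution.
probs = [softmax(row, temperature=0.1) for row in sims]
```

Because the two encoders are independent, image and text embeddings can be computed separately and compared afterwards, which is what makes this design convenient for retrieval.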
This model is trained on the COYO-700M dataset.
| KNN | I2T R@1 | T2I R@1 | I2T R@1 | T2I R@1 |