CLIP_TROHN-Img is a model presented in the BiVLC paper for experimentation. It was fine-tuned with the OpenCLIP framework, starting from the CLIP ViT-B-32 model pre-trained by 'openai'. The fine-tuning aims to improve the model's compositional understanding by adding negative pairs, i.e., negative captions and negative images. Each negative differs from its positive counterpart by a small compositional change.
Hyperparameters:
The model is evaluated in BiVLC.
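To make the evaluation setting concrete, the following is a minimal sketch of bidirectional scoring for one instance made of two images and two captions (a positive pair and its hard negatives). It assumes an instance counts as correct in a retrieval direction only when both lookups in that direction succeed. The function names `bidirectional_scores` and `accuracy` and the similarity values are hypothetical illustrations, not part of the BiVLC code base.

```python
from typing import Dict, Sequence

Matrix = Sequence[Sequence[float]]


def bidirectional_scores(sim: Matrix) -> Dict[str, bool]:
    """Score one instance from a 2x2 image-by-caption similarity matrix.

    sim[i][j] is the similarity between image i and caption j.
    Index 0 is the positive item, index 1 is its hard negative.
    """
    # Image-to-text: each image must rank its own caption first.
    i2t = sim[0][0] > sim[0][1] and sim[1][1] > sim[1][0]
    # Text-to-image: each caption must rank its own image first.
    t2i = sim[0][0] > sim[1][0] and sim[1][1] > sim[0][1]
    # Group score (assumed): both directions must be correct.
    return {"i2t": i2t, "t2i": t2i, "group": i2t and t2i}


def accuracy(instances: Sequence[Matrix], key: str) -> float:
    """Fraction of instances that are correct for the given score key."""
    results = [bidirectional_scores(sim)[key] for sim in instances]
    return sum(results) / len(results)


# Made-up similarity values, for illustration only.
example_instances = [
    [[0.31, 0.27], [0.24, 0.29]],  # both directions correct
    [[0.31, 0.27], [0.30, 0.29]],  # negative image prefers the positive caption
]

for name in ("i2t", "t2i", "group"):
    print(name, accuracy(example_instances, name))
```

In practice the similarity matrix would come from the cosine similarity between the model's image and text embeddings. The sketch keeps that step out so that the scoring logic stays independent of any particular library.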
This work is licensed under the MIT License.
If you find this model useful, please consider citing our paper:
@misc{miranda2024bivlc,
title={BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval},
author={Imanol Miranda and Ander Salaberria and Eneko Agirre and Gorka Azkune},
year={2024},
eprint={2406.09952},
archivePrefix={arXiv},
primaryClass={cs.CV}
}