Update README.md
README.md CHANGED
---
license: mit
datasets:
- imirandam/TROHN-Text
---

# Model Card for CLIP_TROHN-Text

## Model Description
- **Homepage:** https://imirandam.github.io/BiVLC_project_page/
- **Repository:** https://github.com/IMirandaM/BiVLC
- **Paper:**
- **Point of Contact:** [Imanol Miranda](mailto:imanol.miranda@ehu.eus)

### Model Summary

CLIP_TROHN-Text is a model presented in the [BiVLC](https://github.com/IMirandaM/BiVLC) paper for experimentation. It was fine-tuned with the OpenCLIP framework, starting from the CLIP ViT-B-32 model pre-trained by 'openai', with the following hyperparameters:

* Learning rate: 1e-6.
* Scheduler: cosine schedule with 50 warmup steps.
* Optimizer: AdamW with beta1 = 0.9, beta2 = 0.98, eps = 1e-6 and weight decay = 0.1.
* Loss function: InfoNCE loss, modified to add only negative captions, following the idea proposed in NegCLIP (see the training sketch after this list).
* Batch size: we define a batch size of 200 and then add the negatives. Since there are no hard negative images, this results in 200 images x 400 captions (positives + hard negatives).
* Epochs: we fine-tune all models for 10 epochs and use validation accuracy as the model selection criterion, i.e. we select the checkpoint with the highest accuracy on the corresponding validation set.
* Data: the model is fine-tuned on the [TROHN-Text](https://huggingface.co/datasets/imirandam/TROHN-Text) dataset.

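Below is a minimal PyTorch/open_clip sketch of the training setup described in the list above (AdamW with the stated betas, a cosine schedule with 50 warmup steps, and an InfoNCE loss in which each image is contrasted against the positive captions plus the hard negative captions). It is an illustration only: the loss helper, the step counts and the data-loading split are our own assumptions, not the actual training script.

```python
import math

import torch
import torch.nn.functional as F
import open_clip
from datasets import load_dataset

# Base model: CLIP ViT-B-32 pre-trained by 'openai', as in the summary above.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
# preprocess/tokenizer would be used to build the (image, caption, negative caption) batches.

# Fine-tuning data (the split name is an assumption about the dataset layout).
trohn_text = load_dataset("imirandam/TROHN-Text", split="train")

# AdamW with the hyperparameters listed above.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-6, betas=(0.9, 0.98), eps=1e-6, weight_decay=0.1
)

# Cosine schedule with 50 warmup steps; total_steps is illustrative.
warmup_steps, total_steps = 50, 10_000

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

def info_nce_with_text_negatives(image_feats, text_feats, logit_scale):
    """InfoNCE over [positive captions; hard negative captions].

    image_feats: (N, D) L2-normalised image embeddings (N = 200 in the setup above).
    text_feats:  (2N, D) embeddings of the N positive captions followed by the
                 N hard negative captions, i.e. 200 images x 400 captions.
    """
    n = image_feats.shape[0]
    labels = torch.arange(n, device=image_feats.device)
    logits_per_image = logit_scale * image_feats @ text_feats.t()     # (N, 2N)
    logits_per_text = logit_scale * text_feats[:n] @ image_feats.t()  # (N, N), positives only
    return (F.cross_entropy(logits_per_image, labels) +
            F.cross_entropy(logits_per_text, labels)) / 2
```

The `logit_scale` argument corresponds to `model.logit_scale.exp()` in open_clip; only the image-to-text direction sees the hard negative captions, since the batch contains no hard negative images.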
### Evaluation Data
The model is evaluated on [BiVLC](https://huggingface.co/datasets/imirandam/BiVLC).

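As a rough illustration of how such an evaluation can be run, the sketch below loads a fine-tuned checkpoint with open_clip and scores one BiVLC instance by cosine similarity. The checkpoint path, split name and column names are assumptions and may not match the released artifacts.

```python
import torch
import open_clip
from datasets import load_dataset

# Load the fine-tuned weights (the checkpoint path is hypothetical).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="CLIP_TROHN-Text.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# BiVLC evaluation data; the split and column names are assumptions.
bivlc = load_dataset("imirandam/BiVLC", split="test")
example = bivlc[0]

image = preprocess(example["image"]).unsqueeze(0)
texts = tokenizer([example["caption"], example["negative_caption"]])

with torch.no_grad():
    image_feats = model.encode_image(image)
    text_feats = model.encode_text(texts)
    image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    scores = (image_feats @ text_feats.t()).squeeze(0)

# The model should assign the higher score to the true caption (index 0).
print(scores)
```

Aggregating such comparisons over the whole test set yields the image-to-text and text-to-image retrieval results reported on BiVLC.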
### Licensing Information
This work is licensed under the MIT License.

## Citation Information
If you find this model useful, please consider citing our paper:
```
@inproceedings{,
  title={},
  author={},
  booktitle={},
  year={}
}
```