imirandam committed
Commit f6f52fa · verified · 1 Parent(s): 321a3c1

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -4,20 +4,20 @@ datasets:
   - imirandam/TROHN-Img
  ---
 
- # Model Card for CLIP_Detectos
+ # Model Card for CLIP_Detector
  ## Model Description
  - **Homepage:** https://imirandam.github.io/BiVLC_project_page/
  - **Repository:** https://github.com/IMirandaM/BiVLC
  - **Paper:**
  - **Point of Contact:** [Imanol Miranda](mailto:imanol.miranda@ehu.eus)
  ### Model Summary
- CLIP_Detector is a model presented in the [BiVLC](https://github.com/IMirandaM/BiVLC) paper for experimentation. It has been trained with the OpenCLIP framework using the CLIP ViT-B-32 model pre-trained by 'openai' as a basis. The encoders are kept frozen, and a sigmoid neuron is added on top of each encoder (more details in the paper). The objective of the model is to classify text and images as natural or synthetic. Hyperparameters:
+ CLIP_Detector is a model presented in the [BiVLC](https://github.com/IMirandaM/BiVLC) paper for experimentation. It has been trained with the OpenCLIP framework using the CLIP ViT-B-32 model pre-trained by 'openai' as a basis. For binary classification, the encoders are kept frozen. A sigmoid neuron is added over the CLS embedding for the image encoder and over the EOT embedding for the text encoder (more details in the paper). The objective of the model is to classify text and images as natural or synthetic. Hyperparameters:
 
  * Learning rate: 1e-6.
  * Optimizer: Adam optimizer with beta1 = 0.9, beta2 = 0.999, eps = 1e-08 and without weight decay.
  * Loss function: Binary cross-entropy loss (BCELoss).
  * Batch size: We use a batch size of 400.
- * Epochs: We trained the text detector over 10 epochs and the image detectors over 1 epoch. We used validation accuracy as the model selection criterion, i.e. we selected the model with highest accuracy in the corresponding validation set.
+ * Epochs: We trained the text detector over 10 epochs and the image detector over 1 epoch. We used validation accuracy as the model selection criterion, i.e., we selected the model with the highest accuracy in the corresponding validation set.
  * Data: The sigmoid neuron is trained with the [TROHN-Img](https://huggingface.co/datasets/imirandam/TROHN-Img) dataset.
 
  ### Licensing Information
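
For a concrete picture of the setup described in the updated summary, here is a minimal sketch using open_clip and torch. It is a reconstruction under stated assumptions, not the authors' code: the class name `CLIPDetector` is invented, the two detectors are folded into one module for brevity (the card trains the text and image detectors separately, for 10 and 1 epochs respectively), and mapping the synthetic class to label 1 is an assumption.

```python
# Minimal sketch, assuming the open_clip and torch packages.
# NOT the BiVLC authors' code; names and label convention are illustrative.
import torch
import torch.nn as nn
import open_clip

class CLIPDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # CLIP ViT-B-32 with the 'openai' pre-trained weights, as in the card.
        self.clip, _, self.preprocess = open_clip.create_model_and_transforms(
            "ViT-B-32", pretrained="openai"
        )
        # Encoders are kept frozen; only the added sigmoid neurons are trained.
        for p in self.clip.parameters():
            p.requires_grad = False
        dim = self.clip.visual.output_dim  # 512 for ViT-B-32
        # One sigmoid neuron per modality: natural vs. synthetic (label 1 here).
        self.image_head = nn.Linear(dim, 1)
        self.text_head = nn.Linear(dim, 1)

    def forward_image(self, images):
        feats = self.clip.encode_image(images)  # CLS-based image embedding
        return torch.sigmoid(self.image_head(feats)).squeeze(-1)

    def forward_text(self, tokens):
        feats = self.clip.encode_text(tokens)   # EOT-based text embedding
        return torch.sigmoid(self.text_head(feats)).squeeze(-1)

model = CLIPDetector()
# Hyperparameters from the card: Adam with lr 1e-6, betas (0.9, 0.999),
# eps 1e-08, no weight decay; BCELoss on the sigmoid outputs.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-6, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0,
)
criterion = nn.BCELoss()

# Illustrative text-side scoring:
tokenizer = open_clip.get_tokenizer("ViT-B-32")
tokens = tokenizer(["a photo of a dog standing on grass"])
p_synthetic = model.forward_text(tokens)  # probability the caption is synthetic
```

Because the backbone is frozen, only the two linear heads (a few hundred parameters each) receive gradients, which is consistent with the very small learning rate and the short image-detector training schedule reported above.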