We then show some qualitative examples of images found by the model. **All the code we have written** to run our validation experiments (in combination with
code made available by Nils Reimers and by the authors of the original CLIP) is available.

## Training Details

### Dataset Splits

We tried different combinations of split sizes for training and validation. Eventually, we settled on a 95% training split, with the remaining 5% of the data
going into validation: each dataset is split into training and validation data, and the per-dataset files are then concatenated.
Note that the 5% amounts to 70K validation samples, making this set almost as large as the whole MSCOCO dataset.
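
To make the procedure concrete, here is a minimal Python sketch of the split-then-concatenate step (the file names, the one-record-per-line format, and the `split_records` helper are hypothetical; the actual preprocessing scripts are in the repository):

```python
import random

def split_records(records, train_frac=0.95, seed=42):
    """Shuffle one dataset's image-caption records and split them 95/5."""
    rng = random.Random(seed)
    records = list(records)
    rng.shuffle(records)
    cut = int(len(records) * train_frac)
    return records[:cut], records[cut:]

# Hypothetical per-dataset files: each dataset is split on its own,
# and the per-dataset splits are then concatenated.
train, valid = [], []
for path in ["wit_it.tsv", "mscoco_it.tsv", "conceptual_captions_it.tsv"]:
    with open(path, encoding="utf-8") as f:
        tr, va = split_records(f)
    train += tr
    valid += va
```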

### Hyper-parameters

The hyper-parameters can be found in the [repository](https://github.com/clip-italian/clip-italian/tree/master/hybrid_clip).
We use a maximum sequence length of 95 tokens. To choose this value, we looked at the distribution of caption lengths in the various
datasets and found that 95 tokens is an excellent compromise between training speed and data coverage.
We use a batch size of 128 and a learning rate of 0.00001.
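
A rough sketch of how such a cut-off can be checked is shown below. The tokenizer checkpoint is our assumption of an Italian BERT tokenizer, and `captions` is a placeholder for the real concatenated caption list:

```python
import numpy as np
from transformers import AutoTokenizer

# Assumed Italian BERT tokenizer; swap in the one actually used by the model.
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
captions = ["Una foto di un gatto sopra un tavolo di legno."]  # placeholder

lengths = np.array([len(tokenizer.encode(c)) for c in captions])
for q in (50, 90, 95, 99):
    print(f"p{q}: {np.percentile(lengths, q):.0f} tokens")
print(f"captions fully covered at 95 tokens: {(lengths <= 95).mean():.1%}")
```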

### Training

We usually train until we see the validation loss going up, and we then pick the model with the best validation loss. We adjusted the number of training epochs
as the project progressed: at first we ran 100 epochs, but after we replaced the optimizer we were able to reduce this number.
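
A minimal sketch of this stopping rule, where `train_epoch`, `evaluate`, and `save_checkpoint` are hypothetical stand-ins for the actual training utilities:

```python
import math

def train_with_early_stopping(train_epoch, evaluate, save_checkpoint,
                              max_epochs=100, patience=3):
    """Train until validation loss stops improving; keep the best model.

    The three callbacks are hypothetical stand-ins for the real
    training, evaluation, and checkpointing utilities.
    """
    best_val, bad_epochs = math.inf, 0
    for epoch in range(max_epochs):
        train_epoch(epoch)
        val_loss = evaluate()
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
            save_checkpoint(epoch)  # remember the best model seen so far
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss keeps going up: stop here
    return best_val
```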

## Quantitative Evaluation

Showing great images is definitely cool and interesting, but a model is nothing without validation.
Since this is the first CLIP-based model for Italian, we decided to use the multilingual CLIP model as a comparison baseline.