VictorSanh committed on
Commit
5608691
1 Parent(s): 6b79b2a

video datasets

  1. README.md +2 -0
README.md CHANGED
@@ -156,6 +156,8 @@ We compare our model to the original Flamingo along with [OpenFlamingo](openflam
 
  We perform checkpoint selection based on validation sets of VQAv2, TextVQA, OKVQA, VizWiz, Visual Dialogue, Coco, Flickr30k, and HatefulMemes. We select the checkpoint at step 65'000 for IDEFICS-9B and at step 37'500 for IDEFICS. The models are evaluated with in-context few-shot learning where the priming instances are selected at random from a support set. We do not use any form of ensembling.
 
+ Unlike Flamingo, we did not train IDEFICS on video-text pair datasets, so we did not evaluate the model on video-text benchmarks as Flamingo did. We leave that evaluation for a future iteration.
+
  <img src="./assets/Figure_Evals_IDEFIX.png" width="55%">
 
  TODO: update this table