VictorSanh committed
Commit 65edf33
1 Parent(s): 7f05f40

trying that

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -158,9 +158,9 @@ We perform checkpoint selection based on validation sets of VQAv2, TextVQA, OKVQ
 
  As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, and as such, we did not evaluate the model on video-text benchmarks like Flamingo did. We leave that evaluation for a future iteration.
 
- <img src="./assets/Figure_Evals_IDEFIX.png" width="55%">
+ <!-- <img src="./assets/Figure_Evals_IDEFIX.png" width="55%"> <img width=120/> -->
 
- | Model | Shots | VQAv2 (OE VQA acc) | OKVQA (OE VQA acc) | TextVQA (OE VQA acc) | VizWiz (OE VQA acc) | TextCaps (CIDEr) | Coco (CIDEr) | NoCaps (CIDEr) | Flickr (CIDEr) | VisDial (NDCG) | HatefulMemes (ROC AUC) | ScienceQA (accuracy) | RenderedSST2 (accuracy) | Winoground (group (text/image)) |
+ | Model | Shots | VQAv2<br>OE VQA acc.<br> | OKVQA<br>OE VQA acc.<br> | TextVQA<br>OE VQA acc.<br> | VizWiz<br>OE VQA acc.<br> | TextCaps<br>CIDEr<br> | Coco<br>CIDEr<br> | NoCaps<br>CIDEr | Flickr<br>CIDEr | VisDial<br>NDCG | HatefulMemes<br>ROC AUC | ScienceQA<br>acc. | RenderedSST2<br>acc. | Winoground<br>group (text/image) |
  |:-----------|--------:|---------------------:|---------------------:|-----------------------:|----------------------:|-------------------:|---------------:|-----------------:|-----------------:|-----------------:|-------------------------:|-----------------------:|--------------------------:|----------------------------------:|
  | IDEFIX 80B | 0 | 60.0 | 45.2 | 30.9 | 36.0 | 56.8 | 91.8 | 65.0 | 53.7 | 48.8 | 60.6 | 68.9 | 60.5 | 8.0 (18.8/22.5) |
  | | 4 | 63.6 | 52.4 | 34.4 | 40.4 | 72.7 | 110.3 | 99.6 | 73.7 | 48.4 | 57.8 | 58.9 | 66.6 | - |
@@ -174,8 +174,12 @@ As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, a
  | | 16 | 57.0 | 48.4 | 27.9 | 42.6 | 67.4 | 99.7 | 89.4 | 64.5 | - | 50.9 | - | 67.8 | - |
  | | 32 | 57.9 | 49.6 | 28.3 | 43.7 | 68.1 | 98.0 | 90.5 | 64.4 | - | 49.8 | - | 67.0 | - |
 
- Imagenet Evaluation:
- | Model | Shots | Imagenet |
+ We note that since we trained on PMD which contains COCO, the evaluation numbers on COCO are not directly comparable with Flamingo and OpenFlamingo since they did not explicitely have this dataset in the training mixture.
+
+ For ImageNet-1k, we also report results where the priming samples are selected to be similar (i.e. close in a vector space) to the queried instance.
+
+ ImageNet-1k Evaluation:
+ | Model | Shots | ImageNet-1k |
  |:-----------|--------:|-----------:|
  | IDEFIX 80B | 16, 1k support set | 65.4 |
  | | 16, RICES 5k support set | 72.9 |
@@ -198,11 +202,7 @@ Fairness Evaluations:
  | | 16 | 95.8 | 43.0 | 46.1 |
  | | 32 | 96.1 | 35.1 | 44.9 |
 
- We also report results where the priming samples are selected to be similar (i.e. close in a vector space) to the queried instance.
 
- TODO: table with rices shots
-
- We note that since we trained on PMD which contains COCO, the evaluation numbers on COCO are not directly comparable with Flamingo and OpenFlamingo since they did not explicitely have this dataset in the training mixture.
 
  # Technical Specifications
 
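
Note on the ImageNet-1k sentence in this diff: it describes priming samples being "selected to be similar (i.e. close in a vector space) to the queried instance", which appears to be what the "RICES" row in the table refers to. The following is a minimal sketch of that kind of similarity-based shot selection, not the code used for IDEFICS. The function names, the use of cosine similarity, and the assumption that embeddings are precomputed, non-zero vectors are all illustrative choices that do not come from the README.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_similar_shots(query_emb, support_embs, num_shots=16):
    """Return indices of the `num_shots` support examples most similar to the query.

    Assumption: `support_embs` holds one precomputed embedding per support
    example, and closeness is measured by cosine similarity.
    """
    scored = [
        (cosine_similarity(query_emb, emb), idx)
        for idx, emb in enumerate(support_embs)
    ]
    # Highest similarity first; Python's sort is stable, so ties keep index order.
    scored.sort(key=lambda pair: -pair[0])
    return [idx for _, idx in scored[:num_shots]]


# Toy usage with a hypothetical 4-item support set in 2 dimensions.
support_set = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
shots = select_similar_shots([1.0, 0.0], support_set, num_shots=2)
```

In an actual evaluation, the selected indices would be used to pick the in-context examples placed before the query in the prompt; the support set size (for example the 5k set named in the table) only changes how many candidates are scored.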