HuggingFaceM4/idefics-80b

@@ -158,9 +158,9 @@ We perform checkpoint selection based on validation sets of VQAv2, TextVQA, OKVQ
 As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, and as such, we did not evaluate the model on video-text benchmarks like Flamingo did. We leave that evaluation for a future iteration.
-<img src="./assets/Figure_Evals_IDEFIX.png"  width="55%">
-| Model      |   Shots |   VQAv2 (OE VQA acc) |   OKVQA (OE VQA acc) |   TextVQA (OE VQA acc) |   VizWiz (OE VQA acc) |   TextCaps (CIDEr) |   Coco (CIDEr) |   NoCaps (CIDEr) |   Flickr (CIDEr) |   VisDial (NDCG) |   HatefulMemes (ROC AUC) |   ScienceQA (accuracy) |   RenderedSST2 (accuracy) |   Winoground (group (text/image)) |
 |:-----------|--------:|---------------------:|---------------------:|-----------------------:|----------------------:|-------------------:|---------------:|-----------------:|-----------------:|-----------------:|-------------------------:|-----------------------:|--------------------------:|----------------------------------:|
 | IDEFIX 80B |       0 |                 60.0 |                 45.2 |                   30.9 |                  36.0 |               56.8 |           91.8 |             65.0 |             53.7 |             48.8 |                     60.6 |                   68.9 |                      60.5 |                               8.0 (18.8/22.5) |
 |            |       4 |                 63.6 |                 52.4 |                   34.4 |                  40.4 |               72.7 |          110.3 |             99.6 |             73.7 |             48.4 |                     57.8 |                   58.9 |                      66.6 |                              - |
@@ -174,8 +174,12 @@ As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, a
 |            |      16 |                 57.0 |                 48.4 |                   27.9 |                  42.6 |               67.4 |           99.7 |             89.4 |             64.5 |             - |                     50.9 |                   - |                      67.8 |                              - |
 |            |      32 |                 57.9 |                 49.6 |                   28.3 |                  43.7 |               68.1 |           98.0 |             90.5 |             64.4 |             - |                     49.8 |                   - |                      67.0 |                              - |
-Imagenet Evaluation:
-| Model      |   Shots |   Imagenet |
 |:-----------|--------:|-----------:|
 | IDEFIX 80B |      16, 1k support set |       65.4 |
 |            |      16, RICES 5k support set |       72.9 |
@@ -198,11 +202,7 @@ Fairness Evaluations:
 |            |      16 |                        95.8 |                      43.0 |                     46.1 |
 |            |      32 |                        96.1 |                      35.1 |                     44.9 |
-We also report results where the priming samples are selected to be similar (i.e. close in a vector space) to the queried instance.
-TODO: table with rices shots
-We note that since we trained on PMD which contains COCO, the evaluation numbers on COCO are not directly comparable with Flamingo and OpenFlamingo since they did not explicitely have this dataset in the training mixture.
 # Technical Specifications

 As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, and as such, we did not evaluate the model on video-text benchmarks like Flamingo did. We leave that evaluation for a future iteration.
+<!-- <img src="./assets/Figure_Evals_IDEFIX.png"  width="55%"> <img width=120/> -->
+| Model      |   Shots | VQAv2<br>OE VQA acc.<br> | OKVQA<br>OE VQA acc.<br> | TextVQA<br>OE VQA acc.<br> | VizWiz<br>OE VQA acc.<br> |  TextCaps<br>CIDEr<br> | COCO<br>CIDEr<br> | NoCaps<br>CIDEr | Flickr<br>CIDEr | VisDial<br>NDCG | HatefulMemes<br>ROC AUC | ScienceQA<br>acc. |  RenderedSST2<br>acc. |   Winoground<br>group (text/image) |
 |:-----------|--------:|---------------------:|---------------------:|-----------------------:|----------------------:|-------------------:|---------------:|-----------------:|-----------------:|-----------------:|-------------------------:|-----------------------:|--------------------------:|----------------------------------:|
 | IDEFIX 80B |       0 |                 60.0 |                 45.2 |                   30.9 |                  36.0 |               56.8 |           91.8 |             65.0 |             53.7 |             48.8 |                     60.6 |                   68.9 |                      60.5 |                               8.0 (18.8/22.5) |
 |            |       4 |                 63.6 |                 52.4 |                   34.4 |                  40.4 |               72.7 |          110.3 |             99.6 |             73.7 |             48.4 |                     57.8 |                   58.9 |                      66.6 |                              - |
 |            |      16 |                 57.0 |                 48.4 |                   27.9 |                  42.6 |               67.4 |           99.7 |             89.4 |             64.5 |             - |                     50.9 |                   - |                      67.8 |                              - |
 |            |      32 |                 57.9 |                 49.6 |                   28.3 |                  43.7 |               68.1 |           98.0 |             90.5 |             64.4 |             - |                     49.8 |                   - |                      67.0 |                              - |
+Because we trained on PMD, which contains COCO, our evaluation numbers on COCO are not directly comparable with those of Flamingo and OpenFlamingo, which did not explicitly include this dataset in their training mixture.
+For ImageNet-1k, we also report results where the priming samples are selected to be similar (i.e. close in a vector space) to the queried instance.
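
The similarity-based selection of priming samples mentioned above can be illustrated with a minimal sketch. It assumes that the query and every support-set example have already been embedded as plain vectors, and it ranks support examples by cosine similarity. The function names, the toy embeddings, and the choice of cosine similarity are all assumptions made for illustration; the model card does not state which embedding model or distance metric was used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_priming_samples(query_embedding, support_embeddings, num_shots):
    """Return indices of the `num_shots` support examples closest to the query.

    Hypothetical helper: ranks by cosine similarity (an assumed metric),
    breaking ties by the lower support-set index.
    """
    scored = [
        (cosine_similarity(query_embedding, emb), idx)
        for idx, emb in enumerate(support_embeddings)
    ]
    # Highest similarity first; ties resolved by the smaller index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:num_shots]]


# Toy 2-D embeddings standing in for real image features (illustrative only).
support = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query = [1.0, 0.2]
selected = select_priming_samples(query, support, num_shots=2)
print(selected)
```

In a real evaluation the support set would hold thousands of embedded examples (for instance the 5k support set in the table below), and the selected indices would determine which image-label pairs are placed in the prompt ahead of the query image.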
+ImageNet-1k Evaluation:
+| Model      |   Shots |   ImageNet-1k |
 |:-----------|--------:|-----------:|
 | IDEFIX 80B |      16, 1k support set |       65.4 |
 |            |      16, RICES 5k support set |       72.9 |
 |            |      16 |                        95.8 |                      43.0 |                     46.1 |
 |            |      32 |                        96.1 |                      35.1 |                     44.9 |
 # Technical Specifications