Leyo committed on
Commit 96555cb
1 Parent(s): 28fdffe

Fix model naming (idefix/idefics)

Files changed (1)
1. README.md +18 -18
README.md CHANGED
@@ -243,40 +243,40 @@ As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, a
 
 We note that since IDEFICS was trained on PMD (which contains COCO), the evaluation numbers on COCO are not directly comparable with Flamingo and OpenFlamingo since they did not explicitely have this dataset in the training mixture. Additionally, Flamingo is trained with images of resolution 320 x 320 while IDEFICS and OpenFlamingo were trained with images of 224 x 224 resolution.
 
- | Model | Shots | <nobr>VQAv2<br>OE VQA acc.</nobr> | <nobr>OKVQA<br>OE VQA acc.</nobr> | <nobr>TextVQA<br>OE VQA acc.</nobr> | <nobr>VizWiz<br>OE VQA acc.</nobr> | <nobr>TextCaps<br>CIDEr</nobr> | <nobr>Coco<br>CIDEr</nobr> | <nobr>NoCaps<br>CIDEr</nobr> | <nobr>Flickr<br>CIDEr</nobr> | <nobr>VisDial<br>NDCG</nobr> | <nobr>HatefulMemes<br>ROC AUC</nobr> | <nobr>ScienceQA<br>acc.</nobr> | <nobr>RenderedSST2<br>acc.</nobr> | <nobr>Winoground<br>group (text/image)</nobr> |
- |:-----------|--------:|---------------------:|---------------------:|-----------------------:|----------------------:|-------------------:|---------------:|-----------------:|-----------------:|-----------------:|-------------------------:|-----------------------:|--------------------------:|----------------------------------:|
- | IDEFIX 80B | 0 | 60.0 | 45.2 | 30.9 | 36.0 | 56.8 | 91.8 | 65.0 | 53.7 | 48.8 | 60.6 | 68.9 | 60.5 | 8.0 (18.8/22.5) |
- | | 4 | 63.6 | 52.4 | 34.4 | 40.4 | 72.7 | 110.3 | 99.6 | 73.7 | 48.4 | 57.8 | 58.9 | 66.6 | - |
- | | 8 | 64.8 | 55.1 | 35.7 | 46.1 | 77.6 | 114.3 | 105.7 | 76.6 | 47.9 | 58.2 | - | 67.8 | - |
- | | 16 | 65.4 | 56.8 | 36.3 | 48.3 | 81.4 | 116.6 | 107.0 | 80.1 | - | 55.8 | - | 67.7 | - |
- | | 32 | 65.9 | 57.8 | 36.7 | 50.0 | 82.7 | 116.6 | 107.5 | 81.1 | - | 52.5 | - | 67.3 | - |
+ | Model | Shots | <nobr>VQAv2<br>OE VQA acc.</nobr> | <nobr>OKVQA<br>OE VQA acc.</nobr> | <nobr>TextVQA<br>OE VQA acc.</nobr> | <nobr>VizWiz<br>OE VQA acc.</nobr> | <nobr>TextCaps<br>CIDEr</nobr> | <nobr>Coco<br>CIDEr</nobr> | <nobr>NoCaps<br>CIDEr</nobr> | <nobr>Flickr<br>CIDEr</nobr> | <nobr>VisDial<br>NDCG</nobr> | <nobr>HatefulMemes<br>ROC AUC</nobr> | <nobr>ScienceQA<br>acc.</nobr> | <nobr>RenderedSST2<br>acc.</nobr> | <nobr>Winoground<br>group (text/image)</nobr> |
+ |:------------|--------:|---------------------:|---------------------:|-----------------------:|----------------------:|-------------------:|---------------:|-----------------:|-----------------:|-----------------:|-------------------------:|-----------------------:|--------------------------:|----------------------------------:|
+ | IDEFICS 80B | 0 | 60.0 | 45.2 | 30.9 | 36.0 | 56.8 | 91.8 | 65.0 | 53.7 | 48.8 | 60.6 | 68.9 | 60.5 | 8.0 (18.75/22.5)|
+ | | 4 | 63.6 | 52.4 | 34.4 | 40.4 | 72.7 | 110.3 | 99.6 | 73.7 | 48.4 | 57.8 | 58.9 | 66.6 | - |
+ | | 8 | 64.8 | 55.1 | 35.7 | 46.1 | 77.6 | 114.3 | 105.7 | 76.6 | 47.9 | 58.2 | - | 67.8 | - |
+ | | 16 | 65.4 | 56.8 | 36.3 | 48.3 | 81.4 | 116.6 | 107.0 | 80.1 | - | 55.8 | - | 67.7 | - |
+ | | 32 | 65.9 | 57.8 | 36.7 | 50.0 | 82.7 | 116.6 | 107.5 | 81.1 | - | 52.5 | - | 67.3 | - |
 <br>
- | IDEFIX 9B | 0 | 50.9 | 38.4 | 25.9 | 35.5 | 25.4 | 46.0 | 36.8 | 27.3 | 48.7 | 51.7 | 44.2 | 61.8 | 5.0 (16.8/20.8) |
- | | 4 | 55.4 | 45.5 | 27.6 | 36.9 | 60.0 | 93.0 | 81.3 | 59.7 | 47.9 | 50.7 | 37.4 | 62.3 | - |
- | | 8 | 56.4 | 47.7 | 27.5 | 40.4 | 63.2 | 97.0 | 86.8 | 61.9 | 47.6 | 51.0 | - | 66.3 | - |
- | | 16 | 57.0 | 48.4 | 27.9 | 42.6 | 67.4 | 99.7 | 89.4 | 64.5 | - | 50.9 | - | 67.8 | - |
- | | 32 | 57.9 | 49.6 | 28.3 | 43.7 | 68.1 | 98.0 | 90.5 | 64.4 | - | 49.8 | - | 67.0 | - |
+ | IDEFICS 9B | 0 | 50.9 | 38.4 | 25.9 | 35.5 | 25.4 | 46.0 | 36.8 | 27.3 | 48.7 | 51.7 | 44.2 | 61.8 | 5.0 (16.8/20.8) |
+ | | 4 | 55.4 | 45.5 | 27.6 | 36.9 | 60.0 | 93.0 | 81.3 | 59.7 | 47.9 | 50.7 | 37.4 | 62.3 | - |
+ | | 8 | 56.4 | 47.7 | 27.5 | 40.4 | 63.2 | 97.0 | 86.8 | 61.9 | 47.6 | 51.0 | - | 66.3 | - |
+ | | 16 | 57.0 | 48.4 | 27.9 | 42.6 | 67.4 | 99.7 | 89.4 | 64.5 | - | 50.9 | - | 67.8 | - |
+ | | 32 | 57.9 | 49.6 | 28.3 | 43.7 | 68.1 | 98.0 | 90.5 | 64.4 | - | 49.8 | - | 67.0 | - |
 
 For ImageNet-1k, we also report results where the priming samples are selected to be similar (i.e. close in a vector space) to the queried instance. This is the Retrieval-based In-Context Example Selection (RICES in short) approach introduced by [Yang et al. (2021)](https://arxiv.org/abs/2109.05014).
 
 | Model | Shots | Support set size | Shots selection | ImageNet-1k<br>Top-1 acc. |
 |:-----------|--------:|-----------------:|:----------------|--------------------------:|
- | IDEFIX 80B | 16 | 1K | Random | 65.4 |
+ | IDEFICS 80B | 16 | 1K | Random | 65.4 |
 | | 16 | 5K | RICES | 72.9 |
 <br>
- | IDEFIX 9B | 16 | 1K | Random | 53.5 |
+ | IDEFICS 9B | 16 | 1K | Random | 53.5 |
 | | 16 | 5K | RICES | 64.5 |
 
 Fairness Evaluations:
 | Model | Shots | <nobr>FairFaceGender<br>acc.</nobr> | <nobr>FairFaceRace<br>acc.</nobr> | <nobr>FairFaceAge<br>acc.</nobr> |
 |:-----------|--------:|----------------------------:|--------------------------:|-------------------------:|
- | IDEFIX 80B | 0 | 95.8 | 64.1 | 51.0 |
+ | IDEFICS 80B| 0 | 95.8 | 64.1 | 51.0 |
 | | 4 | 95.2 | 48.8 | 50.6 |
 | | 8 | 95.5 | 52.3 | 53.1 |
 | | 16 | 95.7 | 47.6 | 52.8 |
 | | 32 | 95.7 | 36.5 | 51.2 |
 <br>
- | IDEFIX 9B | 0 | 94.4 | 55.3 | 45.1 |
+ | IDEFICS 9B | 0 | 94.4 | 55.3 | 45.1 |
 | | 4 | 93.9 | 35.3 | 44.3 |
 | | 8 | 95.4 | 44.7 | 46.0 |
 | | 16 | 95.8 | 43.0 | 46.1 |
@@ -304,13 +304,13 @@ Idefics Instruct Evaluations:
 Fairness Evaluations:
 | Model | Shots | <nobr>FairFaceGender<br>acc.</nobr> | <nobr>FairFaceRace<br>acc.</nobr> | <nobr>FairFaceAge<br>acc.</nobr> |
 |:---------------------|--------:|----------------------------:|--------------------------:|-------------------------:|
- | 80B IDEFICS Instruct | 0 | 95.7 | 63.4 | 47.1 |
+ | IDEFICS 80B Instruct | 0 | 95.7 | 63.4 | 47.1 |
 | | 4 | 95.6 | 51.4 | 48.3 |
 | | 8 | 95.8 | 51.0 | 51.1 |
 | | 16 | 96.1 | 47.6 | 51.8 |
 | | 32 | 96.2 | 36.8 | 50.3 |
 <br>
- | 9B IDEFICS Instruct | 0 | 92.7 | 59.6 | 43.9 |
+ | IDEFICS 9B Instruct | 0 | 92.7 | 59.6 | 43.9 |
 | | 4 | 95.2 | 43.3 | 38.7 |
 | | 8 | 95.8 | 51.7 | 40.1 |
 | | 16 | 96.1 | 58.9 | 41.7 |
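
For the RICES rows in the tables above: retrieval-based in-context example selection simply ranks the support set by embedding similarity to the query image and uses the top-k matches as the priming samples. Below is a minimal sketch of that selection step, assuming precomputed image embeddings from any off-the-shelf encoder; the name `rices_select` and its signature are illustrative, not part of the IDEFICS codebase.

```python
# Minimal sketch of RICES (Yang et al., 2021): pick the k support examples
# whose image embeddings are closest (cosine similarity) to the query's
# embedding and use them as the few-shot examples.
# Embeddings are assumed to come from any image encoder; nothing here is
# IDEFICS-specific.
import numpy as np

def rices_select(query_emb: np.ndarray, support_embs: np.ndarray, k: int = 16) -> np.ndarray:
    """Return indices of the k support examples most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)                               # (d,)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)  # (n, d)
    sims = s @ q                                                            # cosine similarities, (n,)
    return np.argsort(-sims)[:k]                                            # most similar first

# Example: select 16 priming samples out of a 5K support set (random vectors
# stand in for real image embeddings).
idx = rices_select(np.random.rand(512), np.random.rand(5000, 512), k=16)
```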