Leyo committed on
Commit
e23861b
1 Parent(s): d464935

Add Idefics instruct evals

  1. README.md +15 -1
README.md CHANGED
@@ -257,6 +257,20 @@ We note that since IDEFICS was trained on PMD (which contains COCO), the evaluat
  | | 16 | 57.0 | 48.4 | 27.9 | 42.6 | 67.4 | 99.7 | 89.4 | 64.5 | - | 50.9 | - | 67.8 | - |
  | | 32 | 57.9 | 49.6 | 28.3 | 43.7 | 68.1 | 98.0 | 90.5 | 64.4 | - | 49.8 | - | 67.0 | - |
  For ImageNet-1k, we also report results where the priming samples are selected to be similar (i.e. close in a vector space) to the queried instance. This is the Retrieval-based In-Context Example Selection (RICES in short) approach introduced by [Yang et al. (2021)](https://arxiv.org/abs/2109.05014).
 
  | Model | Shots | Support set size | Shots selection | ImageNet-1k<br>Top-1 acc. |
@@ -268,7 +282,7 @@ For ImageNet-1k, we also report results where the priming samples are selected t
  | | 16 | 5K | RICES | 64.5 |
 
  Fairness Evaluations:
- | Model | Shots | FairFaceGender (accuracy) | FairFaceRace (accuracy) | FairFaceAge (accuracy) |
  |:-----------|--------:|----------------------------:|--------------------------:|-------------------------:|
  | IDEFIX 80B | 0 | 95.8 | 64.1 | 51.0 |
  | | 4 | 95.2 | 48.8 | 50.6 |
 
  | | 16 | 57.0 | 48.4 | 27.9 | 42.6 | 67.4 | 99.7 | 89.4 | 64.5 | - | 50.9 | - | 67.8 | - |
  | | 32 | 57.9 | 49.6 | 28.3 | 43.7 | 68.1 | 98.0 | 90.5 | 64.4 | - | 49.8 | - | 67.0 | - |
 
+ Idefics Instruct Evaluations:
+ | Model | Shots | <nobr>VQAv2<br>OE VQA acc.</nobr> | <nobr>OKVQA<br>OE VQA acc.</nobr> | <nobr>TextVQA<br>OE VQA acc.</nobr> | <nobr>VizWiz<br>OE VQA acc.</nobr> | <nobr>TextCaps<br>CIDEr</nobr> | <nobr>Coco<br>CIDEr</nobr> | <nobr>NoCaps<br>CIDEr</nobr> | <nobr>Flickr<br>CIDEr</nobr> | <nobr>VisDial<br>NDCG</nobr> | <nobr>HatefulMemes<br>ROC AUC</nobr> | <nobr>ScienceQA<br>acc.</nobr> | <nobr>RenderedSST2<br>acc.</nobr> | <nobr>Winoground<br>group (text/image)</nobr> |
+ |:---------------------|--------:|---------------------:|---------------------:|-----------------------:|----------------------:|-------------------:|---------------:|-----------------:|-----------------:|-----------------:|-------------------------:|-----------------------:|--------------------------:|----------------------------------:|
+ | 80B IDEFICS Instruct | 0 | 37.4 | 36.9 | 32.9 | 26.2 | 76.5 | 117.2 | 104.5 | 65.3 | 49.3 | 58.9 | 69.5 | 67.3 | 9.2 (20.0/25.0) |
+ | | 4 | 67.5 | 54.0 | 37.8 | 39.8 | 71.7 | 116.9 | 104.0 | 67.1 | 48.9 | 57.5 | 60.5 | 65.5 | - |
+ | | 8 | 68.1 | 56.9 | 38.2 | 44.8 | 72.7 | 116.8 | 104.8 | 70.7 | 48.2 | 58.0 | - | 68.6 | - |
+ | | 16 | 68.6 | 58.2 | 39.1 | 48.7 | 77.0 | 120.5 | 107.4 | 76.0 | - | 56.4 | - | 70.1 | - |
+ | | 32 | 68.8 | 59.5 | 39.3 | 51.2 | 79.7 | 123.2 | 108.4 | 78.4 | - | 54.9 | - | 70.5 | - |
+ | 9B IDEFICS Instruct | 0 | 65.8 | 46.1 | 29.2 | 41.2 | 67.1 | 129.1 | 101.1 | 71.9 | 49.2 | 53.5 | 60.6 | 62.8 | 5.8 (20.0/18.0) |
+ | | 4 | 66.2 | 48.7 | 31.0 | 39.0 | 68.2 | 128.2 | 100.9 | 74.8 | 48.9 | 51.8 | 53.8 | 60.6 | - |
+ | | 8 | 66.5 | 50.8 | 31.0 | 41.9 | 70.0 | 128.8 | 101.5 | 75.5 | 48.2 | 51.7 | - | 61.3 | - |
+ | | 16 | 66.8 | 51.7 | 31.6 | 44.8 | 70.2 | 128.8 | 101.5 | 75.8 | - | 51.7 | - | 63.3 | - |
+ | | 32 | 66.9 | 52.3 | 32.0 | 46.0 | 71.7 | 127.8 | 101.0 | 76.3 | - | 50.8 | - | 60.9 | - |
+
  For ImageNet-1k, we also report results where the priming samples are selected to be similar (i.e. close in a vector space) to the queried instance. This is the Retrieval-based In-Context Example Selection (RICES in short) approach introduced by [Yang et al. (2021)](https://arxiv.org/abs/2109.05014).
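
As an illustration of the selection step described above, here is a minimal sketch of picking the priming samples closest to a query in an embedding space. It is not the evaluation code behind these numbers: the function names (`cosine_similarity`, `select_shots`), the use of cosine similarity as the distance, and the toy 2-D embeddings are all assumptions made for the example. The README does not state which encoder or similarity metric was used.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_shots(
    query: Sequence[float],
    support: Sequence[Sequence[float]],
    num_shots: int,
) -> List[int]:
    """Return indices of the `num_shots` support embeddings most similar to `query`.

    Hypothetical helper: cosine similarity is an assumed choice of metric.
    Ties are broken by the lower support index.
    """
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(support)]
    # Sort by descending similarity, then ascending index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:num_shots]]


# Toy support set of 2-D embeddings (illustrative values only).
support_set = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
chosen = select_shots([1.0, 0.0], support_set, num_shots=2)
```

In a real setup, the embeddings would come from an image encoder applied to the support set and to the queried image, and the selected indices would determine which examples are placed in the prompt.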
 
  | Model | Shots | Support set size | Shots selection | ImageNet-1k<br>Top-1 acc. |
 
  | | 16 | 5K | RICES | 64.5 |
 
  Fairness Evaluations:
+ | Model | Shots | <nobr>FairFaceGender<br>acc.</nobr> | <nobr>FairFaceRace<br>acc.</nobr> | <nobr>FairFaceAge<br>acc.</nobr> |
  |:-----------|--------:|----------------------------:|--------------------------:|-------------------------:|
  | IDEFIX 80B | 0 | 95.8 | 64.1 | 51.0 |
  | | 4 | 95.2 | 48.8 | 50.6 |