Commit 6d484ee (verified) by metascroy
1 Parent(s): 7564365

Update README.md

Files changed (1)
  1. README.md +14 -15
README.md CHANGED
@@ -141,25 +141,24 @@ print(make_table(results))
 
  | Benchmark | | |
  |----------------------------------|-------------|-------------------|
- | | Phi-4 mini-Ins | phi4-mini-8dq4w |
  | **Popular aggregated benchmark** | | |
- | mmlu | 66.73 | 63.11 |
- | mmlu_pro | 44.71 | 35.31 |
  | **Reasoning** | | |
- | arc_challenge | TODO | 0.5512 |
- | gpqa | TODO | TODO |
- | hellaswag | 54.57 | 0.5323 |
- | openbookqa | TODO | 0.3240 |
- | piqa | TODO | 0.7666 |
- | siqa | TODO | 0.4708 |
- | truthfulqa | TODO | 0.3953 |
- | winogrande | TODO | 0.7017 |
  | **Multilingual** | | |
- | Mgsm | TODO | TODO |
- | mgsm_cot_native | TODO | TODO |
  | **Math** | | |
- | gsm8k | TODO | 0.7043 |
- | Mathqa | TODO | 0.4157 |
  | **Overall** | **TODO** | **TODO** |
 
 
 
 
  | Benchmark | | |
  |----------------------------------|-------------|-------------------|
+ | | Phi-4 mini-Ins | phi4-mini-8dq4w|
  | **Popular aggregated benchmark** | | |
+ | mmlu (0 shot) | 66.73 | 63.11 |
+ | mmlu_pro (5-shot) | 44.71 | 35.31 |
  | **Reasoning** | | |
+ | arc_challenge | 56.91 | 55.12 |
+ | gpqa_main_zeroshot | 30.13 | 29.02 |
+ | hellaswag | 54.57 | 53.23 |
+ | openbookqa | 33.00 | 32.40 |
+ | piqa (0-shot) | 77.64 | 76.66 |
+ | siqa | 49.59 | 47.08 |
+ | truthfulqa_mc2 (0-shot) | 48.39 | 47.99 |
+ | winogrande (0-shot) | 71.11 | 70.17 |
  | **Multilingual** | | |
+ | mgsm_en_cot_en | 60.8? | 0.620 |
  | **Math** | | |
+ | gsm8k (5-shot) | 81.88 | 70.43 |
+ | Mathqa (0-shot) | 42.31 | 41.57 |
  | **Overall** | **TODO** | **TODO** |
 
 