Update metric table #2
Opened by Taekyoon
README.md (changed)
@@ -131,15 +131,15 @@ Model evaluation metrics and results.
 
 ### Benchmark Results
 
-| Category | Metric | Shots |
+| Category | Metric | Shots | Score |
 |----------------------------------|----------------------|------------|--------|
 | **Default Metric** | **ACC** | | |
 | **Knowledge (5-shot)** | MMLU | | 61.76 |
-| | KMMLU
+| | KMMLU (Exact Match) | | 42.75 |
 | | CMLU | | 50.93 |
 | | JMLU | | |
 | | C-EVAL | | 50.07 |
-| | HAERAE
+| | HAERAE | 0-shot | 63.89 |
 | **KoBest (5-shot)** | BoolQ | | 85.47 |
 | | COPA | | 83.5 |
 | | Hellaswag (acc-norm) | | 63.2 |
@@ -154,8 +154,8 @@ Model evaluation metrics and results.
 | **JP Eval Harness (Prompt ver 0.3)** | JcommonsenseQA | 3-shot | 85.97 |
 | | JNLI | 3-shot | 39.11 |
 | | Marc_ja | 3-shot | 96.48 |
-| | JSquad
-| | Jaqket
+| | JSquad (Exact Match) | 2-shot | 70.69 |
+| | Jaqket (Exact Match) | 1-shot | 81.53 |
 | | MGSM | 5-shot | 28.8 |
 | **XWinograd (0-shot)** | EN | | 89.03 |
 | | FR | | 72.29 |