Update README.md
README.md
CHANGED
@@ -60,13 +60,13 @@ For easy reproduction of our evaluation results, we list the evaluation tools an
 
 |            | Evaluation setting | Metric                              | Evaluation tool |
 |------------|--------------------|-------------------------------------|-----------------|
-| KMMLU      | 5-shot             | macro\_avg / exact\_match           |
-| KMMLU Hard | 5-shot             | macro\_avg / exact\_match           |
+| KMMLU      | 5-shot             | macro\_avg / exact\_match           | lm-eval-harness |
+| KMMLU Hard | 5-shot             | macro\_avg / exact\_match           | lm-eval-harness |
 | KoBEST     | 5-shot             | macro\_avg / f1                     | lm-eval-harness |
-| Belebele   | 0-shot             | acc
-| CSATQA     | 0-shot             | acc\_norm
-| MMLU       | 5-shot             | macro\_avg / acc
-| MMLU Pro   | 5-shot             | macro\_avg / exact\_match
+| Belebele   | 0-shot             | acc                                 | lm-eval-harness |
+| CSATQA     | 0-shot             | acc\_norm                           | lm-eval-harness |
+| MMLU       | 5-shot             | macro\_avg / acc                    | lm-eval-harness |
+| MMLU Pro   | 5-shot             | macro\_avg / exact\_match           | lm-eval-harness |
 | GSM8K      | 5-shot             | acc, exact\_match & strict\_extract | lm-eval-harness |
 
 ## Quickstart
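A minimal sketch of how rows of the table above might be reproduced with EleutherAI's lm-evaluation-harness (the tool the diff fills in). The model path is a placeholder, and exact task names (`kmmlu`, `belebele`, etc.) depend on the task registry of the harness version you install; this is an illustration of the settings in the table, not the authors' exact command.

```shell
# Install the harness (assumption: a recent release with Korean tasks).
pip install lm-eval

# KMMLU, 5-shot (table reports macro_avg / exact_match):
lm_eval --model hf \
  --model_args pretrained=YOUR_MODEL_PATH \
  --tasks kmmlu \
  --num_fewshot 5

# Belebele, 0-shot (table reports acc):
lm_eval --model hf \
  --model_args pretrained=YOUR_MODEL_PATH \
  --tasks belebele \
  --num_fewshot 0
```

`--num_fewshot` mirrors the "Evaluation setting" column; the metric names in the table (`exact_match`, `f1`, `acc`, `acc_norm`) correspond to the metrics the harness prints per task.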