likejazz committed on
Commit 2d84212 · verified · 1 parent: fa87a74

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -60,13 +60,13 @@ For easy reproduction of our evaluation results, we list the evaluation tools an
 
  | | Evaluation setting | Metric | Evaluation tool |
  |------------|--------------------|-------------------------------------|-----------------|
- | KMMLU | 5-shot | macro\_avg / exact\_match | lm-eval-harness |
- | KMMLU Hard | 5-shot | macro\_avg / exact\_match | lm-eval-harness |
  | KoBEST | 5-shot | macro\_avg / f1 | lm-eval-harness |
- | Belebele | 0-shot | acc | lm-eval-harness |
- | CSATQA | 0-shot | acc\_norm | lm-eval-harness |
- | MMLU | 5-shot | macro\_avg / acc | lm-eval-harness |
- | MMLU Pro | 5-shot | macro\_avg / exact\_match | lm-eval-harness |
  | GSM8K | 5-shot | acc, exact\_match & strict\_extract | lm-eval-harness |
 
  ## Quickstart
 
 
  | | Evaluation setting | Metric | Evaluation tool |
  |------------|--------------------|-------------------------------------|-----------------|
+ | KMMLU | 5-shot | macro\_avg / exact\_match | lm-eval-harness |
+ | KMMLU Hard | 5-shot | macro\_avg / exact\_match | lm-eval-harness |
  | KoBEST | 5-shot | macro\_avg / f1 | lm-eval-harness |
+ | Belebele | 0-shot | acc | lm-eval-harness |
+ | CSATQA | 0-shot | acc\_norm | lm-eval-harness |
+ | MMLU | 5-shot | macro\_avg / acc | lm-eval-harness |
+ | MMLU Pro | 5-shot | macro\_avg / exact\_match | lm-eval-harness |
  | GSM8K | 5-shot | acc, exact\_match & strict\_extract | lm-eval-harness |
 
  ## Quickstart
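To make the table's settings concrete, here is a minimal sketch that transcribes each row's shot count and turns it into an `lm_eval` command line. The `--model`, `--model_args`, `--tasks`, `--num_fewshot`, and `--batch_size` flags are standard lm-eval-harness CLI options, but the task identifiers (`kmmlu`, `kmmlu_hard`, `kobest`, `belebele_kor_Hang`, `csatqa`, `mmlu`, `mmlu_pro`, `gsm8k`), the placeholder `MODEL_ID`, and the default batch size are assumptions rather than anything stated in the commit; check the task names with `lm_eval --tasks list` for your installed version. The `macro_average` helper shows one common reading of `macro_avg` (an unweighted mean over subjects). It is not necessarily how the authors aggregated their scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSetting:
    task: str         # ASSUMED lm-eval-harness task name; verify with `lm_eval --tasks list`
    num_fewshot: int  # shot count taken from the table


# Shot counts transcribed from the table; task names are hypothetical guesses.
SETTINGS = {
    "KMMLU": EvalSetting("kmmlu", 5),
    "KMMLU Hard": EvalSetting("kmmlu_hard", 5),
    "KoBEST": EvalSetting("kobest", 5),
    "Belebele": EvalSetting("belebele_kor_Hang", 0),
    "CSATQA": EvalSetting("csatqa", 0),
    "MMLU": EvalSetting("mmlu", 5),
    "MMLU Pro": EvalSetting("mmlu_pro", 5),
    "GSM8K": EvalSetting("gsm8k", 5),
}


def build_command(benchmark: str, model_id: str, batch_size: int = 8) -> list[str]:
    """Return an lm_eval CLI argument list for one benchmark row of the table."""
    s = SETTINGS[benchmark]
    return [
        "lm_eval",
        "--model", "hf",                          # Hugging Face transformers backend
        "--model_args", f"pretrained={model_id}",
        "--tasks", s.task,
        "--num_fewshot", str(s.num_fewshot),
        "--batch_size", str(batch_size),          # hypothetical default, tune for your GPU
    ]


def macro_average(per_subject: dict[str, float]) -> float:
    """Unweighted mean of per-subject scores (one reading of "macro_avg")."""
    if not per_subject:
        raise ValueError("no subject scores to average")
    return sum(per_subject.values()) / len(per_subject)


if __name__ == "__main__":
    import shlex

    # Print one shell-ready command per benchmark; MODEL_ID is a placeholder.
    for name in SETTINGS:
        print(shlex.join(build_command(name, "MODEL_ID")))
```

Keeping the shot counts in one mapping means a change to the table (such as this commit's edits) only has to be mirrored in one place.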