Benchmark links

#60
by oiwio - opened

Could you share a link to an evaluation benchmark that can run ko-ARC, ko-Hellaswag, and ko-CommonGen?

upstage org

Hi. We haven't released the Ko-H5 test data, so there is no link for evaluating on it.
For English, however, you can run the evaluation with lm-evaluation-harness, and there is a Korean benchmark similar to Ko-H5 at https://huggingface.co/datasets/HAERAE-HUB/KMMLU
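For anyone who wants to try the English route, here is a minimal sketch of how the lm-evaluation-harness commands could be assembled. It only builds and prints the command lines; it does not run them. The model id is a hypothetical placeholder, and the few-shot counts (25 for ARC, 10 for HellaSwag) are an assumption based on the usual Open LLM Leaderboard settings, not something confirmed in this thread. It also assumes the `lm_eval` CLI from lm-evaluation-harness v0.4 or later.

```python
import shlex


def build_lm_eval_command(model_id, task, num_fewshot, batch_size=8):
    """Return the argv list for one lm-evaluation-harness run.

    One run per task, because each task uses a different few-shot count.
    """
    return [
        "lm_eval",
        "--model", "hf",                           # Hugging Face transformers backend
        "--model_args", f"pretrained={model_id}",
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
    ]


# Hypothetical placeholder: replace with the model you want to evaluate.
MODEL_ID = "your-org/your-model"

# Assumed few-shot settings (English tasks only; Ko-H5 test data is not public).
RUNS = [("arc_challenge", 25), ("hellaswag", 10)]

if __name__ == "__main__":
    for task, shots in RUNS:
        # Print a copy-pasteable shell command for each task.
        print(shlex.join(build_lm_eval_command(MODEL_ID, task, shots)))
```

Each printed line can be pasted into a shell once lm-evaluation-harness is installed. The Korean test sets are not covered by this, since they were not released.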
