Possibly include multilingual benchmarks like C-Eval and XCOPA
This echoes the discussion in
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/24
about adding multilingual evaluation.
I would like to recommend C-Eval, a good Chinese knowledge evaluation suite similar to MMLU
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/24
as well as XCOPA, a good multilingual commonsense reasoning benchmark used in the PaLM 2 evaluation
https://github.com/cambridgeltl/xcopa
It would be awesome if the Open LLM Leaderboard could include these datasets!
@clefourrier Will you add this, or can this be closed?
Hi!
We won't add new multilingual evals to the Open LLM Leaderboard, but anyone wanting to start a multilingual leaderboard can ping me in this thread if needed. I'll close in the meantime.