Spaces: Running on CPU Upgrade
Sean Cho committed • Commit b3e7847 • Parent(s): 4c0ff9d
text style
src/assets/text_content.py CHANGED
@@ -10,7 +10,7 @@ The data used for evaluation consists of datasets to assess expertise, inference
 The evaluation dataset is exclusively private and only available for evaluation process.
 More detailed information about the benchmark dataset is provided on the “About” page.
 
-This leaderboard is co-hosted by
+This leaderboard is co-hosted by __Upstage__ and __NIA__, and operated by __Upstage__.
 """
 
 LLM_BENCHMARKS_TEXT = f"""
@@ -31,13 +31,13 @@ Please provide information about the model through an issue! 🤩
 ## How it works
 
 📈 We have set up a benchmark using datasets translated into Korean from the four tasks (HellaSwag, MMLU, Arc, Truthful QA) operated by HuggingFace OpenLLM.
-- Ko-HellaSwag (provided by
-- Ko-MMLU (provided by
-- Ko-Arc (provided by
-- Ko-Truthful QA (provided by
+- Ko-HellaSwag (provided by __Upstage__)
+- Ko-MMLU (provided by __Upstage__)
+- Ko-Arc (provided by __Upstage__)
+- Ko-Truthful QA (provided by __Upstage__)
 To provide an evaluation befitting the LLM era, we've selected benchmark datasets suitable for assessing four elements: expertise, inference, hallucination, and common sense. The final score is converted to the average score from the four evaluation datasets.
 
-GPUs are provided by
+GPUs are provided by __KT__ for the evaluations.
 
 ## Details and Logs
 - Detailed numerical results in the `results` Upstage dataset: https://huggingface.co/datasets/open-ko-llm-leaderboard/results
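The diff above states the scoring rule plainly: the final leaderboard score is the plain average of the four Korean benchmark scores. A minimal sketch of that rule follows, with made-up scores; the leaderboard's actual aggregation code is not part of this commit, so the function and values below are illustrative only.

```python
# Minimal sketch of the averaging rule described in the diff.
# Task names mirror the bullet list above; the scores are hypothetical.
TASKS = ["Ko-HellaSwag", "Ko-MMLU", "Ko-Arc", "Ko-Truthful QA"]

def final_score(task_scores: dict[str, float]) -> float:
    """Average the four per-task scores into one leaderboard score."""
    return sum(task_scores[t] for t in TASKS) / len(TASKS)

scores = {
    "Ko-HellaSwag": 61.2,  # hypothetical accuracy values
    "Ko-MMLU": 42.8,
    "Ko-Arc": 55.0,
    "Ko-Truthful QA": 48.3,
}
print(final_score(scores))  # 51.825
```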
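The `results` dataset linked under "Details and Logs" holds the detailed numbers. One hedged way to pull the raw files locally is a plain repo snapshot via `huggingface_hub`; the repo's internal file layout is an assumption, not something the diff documents, so inspect the returned directory rather than hard-coding paths.

```python
# Download the results repo referenced above and print its local path.
# The internal layout (e.g. per-model result files) is assumed here.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="open-ko-llm-leaderboard/results",
    repo_type="dataset",
)
print(local_dir)  # local cache path containing the downloaded result files
```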