Spaces: Running on CPU Upgrade
Sean Cho committed • Commit b3e7847 • Parent(s): 4c0ff9d
text style
src/assets/text_content.py CHANGED
@@ -10,7 +10,7 @@ The data used for evaluation consists of datasets to assess expertise, inference
 The evaluation dataset is exclusively private and only available for evaluation process.
 More detailed information about the benchmark dataset is provided on the “About” page.
 
-This leaderboard is co-hosted by
+This leaderboard is co-hosted by __Upstage__ and __NIA__, and operated by __Upstage__.
 """
 
 LLM_BENCHMARKS_TEXT = f"""
@@ -31,13 +31,13 @@ Please provide information about the model through an issue! 🤩
 ## How it works
 
 📈 We have set up a benchmark using datasets translated into Korean from the four tasks (HellaSwag, MMLU, Arc, Truthful QA) operated by HuggingFace OpenLLM.
-- Ko-HellaSwag (provided by
-- Ko-MMLU (provided by
-- Ko-Arc (provided by
-- Ko-Truthful QA (provided by
+- Ko-HellaSwag (provided by __Upstage__)
+- Ko-MMLU (provided by __Upstage__)
+- Ko-Arc (provided by __Upstage__)
+- Ko-Truthful QA (provided by __Upstage__)
 To provide an evaluation befitting the LLM era, we've selected benchmark datasets suitable for assessing four elements: expertise, inference, hallucination, and common sense. The final score is converted to the average score from the four evaluation datasets.
 
-GPUs are provided by
+GPUs are provided by __KT__ for the evaluations.
 
 ## Details and Logs
 - Detailed numerical results in the `results` Upstage dataset: https://huggingface.co/datasets/open-ko-llm-leaderboard/results
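The diff above states the scoring rule plainly: the final leaderboard score is the plain average of the four Korean benchmark scores. A minimal sketch of that rule follows, with made-up scores; the leaderboard's actual aggregation code is not part of this commit, so the function and values below are illustrative only.

```python
# Minimal sketch of the averaging rule described in the diff.
# Task names mirror the bullet list above; the scores are hypothetical.
TASKS = ["Ko-HellaSwag", "Ko-MMLU", "Ko-Arc", "Ko-Truthful QA"]

def final_score(task_scores: dict[str, float]) -> float:
    """Average the four per-task scores into one leaderboard score."""
    return sum(task_scores[t] for t in TASKS) / len(TASKS)

scores = {
    "Ko-HellaSwag": 61.2,  # hypothetical accuracy values
    "Ko-MMLU": 42.8,
    "Ko-Arc": 55.0,
    "Ko-Truthful QA": 48.3,
}
print(final_score(scores))  # 51.825
```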
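The `results` dataset linked under "Details and Logs" holds the detailed numbers. One hedged way to pull the raw files locally is a plain repo snapshot via `huggingface_hub`; the repo's internal file layout is an assumption, not something the diff documents, so inspect the returned directory rather than hard-coding paths.

```python
# Download the results repo referenced above and print its local path.
# The internal layout (e.g. per-model result files) is assumed here.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="open-ko-llm-leaderboard/results",
    repo_type="dataset",
)
print(local_dir)  # local cache path containing the downloaded result files
```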