Sean Cho committed on
Commit
94a1689
·
1 Parent(s): a507ee8

update about

Files changed (1)
  1. src/assets/text_content.py +7 -6
src/assets/text_content.py CHANGED
@@ -31,12 +31,13 @@ Please provide information about the model through an issue! 🤩
  
  ## How it works
  
- 📈 We have set up a benchmark using datasets translated into Korean from the four tasks (HellaSwag, MMLU, Arc, Truthful QA) operated by HuggingFace OpenLLM.
- - Ko-HellaSwag (provided by __Upstage__)
- - Ko-MMLU (provided by __Upstage__)
- - Ko-Arc (provided by __Upstage__)
- - Ko-Truthful QA (provided by __Upstage__)
- To provide an evaluation befitting the LLM era, we've selected benchmark datasets suitable for assessing four elements: expertise, inference, hallucination, and common sense. The final score is converted to the average score from the four evaluation datasets.
+ 📈 We have set up a benchmark using datasets translated into Korean from the four tasks (HellaSwag, MMLU, Arc, Truthful QA) operated by HuggingFace OpenLLM. We have also added a new dataset prepared from scratch.
+ - Ko-HellaSwag (provided by __Upstage__, machine translation)
+ - Ko-MMLU (provided by __Upstage__, human translation and variation)
+ - Ko-Arc (provided by __Upstage__, human translation and variation)
+ - Ko-Truthful QA (provided by __Upstage__, human translation and variation)
+ - Ko-CommonGen V2 (provided by __Korea University NLP&AI Lab__, created from scratch)
+ To provide an evaluation befitting the LLM era, we've selected benchmark datasets suitable for assessing these elements: expertise, inference, hallucination, and common sense. The final score is converted to the average score from each evaluation datasets.
  
  GPUs are provided by __KT__ for the evaluations.
  
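
The text in this commit says the final score is the average over the evaluation datasets. A minimal sketch of that aggregation follows. It is not the leaderboard's actual code: the function name `final_score` and the example scores are hypothetical, and an unweighted mean is assumed because the text does not mention any weighting.

```python
from statistics import mean

# Dataset names as listed in the updated "How it works" text.
DATASETS = [
    "Ko-HellaSwag",
    "Ko-MMLU",
    "Ko-Arc",
    "Ko-Truthful QA",
    "Ko-CommonGen V2",
]


def final_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the per-dataset scores (assumed aggregation)."""
    missing = [name for name in DATASETS if name not in scores]
    if missing:
        # Refuse to average over a partial set of results.
        raise ValueError(f"missing scores for: {missing}")
    return mean(scores[name] for name in DATASETS)


# Hypothetical scores, for illustration only.
example = {
    "Ko-HellaSwag": 50.0,
    "Ko-MMLU": 40.0,
    "Ko-Arc": 45.0,
    "Ko-Truthful QA": 35.0,
    "Ko-CommonGen V2": 30.0,
}
print(final_score(example))
```

Raising on a missing dataset, rather than averaging whatever is present, keeps scores comparable between models. This is a design choice of the sketch, not something the commit states.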