leaderboard-pr-bot committed on
Commit
5d00868
1 Parent(s): eb04cac

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
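
For orientation, each leaderboard result this PR adds to the model card metadata can be pictured as one nested record. The following is a minimal Python sketch, not the bot's actual code: the helper name `build_result_entry` is hypothetical, and the nesting of the keys is an assumption based on the usual `model-index` layout of Hugging Face model cards. The field names and values are taken from the diff below.

```python
from typing import Any, Dict, Optional


def build_result_entry(
    dataset_name: str,
    dataset_type: str,
    split: str,
    num_few_shot: int,
    metric_type: str,
    value: float,
    source_url: str,
    config: Optional[str] = None,
    metric_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one `results` item (hypothetical helper; nesting is assumed)."""
    dataset: Dict[str, Any] = {"name": dataset_name, "type": dataset_type}
    if config is not None:
        # Some entries in the diff (e.g. HellaSwag) carry no `config` key.
        dataset["config"] = config
    dataset["split"] = split
    dataset["args"] = {"num_few_shot": num_few_shot}

    metric: Dict[str, Any] = {"type": metric_type, "value": value}
    if metric_name is not None:
        # The TruthfulQA entry in the diff has no metric `name`.
        metric["name"] = metric_name

    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": dataset,
        "metrics": [metric],
        "source": {"url": source_url, "name": "Open LLM Leaderboard"},
    }


LEADERBOARD_URL = (
    "https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard"
    "?query=01-ai/Yi-34B-Chat"
)

# The ARC entry, using the values that appear in the diff.
arc_entry = build_result_entry(
    dataset_name="AI2 Reasoning Challenge (25-Shot)",
    dataset_type="ai2_arc",
    config="ARC-Challenge",
    split="test",
    num_few_shot=25,
    metric_type="acc_norm",
    value=65.44,
    metric_name="normalized accuracy",
    source_url=LEADERBOARD_URL,
)

# The full block would then be one model entry holding a list of such results.
model_index = [{"name": "Yi-34B-Chat", "results": [arc_entry]}]
```

Optional keys are left out rather than set to an empty value, which mirrors how the HellaSwag and TruthfulQA entries appear in the diff.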

Files changed (1)
  1. README.md +140 -8
README.md CHANGED
@@ -3,15 +3,133 @@ license: other
  license_name: yi-license
  license_link: LICENSE
  widget:
- - example_title: "Yi-34B-Chat"
- text: "hi"
- output:
- text: " Hello! How can I assist you today?"
- - example_title: "Yi-34B"
- text: "There's a place where time stands still. A place of breath taking wonder, but also"
- output:
- text: " an eerie sense that something is just not right…\nBetween the two worlds lies The Forgotten Kingdom - home to creatures long since thought extinct and ancient magic so strong it defies belief! Only here can you find what has been lost for centuries: An Elixir Of Life which will restore youth and vitality if only those who seek its power are brave enough to face up against all manner of dangers lurking in this mysterious land! But beware; some say there may even exist powerful entities beyond our comprehension whose intentions towards humanity remain unclear at best ---- they might want nothing more than destruction itself rather then anything else from their quest after immortality (and maybe someone should tell them about modern medicine)? In any event though – one thing remains true regardless : whether or not success comes easy depends entirely upon how much effort we put into conquering whatever challenges lie ahead along with having faith deep down inside ourselves too ;) So let’s get started now shall We?"
+ - example_title: Yi-34B-Chat
+ text: hi
+ output:
+ text: ' Hello! How can I assist you today?'
+ - example_title: Yi-34B
+ text: There's a place where time stands still. A place of breath taking wonder,
+ but also
+ output:
+ text: ' an eerie sense that something is just not right…
+
+ Between the two worlds lies The Forgotten Kingdom - home to creatures long since
+ thought extinct and ancient magic so strong it defies belief! Only here can
+ you find what has been lost for centuries: An Elixir Of Life which will restore
+ youth and vitality if only those who seek its power are brave enough to face
+ up against all manner of dangers lurking in this mysterious land! But beware;
+ some say there may even exist powerful entities beyond our comprehension whose
+ intentions towards humanity remain unclear at best ---- they might want nothing
+ more than destruction itself rather then anything else from their quest after
+ immortality (and maybe someone should tell them about modern medicine)? In any
+ event though – one thing remains true regardless : whether or not success comes
+ easy depends entirely upon how much effort we put into conquering whatever challenges
+ lie ahead along with having faith deep down inside ourselves too ;) So let’s
+ get started now shall We?'
  pipeline_tag: text-generation
+ model-index:
+ - name: Yi-34B-Chat
+ results:
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: AI2 Reasoning Challenge (25-Shot)
+ type: ai2_arc
+ config: ARC-Challenge
+ split: test
+ args:
+ num_few_shot: 25
+ metrics:
+ - type: acc_norm
+ value: 65.44
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=01-ai/Yi-34B-Chat
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: HellaSwag (10-Shot)
+ type: hellaswag
+ split: validation
+ args:
+ num_few_shot: 10
+ metrics:
+ - type: acc_norm
+ value: 84.16
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=01-ai/Yi-34B-Chat
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MMLU (5-Shot)
+ type: cais/mmlu
+ config: all
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 74.9
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=01-ai/Yi-34B-Chat
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: TruthfulQA (0-shot)
+ type: truthful_qa
+ config: multiple_choice
+ split: validation
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: mc2
+ value: 55.37
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=01-ai/Yi-34B-Chat
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: Winogrande (5-shot)
+ type: winogrande
+ config: winogrande_xl
+ split: validation
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 80.11
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=01-ai/Yi-34B-Chat
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: GSM8k (5-shot)
+ type: gsm8k
+ config: main
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 31.92
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=01-ai/Yi-34B-Chat
+ name: Open LLM Leaderboard
  ---
 
  <div align="center">
@@ -1162,3 +1280,17 @@ For free commercial use, you only need to send an email to [get official commerc
  <p align="right"> [
  <a href="#top">Back to top ⬆️ </a> ]
  </p>
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_01-ai__Yi-34B-Chat)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |65.32|
+ |AI2 Reasoning Challenge (25-Shot)|65.44|
+ |HellaSwag (10-Shot)              |84.16|
+ |MMLU (5-Shot)                    |74.90|
+ |TruthfulQA (0-shot)              |55.37|
+ |Winogrande (5-shot)              |80.11|
+ |GSM8k (5-shot)                   |31.92|
+
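
The `Avg.` row in the results table is consistent with a plain arithmetic mean of the six benchmark scores, rounded to two decimals. The check below is a small Python sketch under that assumption; the variable and function names are illustrative and do not come from the leaderboard's code.

```python
# Scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 65.44,
    "HellaSwag (10-Shot)": 84.16,
    "MMLU (5-Shot)": 74.90,
    "TruthfulQA (0-shot)": 55.37,
    "Winogrande (5-shot)": 80.11,
    "GSM8k (5-shot)": 31.92,
}


def mean_score(values):
    """Unweighted mean rounded to two decimals (assumed averaging rule)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(average)  # 65.32, matching the Avg. row
```

The six values sum to 391.90, and 391.90 / 6 ≈ 65.3167, which rounds to the reported 65.32.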