leaderboard-pr-bot committed on
Commit
f607016
1 Parent(s): 4cced00

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +127 -11
README.md CHANGED
@@ -1,4 +1,14 @@
  ---
+ language:
+ - en
+ - de
+ license: llama2
+ library_name: transformers
+ tags:
+ - goliath
+ - deutsch
+ - llama2
+ - discoresearch
  datasets:
  - Open-Orca/SlimOrca-Dedup
  - teknium/openhermes
@@ -10,19 +20,112 @@ datasets:
  - LeoLM/OpenSchnabeltier
  - bjoernp/ultrachat_de
  - LDJnr/Capybara
- language:
- - en
- - de
- library_name: transformers
  pipeline_tag: text-generation
- license: llama2
  model_creator: DiscoResearch
  model_type: llama
- tags:
- - goliath
- - deutsch
- - llama2
- - discoresearch
+ model-index:
+ - name: DiscoLM-70b
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 68.77
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=DiscoResearch/DiscoLM-70b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 86.1
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=DiscoResearch/DiscoLM-70b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 68.58
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=DiscoResearch/DiscoLM-70b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 57.64
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=DiscoResearch/DiscoLM-70b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 83.58
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=DiscoResearch/DiscoLM-70b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 63.53
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=DiscoResearch/DiscoLM-70b
+       name: Open LLM Leaderboard
  ---
  
  
@@ -183,4 +286,17 @@ We are standing on the shoulders of giants; many thanks in no particular order t
  ## Disclaimer
  
  The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model.
- This model should only be used for research purposes. The original Llama2 license and all restrictions of datasets used to train this model apply.
+ This model should only be used for research purposes. The original Llama2 license and all restrictions of datasets used to train this model apply.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_DiscoResearch__DiscoLM-70b)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |71.37|
+ |AI2 Reasoning Challenge (25-Shot)|68.77|
+ |HellaSwag (10-Shot)              |86.10|
+ |MMLU (5-Shot)                    |68.58|
+ |TruthfulQA (0-shot)              |57.64|
+ |Winogrande (5-shot)              |83.58|
+ |GSM8k (5-shot)                   |63.53|
+
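
The `Avg.` row in the results table above appears to be the plain arithmetic mean of the six benchmark scores. The PR does not state how the average is computed, so the sketch below only checks that assumption. The scores are copied from the diff, and the helper name `leaderboard_average` is hypothetical.

```python
# Scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 68.77,
    "HellaSwag (10-Shot)": 86.10,
    "MMLU (5-Shot)": 68.58,
    "TruthfulQA (0-shot)": 57.64,
    "Winogrande (5-shot)": 83.58,
    "GSM8k (5-shot)": 63.53,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed averaging rule)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. = {average:.2f}")  # prints "Avg. = 71.37"
```

Under this assumption the computed mean matches the `Avg.` value of 71.37 reported in the table.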