leaderboard-pr-bot committed
Commit af2d236
1 Parent(s): f45d035

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +121 -5
README.md CHANGED
@@ -1,8 +1,4 @@
 ---
-license: apache-2.0
-datasets:
-- Open-Orca/OpenOrca
-- OpenAssistant/oasst_top1_2023-08-25
 language:
 - bg
 - ca
@@ -24,8 +20,114 @@ language:
 - sr
 - sv
 - uk
-
+license: apache-2.0
 library_name: transformers
+datasets:
+- Open-Orca/OpenOrca
+- OpenAssistant/oasst_top1_2023-08-25
+model-index:
+- name: Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 60.49
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 82.07
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 62.34
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 46.38
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 78.45
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 40.18
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2
+      name: Open LLM Leaderboard
 ---
 
 
@@ -161,3 +263,17 @@ outputs = model.generate(generation_config=generation_config,
                          input_ids=inputs,)
 tokenizer.decode(outputs[0], skip_special_tokens=False) #True
 ```
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_NickyNicky__Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |61.65|
+|AI2 Reasoning Challenge (25-Shot)|60.49|
+|HellaSwag (10-Shot) |82.07|
+|MMLU (5-Shot) |62.34|
+|TruthfulQA (0-shot) |46.38|
+|Winogrande (5-shot) |78.45|
+|GSM8k (5-shot) |40.18|
+
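As a sanity check on the table this PR adds, the "Avg." row is simply the unweighted arithmetic mean of the six benchmark scores, rounded to two decimals. A minimal sketch (the dictionary keys are informal labels, not leaderboard identifiers):

```python
# Benchmark scores copied from the leaderboard table in this PR.
scores = {
    "ARC (25-shot)": 60.49,
    "HellaSwag (10-shot)": 82.07,
    "MMLU (5-shot)": 62.34,
    "TruthfulQA (0-shot)": 46.38,
    "Winogrande (5-shot)": 78.45,
    "GSM8k (5-shot)": 40.18,
}

# The leaderboard "Avg." column is the plain mean of the six metrics.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 61.65
```

This reproduces the 61.65 reported above, confirming the table is internally consistent.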