Commit 9842eb5
Parent: 6323cf5

Adding Evaluation Results (#1)

- Adding Evaluation Results (6f11844951e917f4fe3d5261e7a3fc73b56283be)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +123 -8
README.md CHANGED
@@ -1,17 +1,119 @@
 ---
+language:
+- en
+license: apache-2.0
+tags:
+- mistral
+- merge
 base_model:
 - decruz07/kellemar-DPO-Orca-Distilled-7B-SLERP
 - mlabonne/NeuralMarcoro14-7B
 - fblgit/UNA-TheBeagle-7b-v1
 - SanjiWatsuki/Lelantos-DPO-7B
 - mistralai/Mistral-7B-v0.1
-tags:
-- mistral
-- merge
-license: apache-2.0
-language:
-- en
-
+model-index:
+- name: Macaroni-7b-Tied
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 72.87
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/Macaroni-7b-Tied
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 88.14
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/Macaroni-7b-Tied
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 64.73
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/Macaroni-7b-Tied
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 70.54
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/Macaroni-7b-Tied
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 81.93
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/Macaroni-7b-Tied
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 71.57
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/Macaroni-7b-Tied
+      name: Open LLM Leaderboard
 ---
 # Macaroni 7b Tied
 
@@ -45,4 +147,17 @@ The users of this model (hereinafter referred to as "the Model") should be aware
 
 + Use at Your Own Risk: The Model is provided "as is," and the developers make no representations or warranties of any kind concerning the Model's performance or suitability for any particular purpose. The user assumes full responsibility and risk of loss resulting from using the Model.
 
-By using the Model, users acknowledge and agree to the terms stated in this disclaimer. This disclaimer is subject to change without notice, and the latest version can be found on the Model's Hugging Face page.
+By using the Model, users acknowledge and agree to the terms stated in this disclaimer. This disclaimer is subject to change without notice, and the latest version can be found on the Model's Hugging Face page.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_andrijdavid__Macaroni-7b-Tied)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |74.96|
+|AI2 Reasoning Challenge (25-Shot)|72.87|
+|HellaSwag (10-Shot)              |88.14|
+|MMLU (5-Shot)                    |64.73|
+|TruthfulQA (0-shot)              |70.54|
+|Winogrande (5-shot)              |81.93|
+|GSM8k (5-shot)                   |71.57|
+
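The `Avg.` row in the table added by this commit is simply the arithmetic mean of the six benchmark scores. A minimal sketch that rechecks this, with the values copied from the table above:

```python
# Scores copied from the summary table added in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 72.87,
    "HellaSwag (10-Shot)": 88.14,
    "MMLU (5-Shot)": 64.73,
    "TruthfulQA (0-shot)": 70.54,
    "Winogrande (5-shot)": 81.93,
    "GSM8k (5-shot)": 71.57,
}

# The leaderboard "Avg." column is the plain arithmetic mean of the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # prints: Avg. = 74.96, matching the table
```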
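The `model-index` block is the machine-readable counterpart of that table. As a rough sketch of how the entries can be read back programmatically, the snippet below loads this card from the Hub with `huggingface_hub.ModelCard` and prints each benchmark's metric; it assumes network access and that the repo id is `andrijdavid/Macaroni-7b-Tied`, as in the leaderboard URLs above.

```python
from huggingface_hub import ModelCard

# Load the model card for this repo from the Hugging Face Hub
# (assumes network access and that the repo id below is correct).
card = ModelCard.load("andrijdavid/Macaroni-7b-Tied")

# huggingface_hub parses the `model-index` YAML block into structured
# evaluation results on the card's metadata object.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```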