Commit
7fb4e4e
1 Parent(s): e1c0fe2

Adding Evaluation Results (#1)

Browse files

- Adding Evaluation Results (68e2c89affc2e33b6406f91068e0330d073cba7d)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1) hide show
  1. README.md +120 -3
README.md CHANGED
@@ -1,10 +1,113 @@
1
  ---
 
 
 
2
  tags:
3
  - mistral
4
  - merge
5
- license: apache-2.0
6
- language:
7
- - en
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
  ---
9
  # Macaroni 7B
10
 
@@ -26,3 +129,17 @@ This is an experimental merge of pre-trained mistral language models with fblgit
26
  * Ethical Use: Users are encouraged to use The Model ethically and responsibly. The creator(s) disclaim any responsibility for misuse or unethical use of The Model.
27
 
28
  * Modifications: Any modifications made to The Model by third parties are the sole responsibility of the party making the modifications. The original creator(s) of The Model shall not be responsible for any modifications made by third parties.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
  tags:
6
  - mistral
7
  - merge
8
+ model-index:
9
+ - name: macaroni-7b
10
+ results:
11
+ - task:
12
+ type: text-generation
13
+ name: Text Generation
14
+ dataset:
15
+ name: AI2 Reasoning Challenge (25-Shot)
16
+ type: ai2_arc
17
+ config: ARC-Challenge
18
+ split: test
19
+ args:
20
+ num_few_shot: 25
21
+ metrics:
22
+ - type: acc_norm
23
+ value: 73.12
24
+ name: normalized accuracy
25
+ source:
26
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/macaroni-7b
27
+ name: Open LLM Leaderboard
28
+ - task:
29
+ type: text-generation
30
+ name: Text Generation
31
+ dataset:
32
+ name: HellaSwag (10-Shot)
33
+ type: hellaswag
34
+ split: validation
35
+ args:
36
+ num_few_shot: 10
37
+ metrics:
38
+ - type: acc_norm
39
+ value: 88.17
40
+ name: normalized accuracy
41
+ source:
42
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/macaroni-7b
43
+ name: Open LLM Leaderboard
44
+ - task:
45
+ type: text-generation
46
+ name: Text Generation
47
+ dataset:
48
+ name: MMLU (5-Shot)
49
+ type: cais/mmlu
50
+ config: all
51
+ split: test
52
+ args:
53
+ num_few_shot: 5
54
+ metrics:
55
+ - type: acc
56
+ value: 64.58
57
+ name: accuracy
58
+ source:
59
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/macaroni-7b
60
+ name: Open LLM Leaderboard
61
+ - task:
62
+ type: text-generation
63
+ name: Text Generation
64
+ dataset:
65
+ name: TruthfulQA (0-shot)
66
+ type: truthful_qa
67
+ config: multiple_choice
68
+ split: validation
69
+ args:
70
+ num_few_shot: 0
71
+ metrics:
72
+ - type: mc2
73
+ value: 68.76
74
+ source:
75
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/macaroni-7b
76
+ name: Open LLM Leaderboard
77
+ - task:
78
+ type: text-generation
79
+ name: Text Generation
80
+ dataset:
81
+ name: Winogrande (5-shot)
82
+ type: winogrande
83
+ config: winogrande_xl
84
+ split: validation
85
+ args:
86
+ num_few_shot: 5
87
+ metrics:
88
+ - type: acc
89
+ value: 84.37
90
+ name: accuracy
91
+ source:
92
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/macaroni-7b
93
+ name: Open LLM Leaderboard
94
+ - task:
95
+ type: text-generation
96
+ name: Text Generation
97
+ dataset:
98
+ name: GSM8k (5-shot)
99
+ type: gsm8k
100
+ config: main
101
+ split: test
102
+ args:
103
+ num_few_shot: 5
104
+ metrics:
105
+ - type: acc
106
+ value: 68.61
107
+ name: accuracy
108
+ source:
109
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=andrijdavid/macaroni-7b
110
+ name: Open LLM Leaderboard
111
  ---
112
  # Macaroni 7B
113
 
 
129
  * Ethical Use: Users are encouraged to use The Model ethically and responsibly. The creator(s) disclaim any responsibility for misuse or unethical use of The Model.
130
 
131
  * Modifications: Any modifications made to The Model by third parties are the sole responsibility of the party making the modifications. The original creator(s) of The Model shall not be responsible for any modifications made by third parties.
132
+
133
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
134
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_andrijdavid__macaroni-7b)
135
+
136
+ | Metric |Value|
137
+ |---------------------------------|----:|
138
+ |Avg. |74.60|
139
+ |AI2 Reasoning Challenge (25-Shot)|73.12|
140
+ |HellaSwag (10-Shot) |88.17|
141
+ |MMLU (5-Shot) |64.58|
142
+ |TruthfulQA (0-shot) |68.76|
143
+ |Winogrande (5-shot) |84.37|
144
+ |GSM8k (5-shot) |68.61|
145
+