leaderboard-pr-bot committed on
Commit 0708e21
1 Parent(s): b6d3a6a

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +126 -7
README.md CHANGED
@@ -9,19 +9,124 @@ tags:
 - trl
 - sft
 base_model: meta-llama/Meta-Llama-3-8B
-
 extra_gated_fields:
   Name: text
   Company: text
   Country: country
   I want to use this model for:
     type: select
-    options:
-    - Research
-    - Education
-    - label: Other
-      value: other
-  You agree to not use the model to conduct experiments that cause harm to human subjects or use it to obtain illegal knowledge and I also agree to use this model for non-commercial use ONLY: checkbox
+    options:
+    - Research
+    - Education
+    - label: Other
+      value: other
+  ? You agree to not use the model to conduct experiments that cause harm to human
+    subjects or use it to obtain illegal knowledge and I also agree to use this model
+    for non-commercial use ONLY
+  : checkbox
+model-index:
+- name: Monah-8b
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 58.87
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hooking-dev/Monah-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 80.7
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hooking-dev/Monah-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 64.69
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hooking-dev/Monah-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 43.2
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hooking-dev/Monah-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 76.64
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hooking-dev/Monah-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 42.61
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=hooking-dev/Monah-8b
+      name: Open LLM Leaderboard
 ---
 
 [<img src="https://ai.hooking.co.il/upload/images/logo/0qUf-dashboard-hookingai-logo.png"/>](https://software.hooking.ltd/)
@@ -128,3 +233,17 @@ The model is available under the Apache-2.0 license.
   year={2024},
   publisher={Hooking}
 }
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_hooking-dev__Monah-8b)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |61.12|
+|AI2 Reasoning Challenge (25-Shot)|58.87|
+|HellaSwag (10-Shot)              |80.70|
+|MMLU (5-Shot)                    |64.69|
+|TruthfulQA (0-shot)              |43.20|
+|Winogrande (5-shot)              |76.64|
+|GSM8k (5-shot)                   |42.61|
+
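As a sanity check on the "Avg." row the PR adds, the leaderboard average is simply the unweighted arithmetic mean of the six benchmark scores, rounded to two decimals. A minimal sketch in plain Python (the `scores` dict is just the table above transcribed by hand, not data read from the Hub):

```python
# Benchmark scores copied from the leaderboard table in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 58.87,
    "HellaSwag (10-Shot)": 80.70,
    "MMLU (5-Shot)": 64.69,
    "TruthfulQA (0-shot)": 43.20,
    "Winogrande (5-shot)": 76.64,
    "GSM8k (5-shot)": 42.61,
}

# The "Avg." row is the unweighted mean of the six scores,
# rounded to two decimal places.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 61.12, matching the Avg. row in the table
```

This matches the 61.12 the bot reports, confirming the table is internally consistent.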