Commit 78cdc44
Parent: cf19bec

Adding Evaluation Results (#3)

- Adding Evaluation Results (cea24e58b3fc6a211b230935f3558a5d314e0055)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1):
  1. README.md +127 -11
README.md CHANGED
@@ -1,12 +1,12 @@
 ---
-license: apache-2.0
-datasets:
-- Intel/orca_dpo_pairs
-- Locutusque/Hercules-v3.0
 language:
 - en
+license: apache-2.0
 tags:
 - conversational
+datasets:
+- Intel/orca_dpo_pairs
+- Locutusque/Hercules-v3.0
 inference:
   parameters:
     do_sample: true
@@ -17,12 +17,115 @@ inference:
     max_new_tokens: 250
     repetition_penalty: 1.1
 widget:
-- text: Hello who are you?
-  example_title: Identity
-- text: What can you do?
-  example_title: Capabilities
-- text: Create a fastapi endpoint to retrieve the weather given a zip code.
-  example_title: Coding
+- text: Hello who are you?
+  example_title: Identity
+- text: What can you do?
+  example_title: Capabilities
+- text: Create a fastapi endpoint to retrieve the weather given a zip code.
+  example_title: Coding
+model-index:
+- name: NeuralReyna-Mini-1.8B-v0.2
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 37.8
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/NeuralReyna-Mini-1.8B-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 60.51
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/NeuralReyna-Mini-1.8B-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 45.04
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/NeuralReyna-Mini-1.8B-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 37.75
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/NeuralReyna-Mini-1.8B-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 60.93
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/NeuralReyna-Mini-1.8B-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 27.07
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/NeuralReyna-Mini-1.8B-v0.2
+      name: Open LLM Leaderboard
 ---
 # NeuralReyna-Mini-1.8B-v0.2
 ![Reyna image](https://th.bing.com/th/id/OIG3.8IBxuT77hh6Y_r1DZ6WK?dpr=2.6&pid=ImgDetMain)
@@ -59,4 +162,17 @@ This model may have overfitted to the DPO training data, and may not perform wel
 
 # Contributions
 
-Thanks to @aloobun and @Locutusque for their contributions to this model.
+Thanks to @aloobun and @Locutusque for their contributions to this model.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_M4-ai__NeuralReyna-Mini-1.8B-v0.2)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |44.85|
+|AI2 Reasoning Challenge (25-Shot)|37.80|
+|HellaSwag (10-Shot) |60.51|
+|MMLU (5-Shot) |45.04|
+|TruthfulQA (0-shot) |37.75|
+|Winogrande (5-shot) |60.93|
+|GSM8k (5-shot) |27.07|
+
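As a sanity check (mine, not part of the commit), the "Avg." row in the table above is the unweighted mean of the six benchmark scores, rounded to two decimal places:

```python
# Per-benchmark scores from the Open LLM Leaderboard table in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 37.80,
    "HellaSwag (10-Shot)": 60.51,
    "MMLU (5-Shot)": 45.04,
    "TruthfulQA (0-shot)": 37.75,
    "Winogrande (5-shot)": 60.93,
    "GSM8k (5-shot)": 27.07,
}

# Unweighted mean across the six benchmarks.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 44.85, matching the "Avg." row
```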