nielsr (HF staff) and leaderboard-pr-bot committed
Commit bdd31cf
1 Parent(s): 59b1af1

Adding Evaluation Results (#10)


- Adding Evaluation Results (a053b485ed61758e2ca4acb5ed6bb3a33c5d272d)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1):
  1. README.md +116 -3
README.md CHANGED
@@ -1,11 +1,11 @@
 ---
 license: apache-2.0
-base_model: Intel/neural-chat-7b-v3-1
 tags:
 - LLMs
 - mistral
 - math
 - Intel
+base_model: Intel/neural-chat-7b-v3-1
 model-index:
 - name: neural-chat-7b-v3-3
   results:
@@ -13,8 +13,8 @@ model-index:
       type: Large Language Model
       name: Large Language Model
     dataset:
-      type: meta-math/MetaMathQA
       name: meta-math/MetaMathQA
+      type: meta-math/MetaMathQA
     metrics:
     - type: ARC (25-shot)
       value: 66.89
@@ -40,6 +40,106 @@ model-index:
       value: 61.11
       name: GSM8K (5-shot)
       verified: true
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 66.89
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Intel/neural-chat-7b-v3-3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 85.26
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Intel/neural-chat-7b-v3-3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 63.07
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Intel/neural-chat-7b-v3-3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 63.01
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Intel/neural-chat-7b-v3-3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 79.64
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Intel/neural-chat-7b-v3-3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 61.11
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Intel/neural-chat-7b-v3-3
+      name: Open LLM Leaderboard
 ---
 
 ## Model Details: Neural-Chat-v3-3
@@ -243,4 +343,17 @@ Here are a couple of useful links to learn more about Intel's AI software:
 
 ## Disclaimer
 
-The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Intel__neural-chat-7b-v3-3)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |69.83|
+|AI2 Reasoning Challenge (25-Shot)|66.89|
+|HellaSwag (10-Shot)              |85.26|
+|MMLU (5-Shot)                    |63.07|
+|TruthfulQA (0-shot)              |63.01|
+|Winogrande (5-shot)              |79.64|
+|GSM8k (5-shot)                   |61.11|
+
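As a quick sanity check on the table this commit adds, the `Avg.` row is the unweighted arithmetic mean of the six benchmark scores, rounded to two decimals:

```python
# Benchmark scores taken from the leaderboard table added in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 66.89,
    "HellaSwag (10-Shot)": 85.26,
    "MMLU (5-Shot)": 63.07,
    "TruthfulQA (0-shot)": 63.01,
    "Winogrande (5-shot)": 79.64,
    "GSM8k (5-shot)": 61.11,
}

# The Open LLM Leaderboard reports the plain mean of the six metrics.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 69.83, matching the Avg. row
```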
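The `model-index` block the bot adds is meant to be machine-readable. As a minimal sketch of how a downstream consumer might use it — mirroring a slice of the added YAML as plain Python data rather than parsing the README front matter, to keep the example dependency-free — the per-dataset values can be flattened like this:

```python
# A slice of the model-index structure added in this commit, mirrored as
# plain Python data (only the fields needed below are shown).
model_index = [{
    "name": "neural-chat-7b-v3-3",
    "results": [
        {"dataset": {"name": "AI2 Reasoning Challenge (25-Shot)", "type": "ai2_arc"},
         "metrics": [{"type": "acc_norm", "value": 66.89}]},
        {"dataset": {"name": "GSM8k (5-shot)", "type": "gsm8k"},
         "metrics": [{"type": "acc", "value": 61.11}]},
    ],
}]

# Flatten into {dataset name: first metric value}, as a leaderboard
# scraper or model-card validator might do.
values = {
    result["dataset"]["name"]: result["metrics"][0]["value"]
    for entry in model_index
    for result in entry["results"]
}
print(values["GSM8k (5-shot)"])  # 61.11
```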