mlabonne leaderboard-pr-bot committed on
Commit
cd21260
1 Parent(s): 5a66774

Adding Evaluation Results (#1)


- Adding Evaluation Results (16e6e34dee9f558aa756742a269199c481743fed)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +117 -1
README.md CHANGED
@@ -3,6 +3,109 @@ license: apache-2.0
  tags:
  - merge
  - mergekit
  ---

  # NeuralPipe-7B
@@ -69,4 +172,17 @@ Output:
  ```
  A large language model is an AI system that uses deep learning techniques to process and understand vast amounts of natural language data. It is designed to generate human-like text, perform complex language tasks, and understand the context, nuance, and meaning of textual data. These models are trained on large datasets, often including billions of words, to learn the patterns and relationships in language. As a result, they can generate coherent and contextually relevant text, answer questions, and perform a variety of other language-related tasks. Some well-known large language models include OpenAI's GPT-3, Google's BERT, and Facebook's RoBERTa.
- ```
  tags:
  - merge
  - mergekit
+ model-index:
+ - name: NeuralPipe-7B-slerp
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 67.75
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralPipe-7B-slerp
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 86.15
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralPipe-7B-slerp
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 63.94
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralPipe-7B-slerp
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 59.8
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralPipe-7B-slerp
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 79.64
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralPipe-7B-slerp
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 69.75
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralPipe-7B-slerp
+       name: Open LLM Leaderboard
  ---

  # NeuralPipe-7B
 
  ```
  A large language model is an AI system that uses deep learning techniques to process and understand vast amounts of natural language data. It is designed to generate human-like text, perform complex language tasks, and understand the context, nuance, and meaning of textual data. These models are trained on large datasets, often including billions of words, to learn the patterns and relationships in language. As a result, they can generate coherent and contextually relevant text, answer questions, and perform a variety of other language-related tasks. Some well-known large language models include OpenAI's GPT-3, Google's BERT, and Facebook's RoBERTa.
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mlabonne__NeuralPipe-7B-slerp)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |71.17|
+ |AI2 Reasoning Challenge (25-Shot)|67.75|
+ |HellaSwag (10-Shot)              |86.15|
+ |MMLU (5-Shot)                    |63.94|
+ |TruthfulQA (0-shot)              |59.80|
+ |Winogrande (5-shot)              |79.64|
+ |GSM8k (5-shot)                   |69.75|
+
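
As a quick sanity check on the results table added in this commit, the `Avg.` row appears to be the unweighted mean of the six benchmark scores. The sketch below assumes a plain arithmetic mean rounded to two decimals; the function name `leaderboard_average` and the `scores` dictionary are illustrative and not part of the leaderboard tooling.

```python
# Benchmark scores copied from the results table in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 67.75,
    "HellaSwag (10-Shot)": 86.15,
    "MMLU (5-Shot)": 63.94,
    "TruthfulQA (0-shot)": 59.80,
    "Winogrande (5-shot)": 79.64,
    "GSM8k (5-shot)": 69.75,
}


def leaderboard_average(values):
    """Return the unweighted mean of the scores, rounded to two decimals.

    Assumption: the table's "Avg." row is a simple arithmetic mean.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. = {average:.2f}")  # matches the 71.17 reported in the table
```

Under that assumption, the six scores sum to 427.03, and dividing by 6 gives the 71.17 shown in the `Avg.` row.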