Commit
28506b9
1 Parent(s): 4c0793e

Adding Evaluation Results (#1)


- Adding Evaluation Results (c1a21523b282e1e7c1fc587d5ac5c908d058c1e2)
- Update README.md (e39d245239f253df2cad1b2598e2b429869a5a44)
- Update README.md (86cb4c41db25fa2b09bba1d7fcab58e72540ab6b)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +162 -48
README.md CHANGED
@@ -1,56 +1,53 @@
 ---
+language:
+- en
 license: apache-2.0
+datasets:
+- HuggingFaceH4/ultrachat_200k
+- Felladrin/ChatML-ultrachat_200k
 base_model: Felladrin/Minueza-32M-Base
 pipeline_tag: text-generation
-language:
-  - en
-datasets:
-  - HuggingFaceH4/ultrachat_200k
-  - Felladrin/ChatML-ultrachat_200k
 widget:
-  - messages:
-      - role: system
-        content: >-
-          You are a career counselor. The user will provide you with an individual
-          looking for guidance in their professional life, and your task is to assist
-          them in determining what careers they are most suited for based on their skills,
-          interests, and experience. You should also conduct research into the various
-          options available, explain the job market trends in different industries, and
-          advice on which qualifications would be beneficial for pursuing particular fields.
-      - role: user
-        content: Heya!
-      - role: assistant
-        content: Hi! How may I help you?
-      - role: user
-        content: >-
-          I am interested in developing a career in software engineering. What
-          would you recommend me to do?
-  - messages:
-      - role: user
-        content: Morning!
-      - role: assistant
-        content: Good morning! How can I help you today?
-      - role: user
-        content: Could you give me some tips for becoming a healthier person?
-  - messages:
-      - role: user
-        content: Write the specs of a game about mages in a fantasy world.
-  - messages:
-      - role: user
-        content: Tell me about the pros and cons of social media.
-  - messages:
-      - role: system
-        content: >-
-          You are a highly knowledgeable and friendly assistant.
-          Your goal is to understand and respond to user inquiries with clarity.
-          Your interactions are always respectful, helpful, and focused on
-          delivering the most accurate information to the user.
-      - role: user
-        content: Hey! Got a question for you!
-      - role: assistant
-        content: Sure! What's it?
-      - role: user
-        content: What are some potential applications for quantum computing?
+- messages:
+  - role: system
+    content: You are a career counselor. The user will provide you with an individual
+      looking for guidance in their professional life, and your task is to assist
+      them in determining what careers they are most suited for based on their skills,
+      interests, and experience. You should also conduct research into the various
+      options available, explain the job market trends in different industries, and
+      advice on which qualifications would be beneficial for pursuing particular fields.
+  - role: user
+    content: Heya!
+  - role: assistant
+    content: Hi! How may I help you?
+  - role: user
+    content: I am interested in developing a career in software engineering. What
+      would you recommend me to do?
+- messages:
+  - role: user
+    content: Morning!
+  - role: assistant
+    content: Good morning! How can I help you today?
+  - role: user
+    content: Could you give me some tips for becoming a healthier person?
+- messages:
+  - role: user
+    content: Write the specs of a game about mages in a fantasy world.
+- messages:
+  - role: user
+    content: Tell me about the pros and cons of social media.
+- messages:
+  - role: system
+    content: You are a highly knowledgeable and friendly assistant. Your goal is to
+      understand and respond to user inquiries with clarity. Your interactions are
+      always respectful, helpful, and focused on delivering the most accurate information
+      to the user.
+  - role: user
+    content: Hey! Got a question for you!
+  - role: assistant
+    content: Sure! What's it?
+  - role: user
+    content: What are some potential applications for quantum computing?
 inference:
   parameters:
     max_new_tokens: 250
@@ -59,6 +56,109 @@ inference:
     top_p: 0.55
     top_k: 35
     repetition_penalty: 1.176
+model-index:
+- name: Minueza-32M-UltraChat
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 21.08
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 26.95
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 26.08
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 47.7
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 51.78
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.23
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Minueza-32M-UltraChat
+      name: Open LLM Leaderboard
 ---
 
 # Minueza-32M-UltraChat: A chat model with 32 million parameters
@@ -145,3 +245,17 @@ This model was trained with [SFTTrainer](https://huggingface.co/docs/trl/main/en
 | Optimizer | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
 | Scheduler | cosine |
 | Seed | 42 |
+
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Minueza-32M-UltraChat)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |28.97|
+|AI2 Reasoning Challenge (25-Shot)|21.08|
+|HellaSwag (10-Shot)              |26.95|
+|MMLU (5-Shot)                    |26.08|
+|TruthfulQA (0-shot)              |47.70|
+|Winogrande (5-shot)              |51.78|
+|GSM8k (5-shot)                   | 0.23|
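
The `Avg.` row in the table above appears to be the plain arithmetic mean of the six benchmark scores. The commit does not state the averaging rule, so treat that as an assumption. A minimal Python sketch to check it, with the scores copied from the table:

```python
# Benchmark scores copied from the results table added in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 21.08,
    "HellaSwag (10-Shot)": 26.95,
    "MMLU (5-Shot)": 26.08,
    "TruthfulQA (0-shot)": 47.70,
    "Winogrande (5-shot)": 51.78,
    "GSM8k (5-shot)": 0.23,
}

# Assumption: the leaderboard average is an unweighted mean, rounded to 2 decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(f"Avg. = {average:.2f}")
```

Under that assumption the computed mean matches the 28.97 reported in the `Avg.` row.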