DavidGF committed
Commit 8d64e83
1 Parent(s): f0ef275

Update README.md

Files changed (1):
  1. README.md +97 -9
README.md CHANGED
@@ -32,14 +32,14 @@ Our approach ensures that the model retains its original strengths while acquiri
  - [Training Dataset](#training-dataset)
  - [Merge Procedure](#merge-procedure)
  3. [Evaluation](#evaluation)
- - [MT-Bench (German)](#mt-bench-german)
- - [MT-Bench (English)](#mt-bench-english)
+ - [GPT4ALL](#gpt4all)
  - [Language Model evaluation Harness](#language-model-evaluation-harness)
  - [BigBench](#BBH)
- - [GPT4ALL](#gpt4all)
+ - [MT-Bench (German)](#mt-bench-german)
+ - [MT-Bench (English)](#mt-bench-english)
  - [Additional German Benchmark results](#additional-german-benchmark-results)
- 4. [Disclaimer](#disclaimer)
- 5. [Contact](#contact)
+ 5. [Disclaimer](#disclaimer)
+ 6. [Contact](#contact)
  7. [Collaborations](#collaborations)
  8. [Acknowledgement](#acknowledgement)

@@ -174,7 +174,11 @@ SauerkrautLM-7b-HerO <--- 7.409375
  Mistral-7B-OpenOrca 6.915625
  neural-chat-7b-v3-1 6.812500
  ```
+ ### GPT4ALL:
+ Compared to Aleph Alpha Luminous Models, LeoLM and EM_German
+ ![GPT4ALL diagram](https://vago-solutions.de/wp-content/uploads/2023/11/GPT4All.png "SauerkrautLM-7b-HerO GPT4ALL Diagram")

+ ![GPT4ALL table](https://vago-solutions.de/wp-content/uploads/2023/11/GPT4All-Tabelle.png "SauerkrautLM-7b-HerO GPT4ALL Table")

  ### Language Model evaluation Harness:
  Compared to Aleph Alpha Luminous Models

@@ -184,11 +188,95 @@ Compared to Aleph Alpha Luminous Models
  ### BBH:
  ![BBH](https://vago-solutions.de/wp-content/uploads/2023/11/bbh.png "SauerkrautLM-7b-HerO BBH")
  *performed with newest Language Model Evaluation Harness
- ### GPT4ALL:
- Compared to Aleph Alpha Luminous Models, LeoLM and EM_German
- ![GPT4ALL diagram](https://vago-solutions.de/wp-content/uploads/2023/11/GPT4All.png "SauerkrautLM-7b-HerO GPT4ALL Diagram")
+ ### MT-Bench (German):
+ ![MT-Bench German Diagram](https://vago-solutions.de/wp-content/uploads/2023/11/MT-Bench-German.png "SauerkrautLM-7b-HerO MT-Bench German Diagram")
+ ```
+ ########## First turn ##########
+ score
+ model turn
+ SauerkrautLM-70b-v1 1 7.25000
+ SauerkrautLM-7b-HerO <--- 1 6.96875
+ SauerkrautLM-7b-v1-mistral 1 6.30625
+ leo-hessianai-13b-chat 1 6.18750
+ SauerkrautLM-13b-v1 1 6.16250
+ leo-mistral-hessianai-7b-chat 1 6.15625
+ Llama-2-70b-chat-hf 1 6.03750
+ vicuna-13b-v1.5 1 5.80000
+ SauerkrautLM-7b-v1 1 5.65000
+ leo-hessianai-7b-chat 1 5.52500
+ vicuna-7b-v1.5 1 5.42500
+ Mistral-7B-v0.1 1 5.37500
+ SauerkrautLM-3b-v1 1 3.17500
+ Llama-2-7b 1 1.28750
+ open_llama_3b_v2 1 1.68750

- ![GPT4ALL table](https://vago-solutions.de/wp-content/uploads/2023/11/GPT4All-Tabelle.png "SauerkrautLM-7b-HerO GPT4ALL Table")
+ ########## Second turn ##########
+ score
+ model turn
+ SauerkrautLM-70b-v1 2 6.83125
+ SauerkrautLM-7b-HerO <--- 2 6.30625
+ vicuna-13b-v1.5 2 5.63125
+ SauerkrautLM-13b-v1 2 5.34375
+ SauerkrautLM-7b-v1-mistral 2 5.26250
+ leo-mistral-hessianai-7b-chat 2 4.99375
+ SauerkrautLM-7b-v1 2 4.73750
+ leo-hessianai-13b-chat 2 4.71250
+ vicuna-7b-v1.5 2 4.67500
+ Llama-2-70b-chat-hf 2 4.66250
+ Mistral-7B-v0.1 2 4.53750
+ leo-hessianai-7b-chat 2 2.65000
+ SauerkrautLM-3b-v1 2 1.98750
+ open_llama_3b_v2 2 1.22500
+ Llama-2-7b 2 1.07500
+
+ ########## Average ##########
+ score
+ model
+ SauerkrautLM-70b-v1 7.040625
+ SauerkrautLM-7b-HerO <--- 6.637500
+ SauerkrautLM-7b-v1-mistral 5.784375
+ SauerkrautLM-13b-v1 5.753125
+ vicuna-13b-v1.5 5.715625
+ leo-mistral-hessianai-7b-chat 5.575000
+ leo-hessianai-13b-chat 5.450000
+ Llama-2-70b-chat-hf 5.350000
+ SauerkrautLM-v1-7b 5.193750
+ vicuna-7b-v1.5 5.050000
+ Mistral-7B-v0.1 4.956250
+ leo-hessianai-7b-chat 4.087500
+ SauerkrautLM-3b-v1 2.581250
+ open_llama_3b_v2 1.456250
+ Llama-2-7b 1.181250
+ ```
+
+
+ ### MT-Bench (English):
+ ![MT-Bench English Diagram](https://vago-solutions.de/wp-content/uploads/2023/11/MT-Bench-Englisch.png "SauerkrautLM-7b-HerO MT-Bench English Diagram")
+ ```
+ ########## First turn ##########
+ score
+ model turn
+ OpenHermes-2.5-Mistral-7B 1 8.21875
+ SauerkrautLM-7b-HerO <--- 1 8.03125
+ Mistral-7B-OpenOrca 1 7.65625
+ neural-chat-7b-v3-1 1 7.22500
+
+ ########## Second turn ##########
+ score
+ model turn
+ OpenHermes-2.5-Mistral-7B 2 7.1000
+ SauerkrautLM-7b-HerO <--- 2 6.7875
+ neural-chat-7b-v3-1 2 6.4000
+ Mistral-7B-OpenOrca 2 6.1750
+
+ ########## Average ##########
+ score
+ model
+ OpenHermes-2.5-Mistral-7B 7.659375
+ SauerkrautLM-7b-HerO <--- 7.409375
+ Mistral-7B-OpenOrca 6.915625
+ neural-chat-7b-v3-1 6.812500
+ ```
  ### Additional German Benchmark results:
  ![GermanBenchmarks](https://vago-solutions.de/wp-content/uploads/2023/11/German-benchmarks.png "SauerkrautLM-7b-HerO German Benchmarks")
  *performed with newest Language Model Evaluation Harness
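
As a quick sanity check on the MT-Bench tables added in this commit, each model's "Average" row is the mean of its first- and second-turn scores. Below is a minimal illustrative sketch (not part of the committed README); the values are copied from the English MT-Bench table above:

```
# Illustrative only: the "Average" rows in the MT-Bench tables are the mean of
# the two per-turn scores. The numbers below are copied from the English table
# in this commit, not re-run.
first_turn = {"OpenHermes-2.5-Mistral-7B": 8.21875, "SauerkrautLM-7b-HerO": 8.03125}
second_turn = {"OpenHermes-2.5-Mistral-7B": 7.1000, "SauerkrautLM-7b-HerO": 6.7875}

for model in first_turn:
    avg = (first_turn[model] + second_turn[model]) / 2
    print(f"{model}: {avg:.6f}")
# OpenHermes-2.5-Mistral-7B: 7.659375
# SauerkrautLM-7b-HerO: 7.409375  (both match the "Average" section above)
```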