DavidGF committed on
Commit 7b369e3
1 Parent(s): 4b398dc

Update README.md

Files changed (1)
  1. README.md +89 -71
README.md CHANGED
@@ -7,8 +7,8 @@ library_name: transformers
  pipeline_tag: text-generation
  ---
 
- ![SauerkrautLM](images/hero.png "SauerkrautLM-7b-HerO-multilingual")
- ## VAGO solutions SauerkrautLM
  Introducing SauerkrautLM-v1 - Your German Language Powerhouse!
 
  We are thrilled to unveil our **very first release**, **SauerkrautLM-v1**. This remarkable creation marks a significant milestone as it is specifically **tailored for the German-speaking community**. In a landscape where German language models are scarce, we are proud to offer a solution that fills this void.
@@ -62,80 +62,98 @@ Bitte erkläre mir, wie die Zusammenführung von Modellen durch bestehende Spitz
  ```
  ## Evaluation
  **MT-Bench (German)**
-
- ![First Turn](images/de-1turn.png "First Turn")
- ![Second Turn](images/de-2turn.png "Second Turn")
- ![Average](images/de-avg.png "Average")
-
  **MT-Bench (English)**
 
- ![First Turn](images/eng-1turn.png "First Turn")
- ![Second Turn](images/eng-2turn.png "Second Turn")
- ![Average](images/eng-avg.png "Average")
 
  **Language Model evaluation Harness**
- ```
- | Task |Version| Metric |Value | |Stderr|
- |--------------------|------:|--------|------:|---|-----:|
- |arc_challenge | 0|acc | 0.5555|± |0.0145|
- | | |acc_norm| 0.5956|± |0.0143|
- |arc_easy | 0|acc | 0.8388|± |0.0075|
- | | |acc_norm| 0.8262|± |0.0078|
- |boolq | 1|acc | 0.8725|± |0.0058|
- |copa | 0|acc | 0.9100|± |0.0288|
- |hellaswag | 0|acc | 0.6285|± |0.0048|
- | | |acc_norm| 0.8125|± |0.0039|
- |lambada_openai_mt_de| 0|ppl |45.7314|± |2.8280|
- | | |acc | 0.4141|± |0.0069|
- |lambada_standard | 0|ppl | 3.5467|± |0.0779|
- | | |acc | 0.6922|± |0.0064|
- |multirc | 1|acc | 0.1459|± |0.0114|
- |openbookqa | 0|acc | 0.3640|± |0.0215|
- | | |acc_norm| 0.4600|± |0.0223|
- |piqa | 0|acc | 0.8123|± |0.0091|
- | | |acc_norm| 0.8281|± |0.0088|
- |race | 1|acc | 0.4507|± |0.0154|
- |rte | 0|acc | 0.7040|± |0.0275|
- |truthfulqa_mc | 1|mc1 | 0.3329|± |0.0165|
- | | |mc2 | 0.4915|± |0.0150|
- |webqs | 0|acc | 0.1924|± |0.0087|
- |wic | 0|acc | 0.5752|± |0.0196|
- |winogrande | 0|acc | 0.7301|± |0.0125|
- |wsc | 0|acc | 0.6154|± |0.0479|
- |drop | 1|em | 0.2140|± |0.0042|
- | | |f1 | 0.4011|± |0.0041|
- |triviaqa | 3|em | 0.6259|± |0.0036|
- |wmt16-de-en | 0|bleu |39.2043|± |0.3982|
- | | |chrf | 0.6316|± |0.0029|
- | | |ter | 0.4816|± |0.0054|
- |wmt16-en-de | 0|bleu |25.5745|± |0.3492|
- | | |chrf | 0.5331|± |0.0030|
- | | |ter | 0.6463|± |0.0039|
- |xnli_de | 0|acc | 0.4547|± |0.0070|
- |xnli_en | 0|acc | 0.5595|± |0.0070|
- ```
  **BBH**
- ```
- | Task |Version| Metric |Value | |Stderr|
- |------------------------------------------------|------:|---------------------|-----:|---|-----:|
- |bigbench_causal_judgement | 0|multiple_choice_grade|0.6053|± |0.0356|
- |bigbench_date_understanding | 0|multiple_choice_grade|0.6992|± |0.0239|
- |bigbench_disambiguation_qa | 0|multiple_choice_grade|0.3721|± |0.0302|
- |bigbench_geometric_shapes | 0|multiple_choice_grade|0.1671|± |0.0197|
- | | |exact_str_match |0.1003|± |0.0159|
- |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.2540|± |0.0195|
- |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.2043|± |0.0152|
- |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.4667|± |0.0289|
- |bigbench_movie_recommendation | 0|multiple_choice_grade|0.3700|± |0.0216|
- |bigbench_navigate | 0|multiple_choice_grade|0.4970|± |0.0158|
- |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.6965|± |0.0103|
- |bigbench_ruin_names | 0|multiple_choice_grade|0.4152|± |0.0233|
- |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.1443|± |0.0111|
- |bigbench_snarks | 0|multiple_choice_grade|0.6464|± |0.0356|
- |bigbench_sports_understanding | 0|multiple_choice_grade|0.6846|± |0.0148|
- |bigbench_temporal_sequences | 0|multiple_choice_grade|0.3150|± |0.0147|
- |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2168|± |0.0117|
- |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1537|± |0.0086|
- |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4667|± |0.0289|
- ```
  ## Disclaimer
 
  pipeline_tag: text-generation
  ---
 
+ ![SauerkrautLM](images/hero.png "SauerkrautLM-7b-HerO")
+ ## VAGO solutions SauerkrautLM-7b-HerO
  Introducing SauerkrautLM-v1 - Your German Language Powerhouse!
 
  We are thrilled to unveil our **very first release**, **SauerkrautLM-v1**. This remarkable creation marks a significant milestone as it is specifically **tailored for the German-speaking community**. In a landscape where German language models are scarce, we are proud to offer a solution that fills this void.
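The card's front matter (the context lines above) declares `library_name: transformers` and `pipeline_tag: text-generation`, so the model should load through the standard text-generation pipeline. A minimal sketch, assuming the Hugging Face repo id `VAGOsolutions/SauerkrautLM-7b-HerO` (not confirmed by this diff) and an `accelerate`-enabled environment:

```python
# Minimal loading sketch -- the repo id below is an assumption, not taken from this diff.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="VAGOsolutions/SauerkrautLM-7b-HerO",  # assumed repo id; adjust as needed
    device_map="auto",                           # requires accelerate; omit for CPU-only use
)

# German prompt, since the card targets German-speaking users.
prompt = "Erkläre kurz, was ein Sprachmodell ist."
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```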
 
  ```
  ## Evaluation
  **MT-Bench (German)**
+ ```
+ ########## First turn ##########
+ score
+ model turn
+ SauerkrautLM-70b-v1 1 7.25000
+ SauerkrautLM-7b-HerO 1 6.96875
+ SauerkrautLM-7b-v1-mistral 1 6.30625
+ leo-hessianai-13b-chat 1 6.18750
+ SauerkrautLM-13b-v1 1 6.16250
+ leo-mistral-hessianai-7b-chat 1 6.15625
+ Llama-2-70b-chat-hf 1 6.03750
+ vicuna-13b-v1.5 1 5.80000
+ SauerkrautLM-7b-v1 1 5.65000
+ leo-hessianai-7b-chat 1 5.52500
+ vicuna-7b-v1.5 1 5.42500
+ Mistral-7B-v0.1 1 5.37500
+ SauerkrautLM-3b-v1 1 3.17500
+ Llama-2-7b 1 1.28750
+ open_llama_3b_v2 1 1.68750
+
+ ########## Second turn ##########
+ score
+ model turn
+ SauerkrautLM-70b-v1 2 6.83125
+ SauerkrautLM-7b-HerO 2 6.30625
+ vicuna-13b-v1.5 2 5.63125
+ SauerkrautLM-13b-v1 2 5.34375
+ SauerkrautLM-7b-v1-mistral 2 5.26250
+ leo-mistral-hessianai-7b-chat 2 4.99375
+ SauerkrautLM-7b-v1 2 4.73750
+ leo-hessianai-13b-chat 2 4.71250
+ vicuna-7b-v1.5 2 4.67500
+ Llama-2-70b-chat-hf 2 4.66250
+ Mistral-7B-v0.1 2 4.53750
+ leo-hessianai-7b-chat 2 2.65000
+ SauerkrautLM-3b-v1 2 1.98750
+ open_llama_3b_v2 2 1.22500
+ Llama-2-7b 2 1.07500
+
+ ########## Average ##########
+ score
+ model
+ SauerkrautLM-70b-v1 7.040625
+ SauerkrautLM-7b-HerO 6.637500
+ SauerkrautLM-7b-v1-mistral 5.784375
+ SauerkrautLM-13b-v1 5.753125
+ vicuna-13b-v1.5 5.715625
+ leo-mistral-hessianai-7b-chat 5.575000
+ leo-hessianai-13b-chat 5.450000
+ Llama-2-70b-chat-hf 5.350000
+ SauerkrautLM-v1-7b 5.193750
+ vicuna-7b-v1.5 5.050000
+ Mistral-7B-v0.1 4.956250
+ leo-hessianai-7b-chat 4.087500
+ SauerkrautLM-3b-v1 2.581250
+ open_llama_3b_v2 1.456250
+ Llama-2-7b 1.181250
+ ```
  **MT-Bench (English)**
+ ```
+ ########## First turn ##########
+ score
+ model turn
+ OpenHermes-2.5-Mistral-7B 1 8.21875
+ SauerkrautLM-7b-HerO 1 8.03125
+ Mistral-7B-OpenOrca 1 7.65625
+ neural-chat-7b-v3-1 1 7.22500
+
+ ########## Second turn ##########
+ score
+ model turn
+ OpenHermes-2.5-Mistral-7B 2 7.1000
+ SauerkrautLM-7b-HerO 2 6.7875
+ neural-chat-7b-v3-1 2 6.4000
+ Mistral-7B-OpenOrca 2 6.1750
+
+ ########## Average ##########
+ score
+ model
+ OpenHermes-2.5-Mistral-7B 7.659375
+ SauerkrautLM-7b-HerO 7.409375
+ Mistral-7B-OpenOrca 6.915625
+ neural-chat-7b-v3-1 6.812500
+ ```
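For reference, the `Average` block in each MT-Bench listing is the mean of the first- and second-turn scores shown above; a quick check against two of the English-run values:

```python
# Verify that the MT-Bench "Average" rows equal the mean of the two turn scores.
first_turn = {
    "OpenHermes-2.5-Mistral-7B": 8.21875,
    "SauerkrautLM-7b-HerO": 8.03125,
}
second_turn = {
    "OpenHermes-2.5-Mistral-7B": 7.1000,
    "SauerkrautLM-7b-HerO": 6.7875,
}

for name in first_turn:
    print(f"{name}: {(first_turn[name] + second_turn[name]) / 2:.6f}")
# OpenHermes-2.5-Mistral-7B: 7.659375
# SauerkrautLM-7b-HerO: 7.409375  (both match the Average block above)
```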
  **Language Model evaluation Harness**
+ ![Harness](images/luminouscompare.PNG "SauerkrautLM-7b-HerO Harness")
+ *compared to Aleph Alpha Luminous Models
+
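The scores plotted above appear to come from EleutherAI's lm-evaluation-harness (the task names in the removed table, such as `arc_challenge` and `hellaswag`, are harness tasks). A hedged sketch of a typical run, assuming the v0.3-era Python API and the same assumed repo id as above:

```python
# Hedged sketch of an lm-evaluation-harness run (v0.3-style API); the task list
# and repo id are illustrative assumptions, not the authors' exact configuration.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=VAGOsolutions/SauerkrautLM-7b-HerO",  # assumed repo id
    tasks=["arc_challenge", "hellaswag", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task metrics, e.g. acc / acc_norm with stderr
```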
  **BBH**
+ ![BBH](images/bbh.PNG "SauerkrautLM-7b-HerO BBH")
  ## Disclaimer