Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


germeo-7b-laser - GGUF
- Model creator: https://huggingface.co/aari1995/
- Original model: https://huggingface.co/aari1995/germeo-7b-laser/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [germeo-7b-laser.Q2_K.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q2_K.gguf) | Q2_K | 2.53GB |
| [germeo-7b-laser.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.IQ3_XS.gguf) | IQ3_XS | 2.81GB |
| [germeo-7b-laser.IQ3_S.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.IQ3_S.gguf) | IQ3_S | 2.96GB |
| [germeo-7b-laser.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q3_K_S.gguf) | Q3_K_S | 2.95GB |
| [germeo-7b-laser.IQ3_M.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.IQ3_M.gguf) | IQ3_M | 3.06GB |
| [germeo-7b-laser.Q3_K.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q3_K.gguf) | Q3_K | 3.28GB |
| [germeo-7b-laser.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q3_K_M.gguf) | Q3_K_M | 3.28GB |
| [germeo-7b-laser.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q3_K_L.gguf) | Q3_K_L | 3.56GB |
| [germeo-7b-laser.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.IQ4_XS.gguf) | IQ4_XS | 3.67GB |
| [germeo-7b-laser.Q4_0.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q4_0.gguf) | Q4_0 | 3.83GB |
| [germeo-7b-laser.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.IQ4_NL.gguf) | IQ4_NL | 3.87GB |
| [germeo-7b-laser.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q4_K_S.gguf) | Q4_K_S | 3.86GB |
| [germeo-7b-laser.Q4_K.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q4_K.gguf) | Q4_K | 4.07GB |
| [germeo-7b-laser.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q4_K_M.gguf) | Q4_K_M | 4.07GB |
| [germeo-7b-laser.Q4_1.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q4_1.gguf) | Q4_1 | 4.24GB |
| [germeo-7b-laser.Q5_0.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q5_0.gguf) | Q5_0 | 4.65GB |
| [germeo-7b-laser.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q5_K_S.gguf) | Q5_K_S | 4.65GB |
| [germeo-7b-laser.Q5_K.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q5_K.gguf) | Q5_K | 4.78GB |
| [germeo-7b-laser.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q5_K_M.gguf) | Q5_K_M | 4.78GB |
| [germeo-7b-laser.Q5_1.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q5_1.gguf) | Q5_1 | 5.07GB |
| [germeo-7b-laser.Q6_K.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q6_K.gguf) | Q6_K | 5.53GB |
| [germeo-7b-laser.Q8_0.gguf](https://huggingface.co/RichardErkhov/aari1995_-_germeo-7b-laser-gguf/blob/main/germeo-7b-laser.Q8_0.gguf) | Q8_0 | 7.17GB |



Original model description:
---
language:
- de
license: apache-2.0
tags:
- hermeo
- laser
datasets:
- LeoLM/OpenSchnabeltier
pipeline_tag: conversational
model-index:
- name: germeo-7b-laser
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 60.75
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=aari1995/germeo-7b-laser
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 82.81
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=aari1995/germeo-7b-laser
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 60.57
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=aari1995/germeo-7b-laser
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 53.83
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=aari1995/germeo-7b-laser
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.61
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=aari1995/germeo-7b-laser
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 43.37
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=aari1995/germeo-7b-laser
      name: Open LLM Leaderboard
---

(Evaluation WIP)

## Hermes + Leo + German Laser = Germeo

## Germeo-7B-Laser
A model that understands both German and English but replies only in German, merged from Hermeo-7B.

### Model details

**Merged from**: leo-mistral-hessianai-7b-chat and DPOpenHermes-7B-v2

**Model type**: Causal decoder-only transformer language model

**Languages**: German replies with English understanding capabilities

**Laser-Data**: LeoLM/OpenSchnabeltier


This is an early experiment on LASER and its influence on language understanding; it generally improves the model's language understanding capabilities.
The hypothesis is that it lowers the probability of English replies and raises that of German ones, boosting the model's internal German capabilities.

Will keep you updated.

### Acknowledgements:

I would like to thank everyone who participated in making this model and its training possible:
- To [@malteos](https://huggingface.co/malteos) for hermeo
- To [@cognitivecomputations](https://huggingface.co/cognitivecomputations) and Fernando Fernandes Neto for their implementation of LASER
- To [@LeoLM](https://huggingface.co/LeoLM) and Björn for the OpenSchnabeltier dataset


+
### Prompt format:
|
191 |
+
|
192 |
+
```python
|
193 |
+
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
|
194 |
+
# Convert prompt to tokens
|
195 |
+
prompt_template = """<|im_start|>system
|
196 |
+
Du bist ein hilfreicher Assistent.<|im_end|>
|
197 |
+
<|im_start|>user
|
198 |
+
{prompt}<|im_end|>
|
199 |
+
<|im_start|>assistant"""
|
200 |
+
|
201 |
+
prompt = "Schreibe eine Stellenanzeige für Data Scientist bei AXA!"
|
202 |
+
|
203 |
+
final_prompt = prompt_template.format(prompt=prompt)
|
204 |
+
```
|
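The template above is plain string formatting, so it can be factored into a small reusable helper (a minimal sketch; the name `build_chatml_prompt` is illustrative and not part of the model card):

```python
def build_chatml_prompt(user_prompt: str,
                        system: str = "Du bist ein hilfreicher Assistent.") -> str:
    """Format a single-turn ChatML-style prompt as used by this model."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        f"<|im_start|>assistant"
    )

final_prompt = build_chatml_prompt("Schreibe eine Stellenanzeige für Data Scientist bei AXA!")
```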
205 |
+
|
206 |
+
#### Limit the model to output reply-only:
|
207 |
+
To solve this, you need to implement a custom stopping criteria:
|
208 |
+
|
209 |
+
```python
|
210 |
+
from transformers import StoppingCriteria
|
211 |
+
class GermeoStoppingCriteria(StoppingCriteria):
|
212 |
+
def __init__(self, target_sequence, prompt):
|
213 |
+
self.target_sequence = target_sequence
|
214 |
+
self.prompt=prompt
|
215 |
+
|
216 |
+
def __call__(self, input_ids, scores, **kwargs):
|
217 |
+
# Get the generated text as a string
|
218 |
+
generated_text = tokenizer.decode(input_ids[0])
|
219 |
+
generated_text = generated_text.replace(self.prompt,'')
|
220 |
+
# Check if the target sequence appears in the generated text
|
221 |
+
if self.target_sequence in generated_text:
|
222 |
+
return True # Stop generation
|
223 |
+
|
224 |
+
return False # Continue generation
|
225 |
+
|
226 |
+
def __len__(self):
|
227 |
+
return 1
|
228 |
+
|
229 |
+
def __iter__(self):
|
230 |
+
yield self
|
231 |
+
```
|
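The stopping logic itself can be exercised in isolation without loading a model. In this sketch, `StubTokenizer` and `StopOnSequence` are hypothetical stand-ins (not part of the model card) that reproduce the same decode-strip-and-match check, with the tokenizer passed in explicitly instead of read from the enclosing scope:

```python
class StubTokenizer:
    """Hypothetical stand-in that returns a fixed decoded string."""
    def __init__(self, text):
        self.text = text

    def decode(self, ids):
        return self.text


class StopOnSequence:
    """Same check as GermeoStoppingCriteria, with the tokenizer injected."""
    def __init__(self, tokenizer, target_sequence, prompt):
        self.tokenizer = tokenizer
        self.target_sequence = target_sequence
        self.prompt = prompt

    def __call__(self, input_ids, scores=None, **kwargs):
        # Strip the prompt, then look for the stop sequence in what remains
        generated = self.tokenizer.decode(input_ids[0]).replace(self.prompt, "")
        return self.target_sequence in generated


prompt = "PROMPT "
done = StopOnSequence(StubTokenizer("PROMPT Hallo!<|im_end|>"), "<|im_end|>", prompt)
still_going = StopOnSequence(StubTokenizer("PROMPT Hallo"), "<|im_end|>", prompt)
print(done([[0]]), still_going([[0]]))  # True False
```

The criteria fires only once `<|im_end|>` shows up in the newly generated text, not in the prompt itself.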
The criteria expects your input prompt (formatted exactly as it was passed to the model) and a stop sequence, in this case the `<|im_end|>` token. Simply add it to the generation call:

```python
generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=1012,
    stopping_criteria=GermeoStoppingCriteria("<|im_end|>", prompt_template.format(prompt=prompt))
)
```

### German benchmarks

| **German tasks:** | **MMLU-DE** | **Hellaswag-DE** | **ARC-DE** |**Average** |
|-------------------------------|-------------|---------------|--------------|--------------|
| **Models / Few-shots:** | _(5 shots)_ | _(10 shots)_ | _(24 shots)_ | |
| _7B parameters_ | | | | |
| llama-2-7b | 0.400 | 0.513 | 0.381 | 0.431 |
| leo-hessianai-7b | 0.400 | 0.609 | 0.429 | 0.479 |
| bloom-6b4-clp-german | 0.274 | 0.550 | 0.351 | 0.392 |
| mistral-7b | **0.524** | 0.588 | 0.473 | 0.528 |
| leo-mistral-hessianai-7b | 0.481 | 0.663 | 0.485 | 0.543 |
| leo-mistral-hessianai-7b-chat | 0.458 | 0.617 | 0.465 | 0.513 |
| DPOpenHermes-7B-v2 | 0.517 | 0.603 | 0.515 | 0.545 |
| hermeo-7b | 0.511 | **0.668** | **0.528** | **0.569** |
| **germeo-7b-laser (this model)**| ? | ? | ? | ? |
| _13B parameters_ | | | | |
| llama-2-13b | 0.469 | 0.581 | 0.468 | 0.506 |
| leo-hessianai-13b | **0.486** | **0.658** | **0.509** | **0.551** |
| _70B parameters_ | | | | |
| llama-2-70b | 0.597 | 0.674 | 0.561 | 0.611 |
| leo-hessianai-70b | **0.653** | **0.721** | **0.600** | **0.658** |


Even though the model does not generate English text unless explicitly asked to, its performance on English benchmarks remains strong:

### English benchmarks

| **English tasks:** | **MMLU** | **Hellaswag** | **ARC** | **Average** |
|------------------------------------|-------------|---------------|--------------|-------------|
| **Models / Few-shots:** | _(5 shots)_ | _(10 shots)_ | _(24 shots)_ | |
| llama-2-7b | 0.466 | 0.786 | 0.530 | 0.594 |
| leolm-hessianai-7b | 0.423 | 0.759 | 0.522 | 0.568 |
| bloom-6b4-clp-german | 0.264 | 0.525 | 0.328 | 0.372 |
| mistral-7b | **0.635** | **0.832** | 0.607 | **0.691** |
| leolm-mistral-hessianai-7b | 0.550 | 0.777 | 0.518 | 0.615 |
| hermeo-7b | 0.601 | 0.821 | **0.620** | 0.681 |
| germeo-7b-laser (this model) | 0.601 | 0.828 | 0.608 | 0.679 |

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_aari1995__germeo-7b-laser)

| Metric |Value|
|---------------------------------|----:|
|Avg. |62.82|
|AI2 Reasoning Challenge (25-Shot)|60.75|
|HellaSwag (10-Shot) |82.81|
|MMLU (5-Shot) |60.57|
|TruthfulQA (0-shot) |53.83|
|Winogrande (5-shot) |75.61|
|GSM8k (5-shot) |43.37|