---
language:
- de
pipeline_tag: text-generation
tags:
- awq
- autoawq
license: apache-2.0
---
# ***WIP*** 
(Please bear with me, this model will get better and get a license soon)


_Hermes + Leo + German AWQ = Germeo_

# Germeo-7B-AWQ

A model that understands both German and English but replies only in German, based on the merged model [Hermeo-7B](https://huggingface.co/malteos/hermeo-7b).

### Model details

- **Merged from:** [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) and [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2)
- **Model type:** Causal decoder-only transformer language model
- **Languages:** Replies in German; understands German and English
- **Calibration Data:** [LeoLM/OpenSchnabeltier](https://huggingface.co/datasets/LeoLM/OpenSchnabeltier)

### Quantization Procedure and Use Case:

The special feature of this model is that it replies only in German, regardless of the system message or prompt.
During AWQ quantization, I used OpenSchnabeltier as calibration data to emphasize the importance of German tokens.
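
As a rough illustration of the idea, the sketch below shows how custom calibration texts could be passed to AutoAWQ. It is not necessarily the exact procedure used for this model: the helper names, the column names `instruction` and `output`, the sample limit, and the 4-bit `quant_config` values are all assumptions.

```python
def build_calibration_texts(records, instruction_key="instruction",
                            output_key="output", max_samples=512):
    """Join instruction/response pairs into plain calibration strings.

    The column names are assumptions; check the dataset before use.
    """
    texts = []
    for record in records:
        if len(texts) >= max_samples:
            break
        instruction = (record.get(instruction_key) or "").strip()
        output = (record.get(output_key) or "").strip()
        # Skip incomplete rows so every sample has a question and an answer
        if instruction and output:
            texts.append(f"{instruction}\n{output}")
    return texts


def quantize_with_german_calibration(base_path, quant_path, calib_texts):
    """Run AWQ quantization with custom calibration texts (needs a GPU)."""
    # Imported lazily so the helper above works without AutoAWQ installed
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    # Common 4-bit AutoAWQ settings; assumed, not confirmed for this model
    quant_config = {"zero_point": True, "q_group_size": 128,
                    "w_bit": 4, "version": "GEMM"}

    model = AutoAWQForCausalLM.from_pretrained(base_path)
    tokenizer = AutoTokenizer.from_pretrained(base_path)
    # calib_data accepts a list of strings instead of the default dataset
    model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_texts)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```

The records could come from `datasets.load_dataset("LeoLM/OpenSchnabeltier", split="train")`, with `base_path` set to the unquantized model such as `malteos/hermeo-7b`.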


### Usage

Setup with AutoAWQ:
```python
# Requires AutoAWQ: https://github.com/casper-hansen/AutoAWQ
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_path = "aari1995/germeo-7b-awq"

# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
```

Setup with transformers (works in Colab):
```python
# Requires AutoAWQ (https://github.com/casper-hansen/AutoAWQ) and a recent transformers:
# pip install autoawq && pip install --upgrade transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

quant_path = "aari1995/germeo-7b-awq"

# Load model
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
```

### Inference:
```python
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Convert prompt to tokens
prompt_template = """<|im_start|>system
Du bist ein hilfreicher Assistent.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

prompt = "Schreibe eine Stellenanzeige für Data Scientist bei AXA!"

tokens = tokenizer(
    prompt_template.format(prompt=prompt), 
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens, 
    streamer=streamer,
    max_new_tokens=1012
)
# tokenizer.decode(generation_output.flatten())
```

### FAQ
#### The model keeps generating user turns after its reply:
  To solve this, implement a custom stopping criterion:

```python
from transformers import StoppingCriteria


class GermeoStoppingCriteria(StoppingCriteria):
    """Stop generation once `target_sequence` appears in the generated reply."""

    def __init__(self, target_sequence, prompt):
        self.target_sequence = target_sequence
        self.prompt = prompt

    def __call__(self, input_ids, scores, **kwargs):
        # Decode everything so far; relies on the `tokenizer` loaded above
        generated_text = tokenizer.decode(input_ids[0])
        # Drop the prompt, which itself contains the target sequence
        generated_text = generated_text.replace(self.prompt, '')
        # Stop as soon as the target sequence shows up in the reply
        return self.target_sequence in generated_text

    # __len__ and __iter__ let a single instance be passed directly
    # as `stopping_criteria`, as if it were a one-element list
    def __len__(self):
        return 1

    def __iter__(self):
        yield self
```
The criterion takes the stop sequence (here the `<|im_end|>` token) and your input prompt, formatted exactly as it is passed to the model. Add it to the generation call:

```python
generation_output = model.generate(
    tokens, 
    streamer=streamer,
    max_new_tokens=1012,
    stopping_criteria=GermeoStoppingCriteria("<|im_end|>", prompt_template.format(prompt=prompt))
)
```
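
Because the criterion only fires after the stop sequence has been generated, text decoded from `generation_output` still contains the prompt and the `<|im_end|>` marker. The helper below is one possible way to trim it; `trim_reply` is a hypothetical name, and it assumes the decoded text contains the prompt verbatim, which may not hold for every tokenizer.

```python
def trim_reply(generated_text, prompt, stop_sequence="<|im_end|>"):
    """Return only the assistant reply from the fully decoded output."""
    # Keep what follows the prompt; fall back to the full text if the
    # prompt is not found verbatim (assumption: it usually is)
    _, found, reply = generated_text.partition(prompt)
    if not found:
        reply = generated_text
    # Cut at the first stop sequence, if the model produced one
    stop_index = reply.find(stop_sequence)
    if stop_index != -1:
        reply = reply[:stop_index]
    return reply.strip()
```

For example, `trim_reply(tokenizer.decode(generation_output.flatten()), prompt_template.format(prompt=prompt))` would return the reply without the template tokens, under the assumption above.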
### Acknowledgements and Special Thanks

- Thank you [malteos](https://huggingface.co/malteos/) for hermeo (and all your other contributions); without it this would not be possible!
- Thanks to the authors of the base models: [Mistral](https://mistral.ai/), [LAION](https://laion.ai/), [HessianAI](https://hessian.ai/), [Open Access AI Collective](https://huggingface.co/openaccess-ai-collective), [@teknium](https://huggingface.co/teknium), [@bjoernp](https://huggingface.co/bjoernp)
- Thanks also to [@bjoernp](https://huggingface.co/bjoernp) for your contribution, and to LeoLM for OpenSchnabeltier.

## Evaluation and Benchmarks (German only)


### German benchmarks

| **German tasks:**             | **MMLU-DE**    | **Hellaswag-DE** | **ARC-DE**      |**Average**      |
|-------------------------------|-------------|---------------|--------------|--------------|
| **Models / Few-shots:**       | _(5 shots)_ | _(10 shots)_  | _(24 shots)_ | |
| _7B parameters_      |  | |  | |
| llama-2-7b                    | 0.400       | 0.513         | 0.381        | 0.431  |
| leo-hessianai-7b              | 0.400       | 0.609         | 0.429        | 0.479 |
| bloom-6b4-clp-german          | 0.274       | 0.550         | 0.351        | 0.392 |
| mistral-7b                    | **0.524**       | 0.588         | 0.473        | 0.528 |
| leo-mistral-hessianai-7b      | 0.481       | 0.663         | 0.485        | 0.543 |
| leo-mistral-hessianai-7b-chat | 0.458       | 0.617         | 0.465        | 0.513 |
| DPOpenHermes-7B-v2            | 0.517         | 0.603         | 0.515        | 0.545 |
| hermeo-7b                     | 0.511       | **0.668**         | **0.528**        | **0.569** |
| **germeo-7b-awq (this model)**| 0.522       | 0.651         | 0.514        | 0.563 |
| _13B parameters_      |  | |  | |
| llama-2-13b                    | 0.469       | 0.581        | 0.468        | 0.506 |
| leo-hessianai-13b              | **0.486**       | **0.658**         | **0.509**       | **0.551** |
| _70B parameters_      |  | |  | |
| llama-2-70b                    | 0.597       | 0.674       | 0.561       | 0.611 |
| leo-hessianai-70b              | **0.653**       | **0.721**         | **0.600**       | **0.658** |

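The **Average** column appears to be the unweighted mean of the three task scores; this is an assumption based on the numbers, not something stated by the benchmark authors. A short check for two rows of the table:

```python
def mean_score(scores):
    """Unweighted mean of the per-task scores."""
    return sum(scores) / len(scores)


# (MMLU-DE, Hellaswag-DE, ARC-DE) copied from the table above
task_scores = {
    "DPOpenHermes-7B-v2": (0.517, 0.603, 0.515),
    "hermeo-7b": (0.511, 0.668, 0.528),
}

for name, scores in task_scores.items():
    print(f"{name}: {mean_score(scores):.3f}")
```

Because the table shows rounded scores, recomputed averages can differ from the listed ones in the last digit.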

### German reply rate benchmark
The share of replies that are in German, according to [this benchmark](https://huggingface.co/spaces/floleuerer/german_llm_outputs):

| **Models:**             | **German Response Rate**    |
|-------------------------|-------------------------|
| hermeo-7b                     | tba      |
| **germeo-7b-awq (this model)**| tba       |

### Additional Benchmarks:

TruthfulQA-DE: 0.508