

Hermes + Leo = Hermeo

Hermeo-7B AWQ quantized

An AWQ-quantized version of hermeo-7b, a German-English language model by malteos, created by merging DPOpenHermes-7B-v2 and leo-mistral-hessianai-7b-chat using mergekit. Both base models are fine-tuned versions of Mistral-7B-v0.1.

Model details

How to use

Requires:

pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
 
pip3 install git+https://github.com/casper-hansen/AutoAWQ.git@1c5ccc791fa2cb0697db3b4070df1813f1736208

You can currently run this model with AutoAWQForCausalLM. Inference via the transformers pipeline should be possible as well, but is not yet supported by AutoAWQ.

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "mayflowergmbh/hermeo-7b-awq"

# Load the AWQ-quantized weights; fuse_layers speeds up inference.
model = AutoAWQForCausalLM.from_quantized(
    model_name_or_path,
    fuse_layers=True,
    trust_remote_code=False,
    safetensors=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)

prompt = "Hallo, Ich bin ein Sprachmodell,"

# Tokenize the prompt and move the input ids to the GPU.
tokens = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

# Sample up to 512 new tokens.
generation_output = model.generate(
    tokens,
    do_sample=True,
    max_new_tokens=512,
)

print(tokenizer.decode(generation_output[0]))
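Note that generate returns the prompt ids followed by the newly sampled ids, so decoding the full sequence echoes the prompt. To print only the completion, slice off the input length first. A minimal sketch of the slicing, shown with plain lists standing in for tensors (with the snippet above you would slice `generation_output[0][tokens.shape[1]:]`):

```python
# Illustrative: generate() output = prompt ids followed by new ids.
prompt_ids = [101, 2054, 2003]          # stand-in for tokens[0]
output_ids = prompt_ids + [7592, 102]   # stand-in for generation_output[0]

# Keep only the completion by skipping the prompt-length prefix.
new_ids = output_ids[len(prompt_ids):]
print(new_ids)  # [7592, 102]
```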

Acknowledgements

Evaluation

The evaluation methodology of the Open LLM Leaderboard is followed.

German benchmarks

German tasks:                   MMLU-DE     Hellaswag-DE  ARC-DE      Average
Models / Few-shots:             (5 shots)   (10 shots)    (24 shots)

7B parameters
llama-2-7b                      0.400       0.513         0.381       0.431
leo-hessianai-7b                0.400       0.609         0.429       0.479
bloom-6b4-clp-german            0.274       0.550         0.351       0.392
mistral-7b                      0.524       0.588         0.473       0.528
leo-mistral-hessianai-7b        0.481       0.663         0.485       0.543
leo-mistral-hessianai-7b-chat   0.458       0.617         0.465       0.513
DPOpenHermes-7B-v2              0.517       0.603         0.515       0.545
hermeo-7b (this model)          0.511       0.668         0.528       0.569

13B parameters
llama-2-13b                     0.469       0.581         0.468       0.506
leo-hessianai-13b               0.486       0.658         0.509       0.551

70B parameters
llama-2-70b                     0.597       0.674         0.561       0.611
leo-hessianai-70b               0.653       0.721         0.600       0.658
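The Average column is the unweighted mean of the three German task scores, rounded to three decimals. A quick check for the hermeo-7b row:

```python
# Average column = unweighted mean of the three German task scores.
# Values for hermeo-7b (this model) from the table above.
mmlu_de, hellaswag_de, arc_de = 0.511, 0.668, 0.528

average = round((mmlu_de + hellaswag_de + arc_de) / 3, 3)
print(average)  # 0.569
```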

English benchmarks

TBA

Prompting / Prompt Template

Prompt dialogue template (ChatML format):

"""
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""

The model input can contain multiple conversation turns between user and assistant, for example:

<|im_start|>user
{prompt 1}<|im_end|>
<|im_start|>assistant
{reply 1}<|im_end|>
<|im_start|>user
{prompt 2}<|im_end|>
<|im_start|>assistant
(...)
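The template above can be assembled by hand with plain string formatting. A minimal sketch, where `build_chatml_prompt` is an illustrative helper (not part of any library); if the tokenizer ships a chat template, `tokenizer.apply_chat_template` should produce an equivalent string:

```python
# Illustrative helper for assembling a multi-turn ChatML prompt.
def build_chatml_prompt(system_message, turns):
    """turns: list of (user, assistant) pairs; assistant=None leaves the final turn open."""
    parts = [f"<|im_start|>system\n{system_message}<|im_end|>"]
    for user, assistant in turns:
        parts.append(f"<|im_start|>user\n{user}<|im_end|>")
        if assistant is None:
            # Open assistant turn: the model continues from here.
            parts.append("<|im_start|>assistant\n")
        else:
            parts.append(f"<|im_start|>assistant\n{assistant}<|im_end|>")
    return "\n".join(parts)

prompt = build_chatml_prompt(
    "Du bist ein hilfreicher Assistent.",  # "You are a helpful assistant."
    [("Was ist die Hauptstadt von Deutschland?", None)],
)
print(prompt)
```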

License

Apache 2.0
