Hermes + Leo = Hermeo
Hermeo-7B AWQ quantized
An AWQ-quantized version of hermeo-7b, malteos' German-English language model, which was created by merging DPOpenHermes-7B-v2 and leo-mistral-hessianai-7b-chat with mergekit. Both base models are fine-tuned versions of Mistral-7B-v0.1.
Model details
- Quantized from: hermeo-7b
- Merged from: leo-mistral-hessianai-7b-chat and DPOpenHermes-7B-v2
- Model type: Causal decoder-only transformer language model
- Languages: English and German
- License: Apache 2.0
How to use
Requires:
- Transformers from commit 72958fcd3c98a7afdc61f953aa58c544ebda2f79
- AutoAWQ from commit 1c5ccc791fa2cb0697db3b4070df1813f1736208
pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
pip3 install git+https://github.com/casper-hansen/AutoAWQ.git@1c5ccc791fa2cb0697db3b4070df1813f1736208
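To check that the pinned builds were installed, you can query the package metadata (a minimal sketch; builds installed straight from a git commit may report a development version string rather than a release number):

import importlib.metadata

# Both distributions should be importable after the installs above;
# the exact version strings reported depend on the pinned commits.
for package in ("transformers", "autoawq"):
    print(package, importlib.metadata.version(package))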
The model can currently be used with AutoAWQForCausalLM. Inference via the Transformers pipeline should also be possible, but it is not yet supported by AutoAWQ.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "mayflowergmbh/hermeo-7b-awq"

# Load the AWQ-quantized weights; fused layers speed up inference
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
                                          trust_remote_code=False, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)

prompt = 'Hallo, ich bin ein Sprachmodell,'

# Tokenize the prompt and move the input ids to the GPU
tokens = tokenizer(
    prompt,
    return_tensors='pt'
).input_ids.cuda()

# Sample up to 512 new tokens from the model
generation_output = model.generate(
    tokens,
    do_sample=True,
    max_new_tokens=512
)

print(tokenizer.decode(generation_output[0]))
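For interactive use you may want tokens printed as they are generated. A sketch using the TextStreamer utility from transformers, assuming AutoAWQ forwards extra generation keyword arguments to the underlying Hugging Face generate() call:

from transformers import TextStreamer

# Decode and print tokens as they are produced; skip_prompt avoids
# echoing the input prompt back to stdout.
streamer = TextStreamer(tokenizer, skip_prompt=True)

# Assumption: AutoAWQ passes unrecognized kwargs such as `streamer`
# through to the wrapped model's generate().
model.generate(
    tokens,
    streamer=streamer,
    do_sample=True,
    max_new_tokens=512
)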
Acknowledgements
- This model release is heavily inspired by Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp
- Thanks to the authors of the base models: Mistral, LAION, HessianAI, Open Access AI Collective, @teknium, @bjoernp
- The German evaluation datasets and scripts were provided by @bjoernp.
- The evaluation ran on computing resources from DFKI's PEGASUS cluster.
Evaluation
The evaluation follows the methodology of the Open LLM Leaderboard.
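For reference, a leaderboard-style run could look like the sketch below, using EleutherAI's lm-evaluation-harness (v0.4 Python API assumed). This is illustrative only: the German task variants in the table (MMLU-DE, Hellaswag-DE, ARC-DE) come from @bjoernp's scripts and are not part of the stock harness, so the stock English hellaswag is used as a stand-in here.

import lm_eval  # EleutherAI lm-evaluation-harness, v0.4 API assumed

# Illustrative stand-in: the German tasks reported below were run with
# @bjoernp's datasets and scripts, not the stock harness tasks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=malteos/hermeo-7b",
    tasks=["hellaswag"],  # German runs used Hellaswag-DE with 10 shots
    num_fewshot=10,
)
print(results["results"])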
German benchmarks
| Model / Few-shots: | MMLU-DE (5 shots) | Hellaswag-DE (10 shots) | ARC-DE (24 shots) | Average |
|---|---|---|---|---|
| **7B parameters** | | | | |
| llama-2-7b | 0.400 | 0.513 | 0.381 | 0.431 |
| leo-hessianai-7b | 0.400 | 0.609 | 0.429 | 0.479 |
| bloom-6b4-clp-german | 0.274 | 0.550 | 0.351 | 0.392 |
| mistral-7b | 0.524 | 0.588 | 0.473 | 0.528 |
| leo-mistral-hessianai-7b | 0.481 | 0.663 | 0.485 | 0.543 |
| leo-mistral-hessianai-7b-chat | 0.458 | 0.617 | 0.465 | 0.513 |
| DPOpenHermes-7B-v2 | 0.517 | 0.603 | 0.515 | 0.545 |
| hermeo-7b (this model) | 0.511 | 0.668 | 0.528 | 0.569 |
| **13B parameters** | | | | |
| llama-2-13b | 0.469 | 0.581 | 0.468 | 0.506 |
| leo-hessianai-13b | 0.486 | 0.658 | 0.509 | 0.551 |
| **70B parameters** | | | | |
| llama-2-70b | 0.597 | 0.674 | 0.561 | 0.611 |
| leo-hessianai-70b | 0.653 | 0.721 | 0.600 | 0.658 |
English benchmarks
TBA
Prompting / Prompt Template
Prompt dialogue template (ChatML format):
"""
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""
The model input can contain multiple conversation turns between user and assistant, e.g.:
<|im_start|>user
{prompt 1}<|im_end|>
<|im_start|>assistant
{reply 1}<|im_end|>
<|im_start|>user
{prompt 2}<|im_end|>
<|im_start|>assistant
(...)
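A minimal sketch of building such a multi-turn ChatML prompt in Python; the helper name and the example messages are illustrative, not part of the model's API:

# Build a ChatML prompt string from (role, content) turns.
# The final, open assistant turn tells the model where to continue.
def build_chatml_prompt(messages):
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>" for role, content in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    ("system", "Du bist ein hilfreicher Assistent."),  # illustrative system message
    ("user", "Wie funktioniert AWQ-Quantisierung?"),   # illustrative user prompt
])
tokens = tokenizer(prompt, return_tensors='pt').input_ids.cuda()

If the tokenizer in this repository ships a ChatML chat template, tokenizer.apply_chat_template should produce an equivalent string.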
License
Apache 2.0