Hermes + Leo = Hermeo
Hermeo-7B AWQ quantized
An AWQ-quantized version of hermeo-7b, malteos' German-English language model, which was created by merging DPOpenHermes-7B-v2 and leo-mistral-hessianai-7b-chat with mergekit. Both base models are fine-tuned versions of Mistral-7B-v0.1.
Model details
- Quantized from: hermeo-7b
- Merged from: leo-mistral-hessianai-7b-chat and DPOpenHermes-7B-v2
- Model type: Causal decoder-only transformer language model
- Languages: English and German
- License: Apache 2.0
How to use
Requires:
- Transformers from commit 72958fcd3c98a7afdc61f953aa58c544ebda2f79
- AutoAWQ from commit 1c5ccc791fa2cb0697db3b4070df1813f1736208
pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
pip3 install git+https://github.com/casper-hansen/AutoAWQ.git@1c5ccc791fa2cb0697db3b4070df1813f1736208
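To check that the pinned builds were installed, you can query the package metadata (a minimal sketch; builds installed straight from a git commit may report a development version string rather than a release number):

import importlib.metadata

# Both distributions should be importable after the installs above;
# the exact version strings reported depend on the pinned commits.
for package in ("transformers", "autoawq"):
    print(package, importlib.metadata.version(package))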
The model can currently be used with AutoAWQForCausalLM. Inference via the Transformers pipeline should also be possible, but it is not yet supported by AutoAWQ.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "mayflowergmbh/hermeo-7b-awq"

# Load the AWQ-quantized weights; fused layers speed up inference
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
                                          trust_remote_code=False, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)

prompt = 'Hallo, ich bin ein Sprachmodell,'

# Tokenize the prompt and move the input ids to the GPU
tokens = tokenizer(
    prompt,
    return_tensors='pt'
).input_ids.cuda()

# Sample up to 512 new tokens from the model
generation_output = model.generate(
    tokens,
    do_sample=True,
    max_new_tokens=512
)

print(tokenizer.decode(generation_output[0]))
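For interactive use you may want tokens printed as they are generated. A sketch using the TextStreamer utility from transformers, assuming AutoAWQ forwards extra generation keyword arguments to the underlying Hugging Face generate() call:

from transformers import TextStreamer

# Decode and print tokens as they are produced; skip_prompt avoids
# echoing the input prompt back to stdout.
streamer = TextStreamer(tokenizer, skip_prompt=True)

# Assumption: AutoAWQ passes unrecognized kwargs such as `streamer`
# through to the wrapped model's generate().
model.generate(
    tokens,
    streamer=streamer,
    do_sample=True,
    max_new_tokens=512
)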
Acknowledgements
- This model release is heavily inspired by Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp
- Thanks to the authors of the base models: Mistral, LAION, HessianAI, Open Access AI Collective, @teknium, @bjoernp
- The German evaluation datasets and scripts were provided by @bjoernp.
- The evaluation ran on computing resources from DFKI's PEGASUS cluster.
Evaluation
The evaluation follows the methodology of the Open LLM Leaderboard.
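For reference, a leaderboard-style run could look like the sketch below, using EleutherAI's lm-evaluation-harness (v0.4 Python API assumed). This is illustrative only: the German task variants in the table (MMLU-DE, Hellaswag-DE, ARC-DE) come from @bjoernp's scripts and are not part of the stock harness, so the stock English hellaswag is used as a stand-in here.

import lm_eval  # EleutherAI lm-evaluation-harness, v0.4 API assumed

# Illustrative stand-in: the German tasks reported below were run with
# @bjoernp's datasets and scripts, not the stock harness tasks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=malteos/hermeo-7b",
    tasks=["hellaswag"],  # German runs used Hellaswag-DE with 10 shots
    num_fewshot=10,
)
print(results["results"])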
German benchmarks
| Model / Few-shots: | MMLU-DE (5 shots) | Hellaswag-DE (10 shots) | ARC-DE (24 shots) | Average |
|---|---|---|---|---|
| **7B parameters** | | | | |
| llama-2-7b | 0.400 | 0.513 | 0.381 | 0.431 |
| leo-hessianai-7b | 0.400 | 0.609 | 0.429 | 0.479 |
| bloom-6b4-clp-german | 0.274 | 0.550 | 0.351 | 0.392 |
| mistral-7b | 0.524 | 0.588 | 0.473 | 0.528 |
| leo-mistral-hessianai-7b | 0.481 | 0.663 | 0.485 | 0.543 |
| leo-mistral-hessianai-7b-chat | 0.458 | 0.617 | 0.465 | 0.513 |
| DPOpenHermes-7B-v2 | 0.517 | 0.603 | 0.515 | 0.545 |
| hermeo-7b (this model) | 0.511 | 0.668 | 0.528 | 0.569 |
| **13B parameters** | | | | |
| llama-2-13b | 0.469 | 0.581 | 0.468 | 0.506 |
| leo-hessianai-13b | 0.486 | 0.658 | 0.509 | 0.551 |
| **70B parameters** | | | | |
| llama-2-70b | 0.597 | 0.674 | 0.561 | 0.611 |
| leo-hessianai-70b | 0.653 | 0.721 | 0.600 | 0.658 |
English benchmarks
TBA
Prompting / Prompt Template
Prompt dialogue template (ChatML format):
"""
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""
The model input can contain multiple conversation turns between user and assistant, e.g.:
<|im_start|>user
{prompt 1}<|im_end|>
<|im_start|>assistant
{reply 1}<|im_end|>
<|im_start|>user
{prompt 2}<|im_end|>
<|im_start|>assistant
(...)
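A minimal sketch of building such a multi-turn ChatML prompt in Python; the helper name and the example messages are illustrative, not part of the model's API:

# Build a ChatML prompt string from (role, content) turns.
# The final, open assistant turn tells the model where to continue.
def build_chatml_prompt(messages):
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>" for role, content in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    ("system", "Du bist ein hilfreicher Assistent."),  # illustrative system message
    ("user", "Wie funktioniert AWQ-Quantisierung?"),   # illustrative user prompt
])
tokens = tokenizer(prompt, return_tensors='pt').input_ids.cuda()

If the tokenizer in this repository ships a ChatML chat template, tokenizer.apply_chat_template should produce an equivalent string.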
License
Apache 2.0