Mistral-7B-v0.2-OpenHermes

SFT Training Params:

Learning Rate: 2e-4
Batch Size: 8
Gradient Accumulation steps: 4
Dataset: teknium/OpenHermes-2.5 (200k-sample split; it has a slight bias towards role-play (rp) and theory-of-life content)
r: 16
Lora Alpha: 16

Training Time: 13 hours on A100
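
As a rough illustration of what these hyperparameters imply, the sketch below computes the effective batch size and the LoRA scaling factor. It assumes training ran on a single GPU (the card mentions one A100) and that the standard LoRA scaling of alpha / r was used; the variable names are illustrative and are not taken from the training script.

```python
# Hyperparameters as listed in the training params.
batch_size = 8
gradient_accumulation_steps = 4
lora_r = 16
lora_alpha = 16

# Assumption: a single GPU, so there is no data-parallel multiplier.
num_gpus = 1

# Number of samples that contribute to each optimizer step.
effective_batch_size = batch_size * gradient_accumulation_steps * num_gpus

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

print(effective_batch_size)  # 32
print(lora_scaling)          # 1.0
```

With r equal to alpha, the adapter update is applied at unit scale under this assumption.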

This model performs well in retrieval-augmented generation (RAG) use cases.

Fine-tuning it on RAG data for your specific use case would still be a good idea.
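
For readers who want to try a RAG-style prompt, here is a minimal sketch of one way to place retrieved passages in front of a question. The helper `build_rag_messages` and the prompt wording are hypothetical; they are not a format this model was specifically trained on, and the retrieval step itself is not shown.

```python
def build_rag_messages(question, passages, system_message="You are a helpful assistant."):
    """Build a chat message list that places retrieved passages before the question."""
    # Number each passage so the model can refer back to its sources.
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    user_content = (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_content},
    ]


messages = build_rag_messages(
    "What's the capital of France?",
    ["Paris is the capital and largest city of France."],
)
print(messages[1]["content"])
```

The resulting list can be passed as the `messages` argument of any OpenAI-compatible chat endpoint.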

Prompt Template: ChatML

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
Paris.
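
If you need to build this prompt string yourself, the template above can be sketched as a small formatter. The function name `to_chatml` is hypothetical; in practice a chat template shipped with the tokenizer or server usually does this for you.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
])
print(prompt)
```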

Run easily with ollama

ollama run macadeliccc/mistral-7b-v2-openhermes
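
Once the model is available in Ollama, it can also be queried from Python through Ollama's local REST API. The sketch below assumes the server is listening on its default address (`http://localhost:11434`) and uses the `/api/chat` endpoint; the helper names `build_payload` and `ask_ollama` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumption: Ollama's default local address
MODEL = "macadeliccc/mistral-7b-v2-openhermes"


def build_payload(user_message, system_message="You are a helpful assistant."):
    """Build the JSON body for a single non-streaming chat request."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def ask_ollama(user_message):
    """POST the request to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

With the server running, `ask_ollama("What's the capital of France?")` should return the model's reply as a string.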

OpenAI compatible server with vLLM

Installation instructions for vLLM can be found in the vLLM documentation.

# --gpu-memory-utilization can go as low as 0.83-0.85 if you need a little more GPU memory for your application.
# --max-model-len can be 32000 if you can run it; 16000 works on a 4090.
python -m vllm.entrypoints.openai.api_server \
  --model macadeliccc/Mistral-7B-v0.2-OpenHermes \
  --gpu-memory-utilization 0.9 \
  --max-model-len 16000 \
  --chat-template ./examples/template_chatml.jinja
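
To get a feel for why the context length and memory fraction trade off against each other, here is a back-of-the-envelope estimate of the KV-cache size. It is a sketch under stated assumptions: the layer count, KV-head count, and head dimension are those of the Mistral-7B architecture, a 16-bit cache is assumed, the GPU is taken to have roughly 24 GiB, and activations, fragmentation, and vLLM's own overhead are ignored.

```python
# Mistral-7B architecture values.
num_layers = 32
num_kv_heads = 8       # grouped-query attention
head_dim = 128
bytes_per_value = 2    # assumption: fp16/bf16 KV cache

# Each token stores one key and one value vector per layer and KV head.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def kv_cache_gib(num_tokens):
    """Approximate KV-cache size in GiB for one sequence of num_tokens."""
    return num_tokens * kv_bytes_per_token / 2**30


def budget_gib(total_gib, utilization):
    """Approximate memory available for a given --gpu-memory-utilization."""
    return total_gib * utilization


print(kv_bytes_per_token)              # 131072 bytes (128 KiB) per token
print(round(kv_cache_gib(16000), 2))   # 1.95
print(round(kv_cache_gib(32000), 2))   # 3.91
print(round(budget_gib(24, 0.9), 2))   # 21.6
print(round(budget_gib(24, 0.83), 2))  # 19.92
```

The 16-bit weights of a 7B-parameter model take roughly 14 GB on their own, so on a 24 GB card the remaining budget is what the KV cache and runtime overhead have to share.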

Gradio chatbot interface for your endpoint

import gradio as gr
from openai import OpenAI

# Modify these variables as needed
openai_api_key = "EMPTY"  # Assuming no API key is required for local testing
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
system_message = "You are a helpful assistant"

def fast_echo(message, history):
    # Send the system prompt and the latest user message to the vLLM server.
    # Note: `history` is not forwarded, so each turn is answered without earlier context.
    chat_response = client.chat.completions.create(
        model="macadeliccc/Mistral-7B-v0.2-OpenHermes",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": message},
        ]
    )
    print(chat_response)
    return chat_response.choices[0].message.content

demo = gr.ChatInterface(fn=fast_echo, examples=["Write me a quicksort algorithm in python."]).queue()

if __name__ == "__main__":
    demo.launch()
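
The handler above sends only the latest message, so the model does not see earlier turns. A minimal sketch of one way to forward the conversation follows. It assumes `history` arrives as a list of (user, assistant) pairs; some Gradio versions can instead pass a list of role/content dictionaries, which would need different handling. The helper `build_messages` is hypothetical.

```python
def build_messages(message, history, system_message="You are a helpful assistant"):
    """Convert pair-style chat history plus the new message into OpenAI chat messages."""
    messages = [{"role": "system", "content": system_message}]
    # Assumption: history is a list of (user_text, assistant_text) pairs.
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    # The new user message always goes last.
    messages.append({"role": "user", "content": message})
    return messages


example = build_messages("And of Spain?", [("What's the capital of France?", "Paris.")])
print([m["role"] for m in example])  # ['system', 'user', 'assistant', 'user']
```

The returned list can be passed as `messages=` to `client.chat.completions.create` in place of the two-item list.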

Quantizations

Evaluations

Thanks to Maxime Labonne for the evaluation:

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
Mistral-7B-v0.2-OpenHermes	35.57	67.15	42.06	36.27	45.26

AGIEval

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	24.02	±	2.69
		acc_norm	21.65	±	2.59
agieval_logiqa_en	0	acc	28.11	±	1.76
		acc_norm	34.56	±	1.87
agieval_lsat_ar	0	acc	27.83	±	2.96
		acc_norm	23.48	±	2.80
agieval_lsat_lr	0	acc	33.73	±	2.10
		acc_norm	33.14	±	2.09
agieval_lsat_rc	0	acc	48.70	±	3.05
		acc_norm	39.78	±	2.99
agieval_sat_en	0	acc	67.48	±	3.27
		acc_norm	64.56	±	3.34
agieval_sat_en_without_passage	0	acc	38.83	±	3.40
		acc_norm	37.38	±	3.38
agieval_sat_math	0	acc	32.27	±	3.16
		acc_norm	30.00	±	3.10

Average: 35.57%

GPT4All

Task	Version	Metric	Value		Stderr
arc_challenge	0	acc	45.05	±	1.45
		acc_norm	48.46	±	1.46
arc_easy	0	acc	77.27	±	0.86
		acc_norm	73.78	±	0.90
boolq	1	acc	68.62	±	0.81
hellaswag	0	acc	59.63	±	0.49
		acc_norm	79.66	±	0.40
openbookqa	0	acc	31.40	±	2.08
		acc_norm	43.40	±	2.22
piqa	0	acc	80.25	±	0.93
		acc_norm	82.05	±	0.90
winogrande	0	acc	74.11	±	1.23

Average: 67.15%
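
The reported average appears to be the mean of `acc_norm` where it is available and `acc` otherwise. This is an inference from the numbers in the table rather than something the source states; the check below reproduces it.

```python
# Scores from the GPT4All table: acc_norm where reported, otherwise acc.
gpt4all_scores = {
    "arc_challenge": 48.46,
    "arc_easy": 73.78,
    "boolq": 68.62,       # acc only
    "hellaswag": 79.66,
    "openbookqa": 43.40,
    "piqa": 82.05,
    "winogrande": 74.11,  # acc only
}

average = sum(gpt4all_scores.values()) / len(gpt4all_scores)
print(round(average, 2))  # 67.15
```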

TruthfulQA

Task	Version	Metric	Value		Stderr
truthfulqa_mc	1	mc1	27.54	±	1.56
		mc2	42.06	±	1.44

Average: 42.06%

Bigbench

Task	Version	Metric	Value		Stderr
bigbench_causal_judgement	0	multiple_choice_grade	56.32	±	3.61
bigbench_date_understanding	0	multiple_choice_grade	66.40	±	2.46
bigbench_disambiguation_qa	0	multiple_choice_grade	45.74	±	3.11
bigbench_geometric_shapes	0	multiple_choice_grade	10.58	±	1.63
		exact_str_match	0.00	±	0.00
bigbench_logical_deduction_five_objects	0	multiple_choice_grade	25.00	±	1.94
bigbench_logical_deduction_seven_objects	0	multiple_choice_grade	17.71	±	1.44
bigbench_logical_deduction_three_objects	0	multiple_choice_grade	37.33	±	2.80
bigbench_movie_recommendation	0	multiple_choice_grade	29.40	±	2.04
bigbench_navigate	0	multiple_choice_grade	50.00	±	1.58
bigbench_reasoning_about_colored_objects	0	multiple_choice_grade	42.50	±	1.11
bigbench_ruin_names	0	multiple_choice_grade	39.06	±	2.31
bigbench_salient_translation_error_detection	0	multiple_choice_grade	12.93	±	1.06
bigbench_snarks	0	multiple_choice_grade	69.06	±	3.45
bigbench_sports_understanding	0	multiple_choice_grade	49.80	±	1.59
bigbench_temporal_sequences	0	multiple_choice_grade	26.50	±	1.40
bigbench_tracking_shuffled_objects_five_objects	0	multiple_choice_grade	21.20	±	1.16
bigbench_tracking_shuffled_objects_seven_objects	0	multiple_choice_grade	16.06	±	0.88
bigbench_tracking_shuffled_objects_three_objects	0	multiple_choice_grade	37.33	±	2.80

Average: 36.27%

Average score: 45.26%
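
As a quick sanity check, the overall score is consistent with the unweighted mean of the four suite averages reported above:

```python
# Suite-level averages as reported in the summary table.
suite_averages = {
    "AGIEval": 35.57,
    "GPT4All": 67.15,
    "TruthfulQA": 42.06,
    "Bigbench": 36.27,
}

overall = sum(suite_averages.values()) / len(suite_averages)
print(round(overall, 2))  # 45.26
```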

Elapsed time: 01:49:22

Developed by: macadeliccc
License: apache-2.0
Finetuned from model: alpindale/Mistral-7B-v0.2

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.