Quantization made by Richard Erkhov.

MBX-7B-v3-DPO - GGUF

Model creator: https://huggingface.co/macadeliccc/
Original model: https://huggingface.co/macadeliccc/MBX-7B-v3-DPO/

Name	Quant method	Size
MBX-7B-v3-DPO.Q2_K.gguf	Q2_K	2.53GB
MBX-7B-v3-DPO.IQ3_XS.gguf	IQ3_XS	2.81GB
MBX-7B-v3-DPO.IQ3_S.gguf	IQ3_S	2.96GB
MBX-7B-v3-DPO.Q3_K_S.gguf	Q3_K_S	2.95GB
MBX-7B-v3-DPO.IQ3_M.gguf	IQ3_M	3.06GB
MBX-7B-v3-DPO.Q3_K.gguf	Q3_K	3.28GB
MBX-7B-v3-DPO.Q3_K_M.gguf	Q3_K_M	3.28GB
MBX-7B-v3-DPO.Q3_K_L.gguf	Q3_K_L	3.56GB
MBX-7B-v3-DPO.IQ4_XS.gguf	IQ4_XS	3.67GB
MBX-7B-v3-DPO.Q4_0.gguf	Q4_0	3.83GB
MBX-7B-v3-DPO.IQ4_NL.gguf	IQ4_NL	3.87GB
MBX-7B-v3-DPO.Q4_K_S.gguf	Q4_K_S	3.86GB
MBX-7B-v3-DPO.Q4_K.gguf	Q4_K	4.07GB
MBX-7B-v3-DPO.Q4_K_M.gguf	Q4_K_M	4.07GB
MBX-7B-v3-DPO.Q4_1.gguf	Q4_1	4.24GB
MBX-7B-v3-DPO.Q5_0.gguf	Q5_0	4.65GB
MBX-7B-v3-DPO.Q5_K_S.gguf	Q5_K_S	4.65GB
MBX-7B-v3-DPO.Q5_K.gguf	Q5_K	4.78GB
MBX-7B-v3-DPO.Q5_K_M.gguf	Q5_K_M	4.78GB
MBX-7B-v3-DPO.Q5_1.gguf	Q5_1	5.07GB
MBX-7B-v3-DPO.Q6_K.gguf	Q6_K	5.53GB
MBX-7B-v3-DPO.Q8_0.gguf	Q8_0	7.17GB
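
As a rough aid for choosing a file from the table above, the sketch below picks the largest quantization whose file fits a given size budget. The sizes are copied from the table (the Q3_K, Q4_K and Q5_K aliases are omitted because they duplicate the _M entries). The function name `largest_quant_within` and the budget value are illustrative assumptions. File size is only a lower bound on memory use, since the context buffers add overhead at run time.

```python
# File sizes in GB, copied from the table above (duplicate aliases omitted).
QUANT_SIZES_GB = {
    "Q2_K": 2.53, "IQ3_XS": 2.81, "Q3_K_S": 2.95, "IQ3_S": 2.96, "IQ3_M": 3.06,
    "Q3_K_M": 3.28, "Q3_K_L": 3.56, "IQ4_XS": 3.67, "Q4_0": 3.83,
    "Q4_K_S": 3.86, "IQ4_NL": 3.87, "Q4_K_M": 4.07, "Q4_1": 4.24,
    "Q5_0": 4.65, "Q5_K_S": 4.65, "Q5_K_M": 4.78, "Q5_1": 5.07,
    "Q6_K": 5.53, "Q8_0": 7.17,
}


def largest_quant_within(budget_gb):
    """Return the filename of the largest quant that fits budget_gb, or None.

    Hypothetical helper: it ranks by file size only, which is a proxy for
    quality rather than a measurement of it.
    """
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget_gb}
    if not fitting:
        return None
    best = max(fitting, key=fitting.get)
    return f"MBX-7B-v3-DPO.{best}.gguf"


print(largest_quant_within(4.5))  # budget of 4.5 GB is an arbitrary example
```

Ranking by size is a simplification: two files of similar size can differ in quality, so treat the result as a starting point rather than a recommendation.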

Original model description:

license: cc
library_name: transformers
datasets:
- jondurbin/truthy-dpo-v0.1
model-index:
- name: MBX-7B-v3-DPO
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 73.55
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/MBX-7B-v3-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 89.11
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/MBX-7B-v3-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.91
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/MBX-7B-v3-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 74.0
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/MBX-7B-v3-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 85.56
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/MBX-7B-v3-DPO
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 69.67
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/MBX-7B-v3-DPO
      name: Open LLM Leaderboard

MBX-7B-v3-DPO

This model is a fine-tune of flemmingmiguel/MBX-7B-v3 on the jondurbin/truthy-dpo-v0.1 dataset.

Code Example

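from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/MBX-7B-v3-DPO")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/MBX-7B-v3-DPO")

messages = [
    {"role": "system", "content": "Respond to the user's request like a pirate"},
    {"role": "user", "content": "Can you write me a quicksort algorithm?"}
]
# Render the conversation with the model's chat template, ending with the assistant prompt
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

# Generate a reply and decode only the new tokens (max_new_tokens=512 is an arbitrary choice)
output = model.generate(gen_input, max_new_tokens=512)
print(tokenizer.decode(output[0][gen_input.shape[-1]:], skip_special_tokens=True))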

Example Output

GGUF

Available here

Exllamav2

Quants are available from bartowski; check them out here.

Download the size you want below. VRAM figures are estimates.

Branch	Bits	lm_head bits	VRAM (4k)	VRAM (16k)	VRAM (32k)	Description
8_0	8.0	8.0	8.4 GB	9.8 GB	11.8 GB	Maximum quality that ExLlamaV2 can produce, near unquantized performance.
6_5	6.5	8.0	7.2 GB	8.6 GB	10.6 GB	Very similar to 8.0, good tradeoff of size vs performance, recommended.
5_0	5.0	6.0	6.0 GB	7.4 GB	9.4 GB	Slightly lower quality vs 6.5, but usable on 8GB cards.
4_25	4.25	6.0	5.3 GB	6.7 GB	8.7 GB	GPTQ equivalent bits per weight, slightly higher quality.
3_5	3.5	6.0	4.7 GB	6.1 GB	8.1 GB	Lower quality, only use if you have to.
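
To make the table easier to apply, the following sketch returns the highest-bit branch whose estimated VRAM fits a given card at a given context length. The figures are copied from the table above and are estimates, as noted. The function name `best_branch` and the example budgets are illustrative assumptions, and only the three context lengths listed in the table are supported.

```python
# (branch, estimated VRAM in GB keyed by context length in thousands of tokens),
# copied from the table above and ordered from highest to lowest bits per weight.
BRANCHES = [
    ("8_0", {4: 8.4, 16: 9.8, 32: 11.8}),
    ("6_5", {4: 7.2, 16: 8.6, 32: 10.6}),
    ("5_0", {4: 6.0, 16: 7.4, 32: 9.4}),
    ("4_25", {4: 5.3, 16: 6.7, 32: 8.7}),
    ("3_5", {4: 4.7, 16: 6.1, 32: 8.1}),
]


def best_branch(vram_gb, context_k):
    """Return the highest-bit branch that fits vram_gb at context_k, or None.

    Hypothetical helper: context_k must be 4, 16 or 32, matching the table columns.
    """
    for branch, vram in BRANCHES:
        if vram[context_k] <= vram_gb:
            return branch
    return None


print(best_branch(8.0, 4))   # an 8 GB card at 4k context
print(best_branch(8.0, 16))  # the same card at 16k context
```

Because the estimates leave no headroom for other processes on the GPU, it may be prudent to pass a budget somewhat below the card's nominal capacity.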

Evaluations

EQ-Bench Comparison

----Benchmark Complete----
2024-01-30 15:22:18
Time taken: 145.9 mins
Prompt Format: ChatML
Model: macadeliccc/MBX-7B-v3-DPO
Score (v2): 74.32
Parseable: 166.0
---------------
Batch completed
Time taken: 145.9 mins
---------------
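
The log above reports ChatML as the prompt format used for this run. For readers unfamiliar with that layout, here is a minimal sketch that renders a message list as a ChatML string by hand. The helper `to_chatml` is hypothetical and not part of any library. In practice `tokenizer.apply_chat_template` is preferable, because it applies whatever template ships with the model, which may differ from this sketch.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout.

    Hypothetical helper for illustration only.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Respond to the user's request like a pirate"},
    {"role": "user", "content": "Can you write me a quicksort algorithm?"},
]
print(to_chatml(messages))
```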

Original Model

----Benchmark Complete----
2024-01-31 01:26:26
Time taken: 89.1 mins
Prompt Format: Mistral
Model: flemmingmiguel/MBX-7B-v3
Score (v2): 73.87
Parseable: 168.0
---------------
Batch completed
Time taken: 89.1 mins
---------------

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
MBX-7B-v3-DPO	45.16	77.73	74.62	48.83	61.58

AGIEval

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	27.95	±	2.82
		acc_norm	26.77	±	2.78
agieval_logiqa_en	0	acc	41.01	±	1.93
		acc_norm	40.55	±	1.93
agieval_lsat_ar	0	acc	25.65	±	2.89
		acc_norm	23.91	±	2.82
agieval_lsat_lr	0	acc	50.78	±	2.22
		acc_norm	52.94	±	2.21
agieval_lsat_rc	0	acc	66.54	±	2.88
		acc_norm	65.80	±	2.90
agieval_sat_en	0	acc	77.67	±	2.91
		acc_norm	77.67	±	2.91
agieval_sat_en_without_passage	0	acc	43.20	±	3.46
		acc_norm	43.20	±	3.46
agieval_sat_math	0	acc	32.27	±	3.16
		acc_norm	30.45	±	3.11

Average: 45.16%

GPT4All

Task	Version	Metric	Value		Stderr
arc_challenge	0	acc	68.43	±	1.36
		acc_norm	68.34	±	1.36
arc_easy	0	acc	87.54	±	0.68
		acc_norm	82.11	±	0.79
boolq	1	acc	88.20	±	0.56
hellaswag	0	acc	69.76	±	0.46
		acc_norm	87.40	±	0.33
openbookqa	0	acc	40.20	±	2.19
		acc_norm	49.60	±	2.24
piqa	0	acc	83.68	±	0.86
		acc_norm	85.36	±	0.82
winogrande	0	acc	83.11	±	1.05

Average: 77.73%
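
The suite average above can be reproduced from the table by taking acc_norm for each task where it is listed and falling back to acc otherwise, then taking the plain mean. This rule is inferred from the numbers, not stated by the original author, so treat the sketch below as an assumption about how the figure was computed. The same rule also matches the AGIEval average of 45.16%.

```python
# (task, acc, acc_norm) copied from the GPT4All table above;
# acc_norm is None where the table lists only acc.
ROWS = [
    ("arc_challenge", 68.43, 68.34),
    ("arc_easy", 87.54, 82.11),
    ("boolq", 88.20, None),
    ("hellaswag", 69.76, 87.40),
    ("openbookqa", 40.20, 49.60),
    ("piqa", 83.68, 85.36),
    ("winogrande", 83.11, None),
]


def suite_average(rows):
    """Mean of acc_norm per task, falling back to acc when acc_norm is missing."""
    scores = [acc if acc_norm is None else acc_norm for _, acc, acc_norm in rows]
    return sum(scores) / len(scores)


print(f"{suite_average(ROWS):.2f}")  # prints 77.73
```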

TruthfulQA

Task	Version	Metric	Value		Stderr
truthfulqa_mc	1	mc1	58.87	±	1.72
		mc2	74.62	±	1.44

Average: 74.62%

Bigbench

Task	Version	Metric	Value		Stderr
bigbench_causal_judgement	0	multiple_choice_grade	60.00	±	3.56
bigbench_date_understanding	0	multiple_choice_grade	63.14	±	2.51
bigbench_disambiguation_qa	0	multiple_choice_grade	47.67	±	3.12
bigbench_geometric_shapes	0	multiple_choice_grade	22.56	±	2.21
		exact_str_match	0.84	±	0.48
bigbench_logical_deduction_five_objects	0	multiple_choice_grade	33.20	±	2.11
bigbench_logical_deduction_seven_objects	0	multiple_choice_grade	23.00	±	1.59
bigbench_logical_deduction_three_objects	0	multiple_choice_grade	59.67	±	2.84
bigbench_movie_recommendation	0	multiple_choice_grade	47.40	±	2.24
bigbench_navigate	0	multiple_choice_grade	56.10	±	1.57
bigbench_reasoning_about_colored_objects	0	multiple_choice_grade	71.25	±	1.01
bigbench_ruin_names	0	multiple_choice_grade	56.47	±	2.35
bigbench_salient_translation_error_detection	0	multiple_choice_grade	35.27	±	1.51
bigbench_snarks	0	multiple_choice_grade	73.48	±	3.29
bigbench_sports_understanding	0	multiple_choice_grade	75.46	±	1.37
bigbench_temporal_sequences	0	multiple_choice_grade	52.10	±	1.58
bigbench_tracking_shuffled_objects_five_objects	0	multiple_choice_grade	22.64	±	1.18
bigbench_tracking_shuffled_objects_seven_objects	0	multiple_choice_grade	19.83	±	0.95
bigbench_tracking_shuffled_objects_three_objects	0	multiple_choice_grade	59.67	±	2.84

Average: 48.83%

Average score: 61.58%

Elapsed time: 02:37:39

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	76.13
AI2 Reasoning Challenge (25-Shot)	73.55
HellaSwag (10-Shot)	89.11
MMLU (5-Shot)	64.91
TruthfulQA (0-shot)	74.00
Winogrande (5-shot)	85.56
GSM8k (5-shot)	69.67