---
license: apache-2.0
library_name: transformers
---

# Laser-Dolphin-Mixtral-2x7b-dpo

(Image: laser_dolphin_image)

A new version will be uploaded soon.

Credit to Fernando Fernandes and Eric Hartford for their project laserRMT

This model is a medium-sized MoE implementation based on cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser

A 2x7b configuration offers better performance than a standard 7b model, even when loaded in 4-bit (roughly 9 GB of VRAM).

When this 2x7b model is loaded in 4-bit, its HellaSwag score is 0.8270, which is higher than the base model achieves on its own in full precision.
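
For reference, here is a minimal sketch of 4-bit loading with an explicit `BitsAndBytesConfig`. This is generic transformers/bitsandbytes usage, not necessarily the exact setup behind the numbers above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"

# NF4 4-bit quantization; requires the bitsandbytes and accelerate packages
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```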

The process is outlined in this notebook

These quants will result in unpredictable behavior; I am working on new quants since I have updated the model.

Quantizations provided by TheBloke

## Code Example

Switch between the commented and uncommented model definitions below to load in 4-bit. In 4-bit the model should work with roughly 9 GB of VRAM and still exceed the single 7B model by roughly 5-6 points.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.

    Args:
        prompt (str): Prompt for the model.

    Returns:
        str: The generated response from the model.
    """
    # Tokenize the input prompt
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate output tokens
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

    # Decode the generated tokens to a string
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

# Load the model and tokenizer
model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# model = AutoModelForCausalLM.from_pretrained(model_id)  # full precision
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)  # 4-bit (requires bitsandbytes)

prompt = "Write a quicksort algorithm in python"

# Generate and print the response
print("Response:")
print(generate_response(prompt), "\n")
```
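
As a usage note, the dolphin-2.6 base model is prompted in ChatML format. The sketch below assumes the tokenizer ships a chat template; if it does not, build the ChatML string by hand:

```python
# Hypothetical usage: format a ChatML conversation, assuming the tokenizer has a chat template
messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a quicksort algorithm in python"},
]
chat_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(generate_response(chat_prompt))
```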

colab with usage example

## Eval

evaluation colab
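
The scores below come from the linked colab. As a rough sketch, a comparable single-task run could also be scripted with the lm-evaluation-harness Python API (assuming lm-eval >= 0.4 and its `simple_evaluate` entry point; task grouping and settings may differ from the colab, so numbers will not match exactly):

```python
import lm_eval

# Sketch: score the 4-bit model on one of the tasks reported below
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=macadeliccc/laser-dolphin-mixtral-2x7b-dpo,load_in_4bit=True",
    tasks=["hellaswag"],
)
print(results["results"]["hellaswag"])
```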

| Model                          | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|--------------------------------|--------:|--------:|-----------:|---------:|--------:|
| laser-dolphin-mixtral-2x7b-dpo |   41.31 |   73.67 |      61.69 |    42.79 |   54.87 |

### AGIEval

| Task                           | Version | Metric   | Value |   | Stderr |
|--------------------------------|--------:|----------|------:|---|-------:|
| agieval_aqua_rat               |       0 | acc      | 22.44 | ± |   2.62 |
|                                |         | acc_norm | 21.26 | ± |   2.57 |
| agieval_logiqa_en              |       0 | acc      | 34.87 | ± |   1.87 |
|                                |         | acc_norm | 35.79 | ± |   1.88 |
| agieval_lsat_ar                |       0 | acc      | 22.17 | ± |   2.75 |
|                                |         | acc_norm | 23.04 | ± |   2.78 |
| agieval_lsat_lr                |       0 | acc      | 43.14 | ± |   2.20 |
|                                |         | acc_norm | 45.10 | ± |   2.21 |
| agieval_lsat_rc                |       0 | acc      | 57.25 | ± |   3.02 |
|                                |         | acc_norm | 55.76 | ± |   3.03 |
| agieval_sat_en                 |       0 | acc      | 71.84 | ± |   3.14 |
|                                |         | acc_norm | 71.84 | ± |   3.14 |
| agieval_sat_en_without_passage |       0 | acc      | 44.17 | ± |   3.47 |
|                                |         | acc_norm | 41.75 | ± |   3.44 |
| agieval_sat_math               |       0 | acc      | 40.91 | ± |   3.32 |
|                                |         | acc_norm | 35.91 | ± |   3.24 |

Average: 41.31%

### GPT4All

| Task          | Version | Metric   | Value |   | Stderr |
|---------------|--------:|----------|------:|---|-------:|
| arc_challenge |       0 | acc      | 58.02 | ± |   1.44 |
|               |         | acc_norm | 60.58 | ± |   1.43 |
| arc_easy      |       0 | acc      | 85.48 | ± |   0.72 |
|               |         | acc_norm | 82.62 | ± |   0.78 |
| boolq         |       1 | acc      | 87.16 | ± |   0.59 |
| hellaswag     |       0 | acc      | 65.04 | ± |   0.48 |
|               |         | acc_norm | 83.63 | ± |   0.37 |
| openbookqa    |       0 | acc      | 35.60 | ± |   2.14 |
|               |         | acc_norm | 45.00 | ± |   2.23 |
| piqa          |       0 | acc      | 81.99 | ± |   0.90 |
|               |         | acc_norm | 83.51 | ± |   0.87 |
| winogrande    |       0 | acc      | 73.16 | ± |   1.25 |

Average: 73.67%

### TruthfulQA

| Task          | Version | Metric | Value |   | Stderr |
|---------------|--------:|--------|------:|---|-------:|
| truthfulqa_mc |       1 | mc1    | 44.31 | ± |   1.74 |
|               |         | mc2    | 61.69 | ± |   1.50 |

Average: 61.69%

### Bigbench

| Task                                              | Version | Metric                | Value |   | Stderr |
|---------------------------------------------------|--------:|-----------------------|------:|---|-------:|
| bigbench_causal_judgement                         |       0 | multiple_choice_grade | 59.47 | ± |   3.57 |
| bigbench_date_understanding                       |       0 | multiple_choice_grade | 66.67 | ± |   2.46 |
| bigbench_disambiguation_qa                        |       0 | multiple_choice_grade | 36.05 | ± |   3.00 |
| bigbench_geometric_shapes                         |       0 | multiple_choice_grade | 20.33 | ± |   2.13 |
|                                                   |         | exact_str_match       |  7.52 | ± |   1.39 |
| bigbench_logical_deduction_five_objects           |       0 | multiple_choice_grade | 27.80 | ± |   2.01 |
| bigbench_logical_deduction_seven_objects          |       0 | multiple_choice_grade | 19.86 | ± |   1.51 |
| bigbench_logical_deduction_three_objects          |       0 | multiple_choice_grade | 48.67 | ± |   2.89 |
| bigbench_movie_recommendation                     |       0 | multiple_choice_grade | 49.60 | ± |   2.24 |
| bigbench_navigate                                 |       0 | multiple_choice_grade | 53.20 | ± |   1.58 |
| bigbench_reasoning_about_colored_objects          |       0 | multiple_choice_grade | 68.50 | ± |   1.04 |
| bigbench_ruin_names                               |       0 | multiple_choice_grade | 41.74 | ± |   2.33 |
| bigbench_salient_translation_error_detection      |       0 | multiple_choice_grade | 16.23 | ± |   1.17 |
| bigbench_snarks                                   |       0 | multiple_choice_grade | 64.09 | ± |   3.58 |
| bigbench_sports_understanding                     |       0 | multiple_choice_grade | 70.69 | ± |   1.45 |
| bigbench_temporal_sequences                       |       0 | multiple_choice_grade | 37.70 | ± |   1.53 |
| bigbench_tracking_shuffled_objects_five_objects   |       0 | multiple_choice_grade | 23.44 | ± |   1.20 |
| bigbench_tracking_shuffled_objects_seven_objects  |       0 | multiple_choice_grade | 17.60 | ± |   0.91 |
| bigbench_tracking_shuffled_objects_three_objects  |       0 | multiple_choice_grade | 48.67 | ± |   2.89 |

Average: 42.79%

Average score: 54.87%

Elapsed time: 02:53:28

## Citations

Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.

```
@article{sharma2023truth,
  title={The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction},
  author={Sharma, Pratyusha and Ash, Jordan T and Misra, Dipendra},
  journal={arXiv preprint arXiv:2312.13558},
  year={2023}
}

@article{gao2021framework,
  title={A framework for few-shot language model evaluation},
  author={Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and others},
  journal={Version v0. 0.1. Sept},
  year={2021}
}
```