metadata

license: apache-2.0
library_name: transformers

Laser-Dolphin-Mixtral-2x7b-dpo

New Version out now!

Credit to Fernando Fernandes and Eric Hartford for their project laserRMT

Overview

This model is a medium-sized MoE implementation based on cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser

The new version shows ~1 point on average.

Process

The process is outlined in this notebook
The mergekit_config is in the files.
The models used in the configuration are not lasered, but the final product is. This is an update from the last version.
This process is experimental. Your mileage may vary.

Future Goals

Function Calling
v2 with new base model to improve performance

Quantizations

These Quants will result in unpredicted behavior. New quants are available as I have updated the model

Quatizations provided by TheBloke

Current Quantizations

HF Spaces

GGUF chat available here
4-bit bnb chat available here

Code Example

Switch the commented model definition to use in 4-bit. Should work with 9GB and still exceed the single 7B model by 5-6 points roughly

from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.

    Args:
    prompt (str): Prompt for the model.

    Returns:
    str: The generated response from the model.
    """
    # Tokenize the input prompt
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate output tokens
    outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)

    # Decode the generated tokens to a string
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

# Load the model and tokenizer
model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

prompt = "Write a quicksort algorithm in python"

# Generate and print responses for each language
print("Response:")
print(generate_response(prompt), "\n")

colab with usage example

Eval

EQ Bench

----Benchmark Complete----
2024-01-31 16:55:37
Time taken: 31.1 mins
Prompt Format: ChatML
Model: macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF
Score (v2): 72.76
Parseable: 171.0
---------------
Batch completed
Time taken: 31.2 mins
---------------

evaluation colab

Summary of previous evaluation

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
laser-dolphin-mixtral-2x7b-dpo	41.31	73.67	61.69	42.79	54.87

Detailed current evaluation

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
laser-dolphin-mixtral-2x7b-dpo	42.25	73.45	63.44	43.96	55.77

AGIEval

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	21.26	±	2.57
		acc_norm	21.65	±	2.59
agieval_logiqa_en	0	acc	34.72	±	1.87
		acc_norm	35.64	±	1.88
agieval_lsat_ar	0	acc	26.96	±	2.93
		acc_norm	26.96	±	2.93
agieval_lsat_lr	0	acc	45.88	±	2.21
		acc_norm	46.08	±	2.21
agieval_lsat_rc	0	acc	59.48	±	3.00
		acc_norm	59.48	±	3.00
agieval_sat_en	0	acc	73.79	±	3.07
		acc_norm	73.79	±	3.07
agieval_sat_en_without_passage	0	acc	42.23	±	3.45
		acc_norm	41.26	±	3.44
agieval_sat_math	0	acc	37.27	±	3.27
		acc_norm	33.18	±	3.18

Average: 42.25%

GPT4All

Task	Version	Metric	Value		Stderr
arc_challenge	0	acc	58.36	±	1.44
		acc_norm	58.02	±	1.44
arc_easy	0	acc	82.20	±	0.78
		acc_norm	77.40	±	0.86
boolq	1	acc	87.52	±	0.58
hellaswag	0	acc	67.50	±	0.47
		acc_norm	84.43	±	0.36
openbookqa	0	acc	34.40	±	2.13
		acc_norm	47.00	±	2.23
piqa	0	acc	81.61	±	0.90
		acc_norm	82.59	±	0.88
winogrande	0	acc	77.19	±	1.18

Average: 73.45%

GSM8K

Task	Version	Metric	Value
gsm8k	2	exact_match,get-answer	0.75
		exact_match_stderr,get-answer	0.01
		alias	gsm8k

TruthfulQA

Task	Version	Metric	Value		Stderr
truthfulqa_mc	1	mc1	45.90	±	1.74
		mc2	63.44	±	1.56

Average: 63.44%

Bigbench

Task	Version	Metric	Value		Stderr
bigbench_causal_judgement	0	multiple_choice_grade	58.42	±	3.59
bigbench_date_understanding	0	multiple_choice_grade	60.70	±	2.55
bigbench_disambiguation_qa	0	multiple_choice_grade	38.37	±	3.03
bigbench_geometric_shapes	0	multiple_choice_grade	21.73	±	2.18
		exact_str_match	0.00	±	0.00
bigbench_logical_deduction_five_objects	0	multiple_choice_grade	35.00	±	2.14
bigbench_logical_deduction_seven_objects	0	multiple_choice_grade	23.57	±	1.61
bigbench_logical_deduction_three_objects	0	multiple_choice_grade	50.33	±	2.89
bigbench_movie_recommendation	0	multiple_choice_grade	45.00	±	2.23
bigbench_navigate	0	multiple_choice_grade	50.00	±	1.58
bigbench_reasoning_about_colored_objects	0	multiple_choice_grade	60.35	±	1.09
bigbench_ruin_names	0	multiple_choice_grade	51.12	±	2.36
bigbench_salient_translation_error_detection	0	multiple_choice_grade	32.26	±	1.48
bigbench_snarks	0	multiple_choice_grade	67.96	±	3.48
bigbench_sports_understanding	0	multiple_choice_grade	70.59	±	1.45
bigbench_temporal_sequences	0	multiple_choice_grade	35.80	±	1.52
bigbench_tracking_shuffled_objects_five_objects	0	multiple_choice_grade	22.56	±	1.18
bigbench_tracking_shuffled_objects_seven_objects	0	multiple_choice_grade	17.20	±	0.90
bigbench_tracking_shuffled_objects_three_objects	0	multiple_choice_grade	50.33	±	2.89

Average: 43.96%

Average score: 55.77%

Elapsed time: 02:43:45

Citations

Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.

@article{sharma2023truth,
title={The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction},
author={Sharma, Pratyusha and Ash, Jordan T and Misra, Dipendra},
journal={arXiv preprint arXiv:2312.13558},
year={2023} }

@article{gao2021framework,
  title={A framework for few-shot language model evaluation},
  author={Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and others},
  journal={Version v0. 0.1. Sept},
  year={2021}
}