Edit model card

Laser-Dolphin-Mixtral-2x7b-dpo

laser_dolphin_image

New Version out now!

Credit to Fernando Fernandes and Eric Hartford for their project laserRMT

Overview

This model is a medium-sized MoE implementation based on cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser

  • The new version shows ~1 point increase in evaluation performance on average.

Process

  • The process is outlined in this notebook

  • The mergekit_config is in the files.

  • The models used in the configuration are not lasered, but the final product is. This is an update from the last version.

  • This process is experimental. Your mileage may vary.

Future Goals

  • Function Calling
  • v2 with new base model to improve performance

Quantizations

ExLlamav2

These are the recommended quantizations for users that are running the model on GPU

Thanks to user bartowski we now have exllamav2 quantizations in 3.5 through 8 bpw. They are available here:

Branch Bits lm_head bits VRAM (4k) VRAM (16k) VRAM (32k) Description
8_0 8.0 8.0 13.7 GB 15.1 GB 17.2 GB Maximum quality that ExLlamaV2 can produce, near unquantized performance.
6_5 6.5 8.0 11.5 GB 12.9 GB 15.0 GB Near unquantized performance at vastly reduced size, recommended.
5_0 5.0 6.0 9.3 GB 10.7 GB 12.8 GB Slightly lower quality vs 6.5, great for 12gb cards with 16k context.
4_25 4.25 6.0 8.2 GB 9.6 GB 11.7 GB GPTQ equivalent bits per weight.
3_5 3.5 6.0 7.0 GB 8.4 GB 10.5 GB Lower quality, not recommended.

His quantizations represent the first ~13B model with GQA support. Check out his repo for more information!

GGUF

Current GGUF Quantizations

AWQ

*Current AWQ Quantizations

TheBloke

These Quants will result in unpredicted behavior. New quants are available as I have updated the model

Quatizations provided by TheBloke

HF Spaces

  • GGUF chat available here
  • 4-bit bnb chat available here

Ollama

ollama run macadeliccc/laser-dolphin-mixtral-2x7b-dpo

image/png

Code Example

Switch the commented model definition to use in 4-bit. Should work with 9GB and still exceed the single 7B model by 5-6 points roughly

from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.

    Args:
    prompt (str): Prompt for the model.

    Returns:
    str: The generated response from the model.
    """
    # Tokenize the input prompt
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate output tokens
    outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)

    # Decode the generated tokens to a string
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

# Load the model and tokenizer
model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

prompt = "Write a quicksort algorithm in python"

# Generate and print responses for each language
print("Response:")
print(generate_response(prompt), "\n")

colab with usage example

Eval

EQ Bench

----Benchmark Complete----
2024-01-31 16:55:37
Time taken: 31.1 mins
Prompt Format: ChatML
Model: macadeliccc/laser-dolphin-mixtral-2x7b-dpo-GGUF
Score (v2): 72.76
Parseable: 171.0
---------------
Batch completed
Time taken: 31.2 mins
---------------

evaluation colab

Summary of previous evaluation

Model AGIEval GPT4All TruthfulQA Bigbench Average
laser-dolphin-mixtral-2x7b-dpo 41.31 73.67 61.69 42.79 54.87

Detailed current evaluation

Model AGIEval GPT4All TruthfulQA Bigbench Average
laser-dolphin-mixtral-2x7b-dpo 42.25 73.45 63.44 43.96 55.77

AGIEval

Task Version Metric Value Stderr
agieval_aqua_rat 0 acc 21.26 Β± 2.57
acc_norm 21.65 Β± 2.59
agieval_logiqa_en 0 acc 34.72 Β± 1.87
acc_norm 35.64 Β± 1.88
agieval_lsat_ar 0 acc 26.96 Β± 2.93
acc_norm 26.96 Β± 2.93
agieval_lsat_lr 0 acc 45.88 Β± 2.21
acc_norm 46.08 Β± 2.21
agieval_lsat_rc 0 acc 59.48 Β± 3.00
acc_norm 59.48 Β± 3.00
agieval_sat_en 0 acc 73.79 Β± 3.07
acc_norm 73.79 Β± 3.07
agieval_sat_en_without_passage 0 acc 42.23 Β± 3.45
acc_norm 41.26 Β± 3.44
agieval_sat_math 0 acc 37.27 Β± 3.27
acc_norm 33.18 Β± 3.18

Average: 42.25%

GPT4All

Task Version Metric Value Stderr
arc_challenge 0 acc 58.36 Β± 1.44
acc_norm 58.02 Β± 1.44
arc_easy 0 acc 82.20 Β± 0.78
acc_norm 77.40 Β± 0.86
boolq 1 acc 87.52 Β± 0.58
hellaswag 0 acc 67.50 Β± 0.47
acc_norm 84.43 Β± 0.36
openbookqa 0 acc 34.40 Β± 2.13
acc_norm 47.00 Β± 2.23
piqa 0 acc 81.61 Β± 0.90
acc_norm 82.59 Β± 0.88
winogrande 0 acc 77.19 Β± 1.18

Average: 73.45%

GSM8K

Task Version Metric Value Stderr
gsm8k 2 exact_match,get-answer 0.75
exact_match_stderr,get-answer 0.01
alias gsm8k

TruthfulQA

Task Version Metric Value Stderr
truthfulqa_mc 1 mc1 45.90 Β± 1.74
mc2 63.44 Β± 1.56

Average: 63.44%

Bigbench

Task Version Metric Value Stderr
bigbench_causal_judgement 0 multiple_choice_grade 58.42 Β± 3.59
bigbench_date_understanding 0 multiple_choice_grade 60.70 Β± 2.55
bigbench_disambiguation_qa 0 multiple_choice_grade 38.37 Β± 3.03
bigbench_geometric_shapes 0 multiple_choice_grade 21.73 Β± 2.18
exact_str_match 0.00 Β± 0.00
bigbench_logical_deduction_five_objects 0 multiple_choice_grade 35.00 Β± 2.14
bigbench_logical_deduction_seven_objects 0 multiple_choice_grade 23.57 Β± 1.61
bigbench_logical_deduction_three_objects 0 multiple_choice_grade 50.33 Β± 2.89
bigbench_movie_recommendation 0 multiple_choice_grade 45.00 Β± 2.23
bigbench_navigate 0 multiple_choice_grade 50.00 Β± 1.58
bigbench_reasoning_about_colored_objects 0 multiple_choice_grade 60.35 Β± 1.09
bigbench_ruin_names 0 multiple_choice_grade 51.12 Β± 2.36
bigbench_salient_translation_error_detection 0 multiple_choice_grade 32.26 Β± 1.48
bigbench_snarks 0 multiple_choice_grade 67.96 Β± 3.48
bigbench_sports_understanding 0 multiple_choice_grade 70.59 Β± 1.45
bigbench_temporal_sequences 0 multiple_choice_grade 35.80 Β± 1.52
bigbench_tracking_shuffled_objects_five_objects 0 multiple_choice_grade 22.56 Β± 1.18
bigbench_tracking_shuffled_objects_seven_objects 0 multiple_choice_grade 17.20 Β± 0.90
bigbench_tracking_shuffled_objects_three_objects 0 multiple_choice_grade 50.33 Β± 2.89

Average: 43.96%

Average score: 55.77%

Elapsed time: 02:43:45

Citations

Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.

@article{sharma2023truth,
title={The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction},
author={Sharma, Pratyusha and Ash, Jordan T and Misra, Dipendra},
journal={arXiv preprint arXiv:2312.13558},
year={2023} }
@article{gao2021framework,
  title={A framework for few-shot language model evaluation},
  author={Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and others},
  journal={Version v0. 0.1. Sept},
  year={2021}
}

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 67.16
AI2 Reasoning Challenge (25-Shot) 65.96
HellaSwag (10-Shot) 85.80
MMLU (5-Shot) 63.17
TruthfulQA (0-shot) 60.76
Winogrande (5-shot) 79.01
GSM8k (5-shot) 48.29
Downloads last month
5,171
Safetensors
Model size
12.9B params
Tensor type
BF16
Β·

Spaces using macadeliccc/laser-dolphin-mixtral-2x7b-dpo 10

Collection including macadeliccc/laser-dolphin-mixtral-2x7b-dpo

Evaluation results