license: apache-2.0
library_name: transformers
Laser-Dolphin-Mixtral-2x7b-dpo
New Version out now!
Credit to Fernando Fernandes and Eric Hartford for their project laserRMT
Overview
This model is a medium-sized MoE implementation based on cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser
- The new version improves the average benchmark score by roughly 1 point (54.87 to 55.77).
Process
The process is outlined in this notebook
The mergekit_config is in the files.
The models used in the configuration are not lasered, but the final product is. This is an update from the last version.
This process is experimental. Your mileage may vary.
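For context, "lasered" refers to the layer-selective rank reduction described in the LASER paper cited below: selected weight matrices are replaced by low-rank approximations obtained from a truncated SVD. The general form of that operation is sketched here. The symbols $W$, $r$ and $k$ are notation introduced for illustration, and which layers and ranks were used for this model is not stated in this card.

```latex
W = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad
W_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}, \quad k < r .
```

Here $W_k$ keeps only the $k$ largest singular values $\sigma_i$ of $W$ and stands in for $W$ in the affected layer.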
Quantizations
These quants will result in unpredictable behavior. New quants are available, as I have updated the model.
Quantizations provided by TheBloke
Current Quantizations
- Q4_K_M
- Q5_K_M
HF Spaces
Code Example
The example below loads the model in 4-bit. This should work with about 9GB of memory and still exceed the single 7B model by roughly 5-6 points.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.

    Args:
        prompt (str): Prompt for the model.

    Returns:
        str: The generated response from the model.
    """
    # Tokenize the input prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Generate output tokens
    outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)

    # Decode the generated tokens to a string
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# Load the model and tokenizer in 4-bit (requires the bitsandbytes package)
model_id = "macadeliccc/laser-dolphin-mixtral-2x7b-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))

prompt = "Write a quicksort algorithm in python"

# Generate and print the response
print("Response:")
print(generate_response(prompt), "\n")
colab with usage example
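The 9GB figure mentioned above can be sanity-checked with simple arithmetic on the weight storage. This is a rough sketch only: the parameter count of about 12.9 billion is an assumption for a 2x7B Mistral-based MoE and is not stated in this card, and `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

# Assumption: roughly 12.9e9 parameters for a 2x7B Mistral-based MoE.
# This count is not stated in the model card.
ASSUMED_PARAMS = 12.9e9

for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(ASSUMED_PARAMS, bits):.1f} GiB")
```

Under that assumption the 4-bit weights alone come to about 6 GiB. The remainder of the 9GB budget would plausibly go to quantization metadata, activations, and the KV cache.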
Eval
EQ Bench
Evaluated in 4-bit.
----Benchmark Complete----
- 2024-01-24 16:15:48
- Time taken: 67.3 mins
- Prompt Format: Mistral
- Model: macadeliccc/laser-dolphin-mixtral-2x7b-dpo
- Score (v2): 71.59
- Parseable: 168.0
evaluation colab
Summary of previous evaluation
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
laser-dolphin-mixtral-2x7b-dpo | 41.31 | 73.67 | 61.69 | 42.79 | 54.87 |
Detailed current evaluation
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
laser-dolphin-mixtral-2x7b-dpo | 42.25 | 73.45 | 63.44 | 43.96 | 55.77 |
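The change between the two versions can be checked directly from the two summary rows above. A minimal sketch, with the scores copied from those tables; the dictionary and function names are only illustrative.

```python
# Scores copied from the two summary tables above.
previous = {"AGIEval": 41.31, "GPT4All": 73.67, "TruthfulQA": 61.69, "Bigbench": 42.79}
current = {"AGIEval": 42.25, "GPT4All": 73.45, "TruthfulQA": 63.44, "Bigbench": 43.96}

def mean(scores):
    """Unweighted mean of the per-suite scores."""
    return sum(scores.values()) / len(scores)

# Per-suite change between versions (positive means the new version scores higher).
deltas = {name: round(current[name] - previous[name], 2) for name in current}
average_gain = mean(current) - mean(previous)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"Average gain: {average_gain:+.2f}")
```

GPT4All is the only suite that dips slightly; the other three improve, which accounts for the net gain of just under one point.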
AGIEval
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.26 | ± | 2.57 |
| | | acc_norm | 21.65 | ± | 2.59 |
| agieval_logiqa_en | 0 | acc | 34.72 | ± | 1.87 |
| | | acc_norm | 35.64 | ± | 1.88 |
| agieval_lsat_ar | 0 | acc | 26.96 | ± | 2.93 |
| | | acc_norm | 26.96 | ± | 2.93 |
| agieval_lsat_lr | 0 | acc | 45.88 | ± | 2.21 |
| | | acc_norm | 46.08 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 59.48 | ± | 3.00 |
| | | acc_norm | 59.48 | ± | 3.00 |
| agieval_sat_en | 0 | acc | 73.79 | ± | 3.07 |
| | | acc_norm | 73.79 | ± | 3.07 |
| agieval_sat_en_without_passage | 0 | acc | 42.23 | ± | 3.45 |
| | | acc_norm | 41.26 | ± | 3.44 |
| agieval_sat_math | 0 | acc | 37.27 | ± | 3.27 |
| | | acc_norm | 33.18 | ± | 3.18 |
Average: 42.25%
GPT4All
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_challenge | 0 | acc | 58.36 | ± | 1.44 |
| | | acc_norm | 58.02 | ± | 1.44 |
| arc_easy | 0 | acc | 82.20 | ± | 0.78 |
| | | acc_norm | 77.40 | ± | 0.86 |
| boolq | 1 | acc | 87.52 | ± | 0.58 |
| hellaswag | 0 | acc | 67.50 | ± | 0.47 |
| | | acc_norm | 84.43 | ± | 0.36 |
| openbookqa | 0 | acc | 34.40 | ± | 2.13 |
| | | acc_norm | 47.00 | ± | 2.23 |
| piqa | 0 | acc | 81.61 | ± | 0.90 |
| | | acc_norm | 82.59 | ± | 0.88 |
| winogrande | 0 | acc | 77.19 | ± | 1.18 |
Average: 73.45%
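The reported GPT4All average is consistent with averaging `acc_norm` per task and falling back to `acc` where `acc_norm` is not reported (boolq and winogrande). That rule is inferred from the numbers, not stated in the evaluation output. A short sketch that reproduces it, with values copied from the table above and a hypothetical helper name:

```python
# (acc, acc_norm) per task, copied from the GPT4All table above;
# None marks a metric the table does not report.
gpt4all = {
    "arc_challenge": (58.36, 58.02),
    "arc_easy": (82.20, 77.40),
    "boolq": (87.52, None),
    "hellaswag": (67.50, 84.43),
    "openbookqa": (34.40, 47.00),
    "piqa": (81.61, 82.59),
    "winogrande": (77.19, None),
}

def suite_average(results):
    """Mean of acc_norm per task, falling back to acc when acc_norm is missing."""
    picked = [acc if acc_norm is None else acc_norm for acc, acc_norm in results.values()]
    return sum(picked) / len(picked)

print(f"GPT4All average: {suite_average(gpt4all):.2f}")
```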
GSM8K
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| gsm8k | 2 | exact_match,get-answer | 0.75 | | |
| | | exact_match_stderr,get-answer | 0.01 | | |
| | | alias | gsm8k | | |
TruthfulQA
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 45.90 | ± | 1.74 |
| | | mc2 | 63.44 | ± | 1.56 |
Average: 63.44%
Bigbench
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 58.42 | ± | 3.59 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 60.70 | ± | 2.55 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 38.37 | ± | 3.03 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 21.73 | ± | 2.18 |
| | | exact_str_match | 0.00 | ± | 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 35.00 | ± | 2.14 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.57 | ± | 1.61 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 50.33 | ± | 2.89 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 45.00 | ± | 2.23 |
| bigbench_navigate | 0 | multiple_choice_grade | 50.00 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 60.35 | ± | 1.09 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 51.12 | ± | 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 32.26 | ± | 1.48 |
| bigbench_snarks | 0 | multiple_choice_grade | 67.96 | ± | 3.48 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 70.59 | ± | 1.45 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 35.80 | ± | 1.52 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.56 | ± | 1.18 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.20 | ± | 0.90 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 50.33 | ± | 2.89 |
Average: 43.96%
Average score: 55.77%
Elapsed time: 02:43:45
Citations
Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.
@article{sharma2023truth,
  title={The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction},
  author={Sharma, Pratyusha and Ash, Jordan T and Misra, Dipendra},
  journal={arXiv preprint arXiv:2312.13558},
  year={2023}
}
@article{gao2021framework,
  title={A framework for few-shot language model evaluation},
  author={Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and others},
  journal={Version v0.0.1. Sept},
  year={2021}
}