OmniCorso-7B

Code Example

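from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("macadeliccc/OmniCorso-7B")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/OmniCorso-7B")

messages = [
    {"role": "system", "content": "Respond to the user's request like a pirate"},
    {"role": "user", "content": "Can you write me a quicksort algorithm?"}
]
# Build the prompt with the tokenizer's chat template and open the assistant turn
gen_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
)

# Generate a reply and decode it; max_new_tokens=256 is an arbitrary example limit
output = model.generate(**gen_input, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))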

The following models were included in the merge: mlabonne/OmniBeagle-7B and macadeliccc/MBX-7B-v3-DPO.

Configuration

The following YAML configuration was used to produce this model:

slices:
  - sources:
      - model: mlabonne/OmniBeagle-7B
        layer_range: [0, 32]
      - model: macadeliccc/MBX-7B-v3-DPO
        layer_range: [0, 32]
merge_method: slerp
base_model: macadeliccc/MBX-7B-v3-DPO
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
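
The configuration above can be read as follows: each tensor of the two models is combined by spherical linear interpolation (slerp) with a factor t, and the lists under `self_attn` and `mlp` vary t across the 32 layers, with 0.5 as the fallback for all other tensors. The sketch below illustrates the idea in plain Python. It is not mergekit's implementation. It assumes that a list of values is treated as evenly spaced anchor points interpolated linearly over the layers, and that t = 0 returns the base model while t = 1 returns the other model. The helper names `slerp` and `gradient_value` are hypothetical.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Cosine of the angle between the two vectors, clamped for acos
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly colinear vectors: linear interpolation is numerically safer
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

def gradient_value(anchors, layer_idx, num_layers):
    """Assumed gradient rule: interpolate anchor values linearly over layers."""
    if len(anchors) == 1 or num_layers == 1:
        return anchors[0]
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]

# t for the self_attn tensors of the first and last of 32 layers
self_attn_t = [0, 0.5, 0.3, 0.7, 1]
first_t = gradient_value(self_attn_t, 0, 32)
last_t = gradient_value(self_attn_t, 31, 32)

# Toy two-dimensional "weights" standing in for real tensors
halfway = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Under these assumptions, the self-attention tensors start at the base model in the first layer and end at the other model in the last layer, while the `mlp` list runs in the opposite direction.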

Quantizations

GGUF

Exllamav2

ExLlamaV2 quants are available thanks to user bartowski, one branch per bit width:

| Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
| ------ | ---- | ------------ | --------- | ---------- | ---------- | ----------- |
| 8_0 | 8.0 | 8.0 | 8.4 GB | 9.8 GB | 11.8 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
| 6_5 | 6.5 | 8.0 | 7.2 GB | 8.6 GB | 10.6 GB | Very similar to 8.0, good tradeoff of size vs performance, recommended. |
| 5_0 | 5.0 | 6.0 | 6.0 GB | 7.4 GB | 9.4 GB | Slightly lower quality vs 6.5, but usable on 8GB cards. |
| 4_25 | 4.25 | 6.0 | 5.3 GB | 6.7 GB | 8.7 GB | GPTQ equivalent bits per weight, slightly higher quality. |
| 3_5 | 3.5 | 6.0 | 4.7 GB | 6.1 GB | 8.1 GB | Lower quality, only use if you have to. |
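
As a rough way to read the table above, the following sketch picks the highest-quality branch whose listed VRAM figure fits a given budget. The numbers are copied from the table. The helper `pick_branch` and the constant names are hypothetical, and the check ignores any memory used by the rest of the system, so treat the result as a starting point only.

```python
# (branch, VRAM in GB at 4k, 16k and 32k context), ordered from highest
# to lowest quality, copied from the table above.
BRANCHES = [
    ("8_0", 8.4, 9.8, 11.8),
    ("6_5", 7.2, 8.6, 10.6),
    ("5_0", 6.0, 7.4, 9.4),
    ("4_25", 5.3, 6.7, 8.7),
    ("3_5", 4.7, 6.1, 8.1),
]
CONTEXT_COLUMN = {"4k": 1, "16k": 2, "32k": 3}

def pick_branch(vram_gb, context="4k"):
    """Return the first (highest-quality) branch that fits, or None."""
    col = CONTEXT_COLUMN[context]
    for row in BRANCHES:
        if row[col] <= vram_gb:
            return row[0]
    return None

print(pick_branch(8.0, "4k"))   # 6_5
print(pick_branch(8.0, "16k"))  # 5_0
```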

Evaluations

----Benchmark Complete----
2024-02-11 15:34:40
Time taken: 178.3 mins
Prompt Format: ChatML
Model: macadeliccc/OmniCorso-7B
Score (v2): 73.75
Parseable: 167.0
---------------
Batch completed
Time taken: 178.3 mins
---------------
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
| ----- | ------- | ------- | ---------- | -------- | ------- |
| OmniCorso-7B | 45.89 | 77.66 | 74.12 | 49.24 | 61.73 |

AGIEval

| Task | Version | Metric | Value | Stderr |
| ---- | ------- | ------ | ----- | ------ |
| agieval_aqua_rat | 0 | acc | 29.13 | ± 2.86 |
| | | acc_norm | 27.17 | ± 2.80 |
| agieval_logiqa_en | 0 | acc | 39.32 | ± 1.92 |
| | | acc_norm | 39.63 | ± 1.92 |
| agieval_lsat_ar | 0 | acc | 23.91 | ± 2.82 |
| | | acc_norm | 23.91 | ± 2.82 |
| agieval_lsat_lr | 0 | acc | 53.14 | ± 2.21 |
| | | acc_norm | 53.92 | ± 2.21 |
| agieval_lsat_rc | 0 | acc | 66.54 | ± 2.88 |
| | | acc_norm | 67.29 | ± 2.87 |
| agieval_sat_en | 0 | acc | 80.58 | ± 2.76 |
| | | acc_norm | 80.58 | ± 2.76 |
| agieval_sat_en_without_passage | 0 | acc | 45.63 | ± 3.48 |
| | | acc_norm | 43.69 | ± 3.46 |
| agieval_sat_math | 0 | acc | 33.18 | ± 3.18 |
| | | acc_norm | 30.91 | ± 3.12 |

Average: 45.89%

GPT4All

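| Task | Version | Metric | Value | Stderr |
| ---- | ------- | ------ | ----- | ------ |
| arc_challenge | 0 | acc | 67.32 | ± 1.37 |
| | | acc_norm | 68.43 | ± 1.36 |
| arc_easy | 0 | acc | 87.46 | ± 0.68 |
| | | acc_norm | 83.50 | ± 0.76 |
| boolq | 1 | acc | 88.13 | ± 0.57 |
| hellaswag | 0 | acc | 68.47 | ± 0.46 |
| | | acc_norm | 86.96 | ± 0.34 |
| openbookqa | 0 | acc | 38.80 | ± 2.18 |
| | | acc_norm | 50.00 | ± 2.24 |
| piqa | 0 | acc | 83.03 | ± 0.88 |
| | | acc_norm | 85.31 | ± 0.83 |
| winogrande | 0 | acc | 81.29 | ± 1.10 |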

Average: 77.66%

TruthfulQA

| Task | Version | Metric | Value | Stderr |
| ---- | ------- | ------ | ----- | ------ |
| truthfulqa_mc | 1 | mc1 | 58.26 | ± 1.73 |
| | | mc2 | 74.12 | ± 1.43 |

Average: 74.12%

Bigbench

| Task | Version | Metric | Value | Stderr |
| ---- | ------- | ------ | ----- | ------ |
| bigbench_causal_judgement | 0 | multiple_choice_grade | 56.84 | ± 3.60 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 63.41 | ± 2.51 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 49.22 | ± 3.12 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 23.96 | ± 2.26 |
| | | exact_str_match | 1.39 | ± 0.62 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 34.20 | ± 2.12 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.71 | ± 1.61 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 60.33 | ± 2.83 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 49.00 | ± 2.24 |
| bigbench_navigate | 0 | multiple_choice_grade | 55.20 | ± 1.57 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 70.75 | ± 1.02 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 55.80 | ± 2.35 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 36.97 | ± 1.53 |
| bigbench_snarks | 0 | multiple_choice_grade | 72.38 | ± 3.33 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 76.27 | ± 1.36 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 54.50 | ± 1.58 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 23.12 | ± 1.19 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 20.34 | ± 0.96 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 60.33 | ± 2.83 |

Average: 49.24%

Average score: 61.73%
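
The reported averages can be reproduced from the figures above. The card does not state the rule, but the numbers are consistent with averaging acc_norm where a task reports it and acc otherwise, then taking the plain mean of the four suite averages. The sketch below checks this for GPT4All and for the overall score. The variable names are illustrative only.

```python
from statistics import mean

# GPT4All scores copied from the results above: acc_norm where listed,
# otherwise acc (boolq and winogrande report acc only).
gpt4all = {
    "arc_challenge": 68.43,
    "arc_easy": 83.50,
    "boolq": 88.13,
    "hellaswag": 86.96,
    "openbookqa": 50.00,
    "piqa": 85.31,
    "winogrande": 81.29,
}
gpt4all_avg = round(mean(gpt4all.values()), 2)

# Suite averages as reported in this section
suite_averages = {
    "AGIEval": 45.89,
    "GPT4All": 77.66,
    "TruthfulQA": 74.12,
    "Bigbench": 49.24,
}
overall = round(mean(suite_averages.values()), 2)

print(gpt4all_avg, overall)  # 77.66 61.73
```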

Elapsed time: 02:20:06

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
| ------ | ----- |
| Avg. | 75.74 |
| AI2 Reasoning Challenge (25-Shot) | 72.70 |
| HellaSwag (10-Shot) | 88.70 |
| MMLU (5-Shot) | 64.91 |
| TruthfulQA (0-shot) | 73.43 |
| Winogrande (5-shot) | 83.74 |
| GSM8k (5-shot) | 70.96 |

Model size: 7.24B params (Safetensors), tensor type BF16