metadata

license: cc
tags:
  - mergekit
  - merge
base_model:
  - macadeliccc/MBX-7B-v3-DPO
  - mlabonne/OmniBeagle-7B
model-index:
  - name: OmniCorso-7B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 72.7
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/OmniCorso-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 88.7
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/OmniCorso-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.91
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/OmniCorso-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 73.43
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/OmniCorso-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 83.74
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/OmniCorso-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 70.96
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/OmniCorso-7B
          name: Open LLM Leaderboard

OmniCorso-7B

Code Example

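from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("macadeliccc/OmniCorso-7B")
model = AutoModelForCausalLM.from_pretrained("macadeliccc/OmniCorso-7B")

messages = [
    {"role": "system", "content": "Respond to the user's request like a pirate"},
    {"role": "user", "content": "Can you write me a quicksort algorithm?"}
]
# Render the conversation with the tokenizer's chat template and end on the assistant turn
gen_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
)

# Generate a reply (256 new tokens is an arbitrary cap) and decode only the new tokens
output = model.generate(**gen_input, max_new_tokens=256)
prompt_len = gen_input["input_ids"].shape[-1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))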

The following models were included in the merge: mlabonne/OmniBeagle-7B and macadeliccc/MBX-7B-v3-DPO.

Configuration

The following YAML configuration was used to produce this model:

slices:
  - sources:
      - model: mlabonne/OmniBeagle-7B
        layer_range: [0, 32]
      - model: macadeliccc/MBX-7B-v3-DPO
        layer_range: [0, 32]
merge_method: slerp
base_model: macadeliccc/MBX-7B-v3-DPO
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
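
The configuration above selects `merge_method: slerp` and gives `t` as short lists of anchor values per filter. As a rough illustration of what that means, here is a minimal pure-Python sketch of spherical linear interpolation between two flattened weight vectors, plus one plausible way to spread the anchor values over 32 layers. This is not mergekit's implementation: the function names `slerp` and `layer_t`, the near-parallel fallback threshold, and the even, linearly interpolated layer schedule are all assumptions made for the sketch.

```python
import math

def slerp(t, v0, v1, tol=1e-6):
    """Spherical interpolation of two flat vectors: v0 at t=0, v1 at t=1."""
    dot = sum(x * y for x, y in zip(v0, v1))
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(y * y for y in v1))
    if n0 == 0.0 or n1 == 0.0:
        # No direction to rotate between: fall back to linear interpolation.
        return [(1.0 - t) * x + t * y for x, y in zip(v0, v1)]
    cos_theta = max(-1.0, min(1.0, dot / (n0 * n1)))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < tol:
        # Nearly (anti-)parallel vectors: linear interpolation is used instead.
        return [(1.0 - t) * x + t * y for x, y in zip(v0, v1)]
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * x + w1 * y for x, y in zip(v0, v1)]

def layer_t(anchors, num_layers):
    """Assumed schedule: anchors evenly spaced over depth, linear in between.

    Requires at least two anchors and at least two layers.
    """
    values = []
    for i in range(num_layers):
        pos = i / (num_layers - 1) * (len(anchors) - 1)  # fractional anchor index
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append(anchors[lo] * (1.0 - frac) + anchors[lo + 1] * frac)
    return values

# Anchor values copied from the configuration above, spread over 32 layers.
self_attn_t = layer_t([0, 0.5, 0.3, 0.7, 1], 32)
mlp_t = layer_t([1, 0.5, 0.7, 0.3, 0], 32)
```

Under this assumed schedule the `self_attn` factor rises from 0 at the first layer to 1 at the last, while the `mlp` factor runs the opposite way, so the two filters lean on opposite source models at each end of the stack. Which model corresponds to `t = 0` is determined by mergekit and is not stated in this card.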

Quantizations

GGUF

iMatrix

ExLlamaV2

Quants are available thanks to user bartowski; check them out here.

Branch	Bits	lm_head bits	VRAM (4k)	VRAM (16k)	VRAM (32k)	Description
8_0	8.0	8.0	8.4 GB	9.8 GB	11.8 GB	Maximum quality that ExLlamaV2 can produce, near unquantized performance.
6_5	6.5	8.0	7.2 GB	8.6 GB	10.6 GB	Very similar to 8.0, good tradeoff of size vs performance, recommended.
5_0	5.0	6.0	6.0 GB	7.4 GB	9.4 GB	Slightly lower quality vs 6.5, but usable on 8GB cards.
4_25	4.25	6.0	5.3 GB	6.7 GB	8.7 GB	GPTQ equivalent bits per weight, slightly higher quality.
3_5	3.5	6.0	4.7 GB	6.1 GB	8.1 GB	Lower quality, only use if you have to.
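
To make the table easier to act on, the following sketch picks the highest-bit branch whose listed VRAM figure fits a given budget. The numbers are copied from the table above; the helper name `pick_branch` and the `"4k"`/`"16k"`/`"32k"` keys are illustrative, and the sketch assumes the listed figures are the only memory cost, with no headroom for other processes.

```python
# (branch, bits per weight, listed VRAM in GB per context length), from the table above.
QUANTS = [
    ("8_0", 8.0, {"4k": 8.4, "16k": 9.8, "32k": 11.8}),
    ("6_5", 6.5, {"4k": 7.2, "16k": 8.6, "32k": 10.6}),
    ("5_0", 5.0, {"4k": 6.0, "16k": 7.4, "32k": 9.4}),
    ("4_25", 4.25, {"4k": 5.3, "16k": 6.7, "32k": 8.7}),
    ("3_5", 3.5, {"4k": 4.7, "16k": 6.1, "32k": 8.1}),
]

def pick_branch(vram_gb, context="4k"):
    """Return the highest-bit branch that fits in vram_gb, or None if none fits."""
    for branch, _bits, vram in QUANTS:  # already ordered from highest to lowest bits
        if vram[context] <= vram_gb:
            return branch
    return None

print(pick_branch(8.0, "4k"))   # 6_5
print(pick_branch(8.0, "16k"))  # 5_0
```

In practice it is worth leaving some margin below the card's nominal capacity, which is likely why the table describes 5_0 as the option for 8 GB cards.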

Evaluations

----Benchmark Complete----
2024-02-11 15:34:40
Time taken: 178.3 mins
Prompt Format: ChatML
Model: macadeliccc/OmniCorso-7B
Score (v2): 73.75
Parseable: 167.0
---------------
Batch completed
Time taken: 178.3 mins
---------------
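
The log above reports ChatML as the prompt format used for this benchmark run. For readers unfamiliar with it, here is a minimal sketch of how a message list is laid out in ChatML. The helper name `to_chatml` is hypothetical, and whether the model's own tokenizer chat template produces exactly this text is not stated in this card, so `tokenizer.apply_chat_template` remains the safer choice for real use.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "Respond to the user's request like a pirate"},
    {"role": "user", "content": "Can you write me a quicksort algorithm?"},
])
print(prompt)
```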

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
OmniCorso-7B	45.89	77.66	74.12	49.24	61.73

AGIEval

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	29.13	±	2.86
		acc_norm	27.17	±	2.80
agieval_logiqa_en	0	acc	39.32	±	1.92
		acc_norm	39.63	±	1.92
agieval_lsat_ar	0	acc	23.91	±	2.82
		acc_norm	23.91	±	2.82
agieval_lsat_lr	0	acc	53.14	±	2.21
		acc_norm	53.92	±	2.21
agieval_lsat_rc	0	acc	66.54	±	2.88
		acc_norm	67.29	±	2.87
agieval_sat_en	0	acc	80.58	±	2.76
		acc_norm	80.58	±	2.76
agieval_sat_en_without_passage	0	acc	45.63	±	3.48
		acc_norm	43.69	±	3.46
agieval_sat_math	0	acc	33.18	±	3.18
		acc_norm	30.91	±	3.12

Average: 45.89%

GPT4All

Task	Version	Metric	Value		Stderr
arc_challenge	0	acc	67.32	±	1.37
		acc_norm	68.43	±	1.36
arc_easy	0	acc	87.46	±	0.68
		acc_norm	83.50	±	0.76
boolq	1	acc	88.13	±	0.57
hellaswag	0	acc	68.47	±	0.46
		acc_norm	86.96	±	0.34
openbookqa	0	acc	38.80	±	2.18
		acc_norm	50.00	±	2.24
piqa	0	acc	83.03	±	0.88
		acc_norm	85.31	±	0.83
winogrande	0	acc	81.29	±	1.10

Average: 77.66%
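
The 77.66% figure is consistent with taking one score per task, using `acc_norm` where it is reported and `acc` otherwise, and then averaging. That rule is inferred from the numbers in the table rather than stated by the evaluation output, so treat the sketch below as a consistency check, not as the harness's actual code. The name `suite_average` is illustrative.

```python
from statistics import mean

# Scores copied from the GPT4All table above.
GPT4ALL = {
    "arc_challenge": {"acc": 67.32, "acc_norm": 68.43},
    "arc_easy": {"acc": 87.46, "acc_norm": 83.50},
    "boolq": {"acc": 88.13},
    "hellaswag": {"acc": 68.47, "acc_norm": 86.96},
    "openbookqa": {"acc": 38.80, "acc_norm": 50.00},
    "piqa": {"acc": 83.03, "acc_norm": 85.31},
    "winogrande": {"acc": 81.29},
}

def suite_average(results):
    """Mean of one score per task: acc_norm if present, else acc (assumed rule)."""
    return mean(scores.get("acc_norm", scores["acc"]) for scores in results.values())

print(f"{suite_average(GPT4ALL):.2f}")
```

The same rule applied to the `acc_norm` rows of the AGIEval table gives roughly 45.89, which matches the average reported there.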

TruthfulQA

Task	Version	Metric	Value		Stderr
truthfulqa_mc	1	mc1	58.26	±	1.73
		mc2	74.12	±	1.43

Average: 74.12%

Bigbench

Task	Version	Metric	Value		Stderr
bigbench_causal_judgement	0	multiple_choice_grade	56.84	±	3.60
bigbench_date_understanding	0	multiple_choice_grade	63.41	±	2.51
bigbench_disambiguation_qa	0	multiple_choice_grade	49.22	±	3.12
bigbench_geometric_shapes	0	multiple_choice_grade	23.96	±	2.26
		exact_str_match	1.39	±	0.62
bigbench_logical_deduction_five_objects	0	multiple_choice_grade	34.20	±	2.12
bigbench_logical_deduction_seven_objects	0	multiple_choice_grade	23.71	±	1.61
bigbench_logical_deduction_three_objects	0	multiple_choice_grade	60.33	±	2.83
bigbench_movie_recommendation	0	multiple_choice_grade	49.00	±	2.24
bigbench_navigate	0	multiple_choice_grade	55.20	±	1.57
bigbench_reasoning_about_colored_objects	0	multiple_choice_grade	70.75	±	1.02
bigbench_ruin_names	0	multiple_choice_grade	55.80	±	2.35
bigbench_salient_translation_error_detection	0	multiple_choice_grade	36.97	±	1.53
bigbench_snarks	0	multiple_choice_grade	72.38	±	3.33
bigbench_sports_understanding	0	multiple_choice_grade	76.27	±	1.36
bigbench_temporal_sequences	0	multiple_choice_grade	54.50	±	1.58
bigbench_tracking_shuffled_objects_five_objects	0	multiple_choice_grade	23.12	±	1.19
bigbench_tracking_shuffled_objects_seven_objects	0	multiple_choice_grade	20.34	±	0.96
bigbench_tracking_shuffled_objects_three_objects	0	multiple_choice_grade	60.33	±	2.83

Average: 49.24%

Average score: 61.73%

Elapsed time: 02:20:06

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	75.74
AI2 Reasoning Challenge (25-Shot)	72.70
HellaSwag (10-Shot)	88.70
MMLU (5-Shot)	64.91
TruthfulQA (0-shot)	73.43
Winogrande (5-shot)	83.74
GSM8k (5-shot)	70.96
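
As a quick check on the table above, the `Avg.` row matches the unweighted mean of the six benchmark scores. The dictionary name `LEADERBOARD` is illustrative; the values are copied from the table.

```python
from statistics import mean

LEADERBOARD = {
    "AI2 Reasoning Challenge (25-Shot)": 72.70,
    "HellaSwag (10-Shot)": 88.70,
    "MMLU (5-Shot)": 64.91,
    "TruthfulQA (0-shot)": 73.43,
    "Winogrande (5-shot)": 83.74,
    "GSM8k (5-shot)": 70.96,
}

leaderboard_avg = mean(LEADERBOARD.values())
print(f"{leaderboard_avg:.2f}")  # 75.74
```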