---
language:
  - en
license: apache-2.0
tags:
  - dare
  - super mario merge
  - pytorch
  - solar
  - merge
pipeline_tag: text-generation
inference: false
model-index:
  - name: solar-megamerge-dare-10.7b-v1
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 66.13
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=martyn/solar-megamerge-dare-10.7b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 85.3
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=martyn/solar-megamerge-dare-10.7b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 66.03
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=martyn/solar-megamerge-dare-10.7b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 54.33
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=martyn/solar-megamerge-dare-10.7b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 82.95
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=martyn/solar-megamerge-dare-10.7b-v1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 58
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=martyn/solar-megamerge-dare-10.7b-v1
          name: Open LLM Leaderboard

---

# solar megamerge 10.7b

The following models were merged with DARE using https://github.com/martyn/safetensors-merge-supermario.

## Mergelist

```yaml
models:
  - model: upstage/SOLAR-10.7B-v1.0
  - model: upstage/SOLAR-10.7B-Instruct-v1.0
    parameters:
      weight: 0.20
      density: 0.8
  - model: kyujinpy/SOLAR-Platypus-10.7B-v1
    parameters:
      weight: 0.19
      density: 0.75
  - model: We-Want-GPU/SOLAR-10.7B-orca-alpaca-gpt4-math
    parameters:
      weight: 0.18
      density: 0.75
  - model: maywell/Synatra-10.7B-v0.4
    parameters:
      weight: 0.18
      density: 0.7
  - model: kyujinpy/SOLAR-Platypus-10.7B-v2
    parameters:
      weight: 0.17
      density: 0.7
  - model: Sao10K/Frostwind-10.7B-v1
    parameters:
      weight: 0.16
      density: 0.65
  - model: rishiraj/meow
    parameters:
      weight: 0.15
      density: 0.6
```

## Merge command

```sh
python3 hf_merge.py mergelist.yaml solar-1
```
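
Once the merge finishes, the output is an ordinary safetensors checkpoint that loads like any other causal LM. Below is a minimal, untested sketch of loading the published weights with `transformers`; the repo id matches this card, and the SOLAR-Instruct-style `### User:`/`### Assistant:` prompt is only a guess for a merged model, not a confirmed chat template.

```python
# Sketch: load the merged checkpoint and generate text.
# Assumptions: the merged weights are on the Hub as martyn/solar-megamerge-dare-10.7b-v1,
# and a SOLAR-Instruct-style prompt is a reasonable default for this merge.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martyn/solar-megamerge-dare-10.7b-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory
    device_map="auto",          # requires accelerate; shards across available devices
)

prompt = "### User:\nExplain model merging in one sentence.\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```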

## Notes

- In the yaml, `p=weight` and `lambda=1/density`; a rough sketch of what these two knobs control is shown below.
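
For context, DARE operates on each fine-tuned model's delta from the base: it randomly drops a fraction of the delta's entries and rescales the survivors before adding them back onto the base weights. The snippet below is only an illustrative sketch of that drop-and-rescale step under the mapping in the note above (treating `p` as a drop probability, the standard DARE reading, and borrowing `weight`/`density` numbers from the mergelist as examples); it is not the hf_merge.py implementation.

```python
# Illustrative DARE-style drop-and-rescale on a single tensor.
# Assumption: p is the drop probability and lam the rescale factor,
# with weight -> p and 1/density -> lam as in the note above.
import torch

def dare_delta(delta: torch.Tensor, p: float, lam: float) -> torch.Tensor:
    """Drop each delta entry with probability p, rescale survivors by lam."""
    keep = torch.bernoulli(torch.full_like(delta, 1.0 - p))
    return delta * keep * lam

# Toy merge over one tensor: add the processed deltas back onto the base weight.
base = torch.randn(1024, 1024)
finetunes = [base + 0.01 * torch.randn_like(base) for _ in range(3)]
params = [(0.20, 1 / 0.8), (0.19, 1 / 0.75), (0.18, 1 / 0.75)]  # (p, lambda) from the first entries above
merged = base.clone()
for ft, (p, lam) in zip(finetunes, params):
    merged += dare_delta(ft - base, p, lam)
```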

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 68.79 |
| AI2 Reasoning Challenge (25-Shot) | 66.13 |
| HellaSwag (10-Shot)               | 85.30 |
| MMLU (5-Shot)                     | 66.03 |
| TruthfulQA (0-shot)               | 54.33 |
| Winogrande (5-shot)               | 82.95 |
| GSM8k (5-shot)                    | 58.00 |
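
As a quick sanity check, the Avg. row is the unweighted mean of the six benchmark scores:

```python
# Unweighted mean of the six benchmark scores listed above.
scores = [66.13, 85.30, 66.03, 54.33, 82.95, 58.00]
print(round(sum(scores) / len(scores), 2))  # -> 68.79
```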