---
license: other
metrics:
  - accuracy
base_model: Mihaiii/Pallas-0.5
inference: false
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
model-index:
  - name: Pallas-0.5-frankenmerge
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 61.77
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mihaiii/Pallas-0.5-frankenmerge
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 80.36
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mihaiii/Pallas-0.5-frankenmerge
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 67.62
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mihaiii/Pallas-0.5-frankenmerge
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 54.07
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mihaiii/Pallas-0.5-frankenmerge
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 77.74
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mihaiii/Pallas-0.5-frankenmerge
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 24.11
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mihaiii/Pallas-0.5-frankenmerge
          name: Open LLM Leaderboard
---

This is a frankenmerge of Mihaiii/Pallas-0.5, created with mergekit.

It works well with long system prompts.

It isn't a general-purpose model: it is intended for reasoning and text comprehension rather than, for example, storytelling.

The model was trained on a private dataset.

Prompt Format:

```
SYSTEM: <ANY SYSTEM CONTEXT>
USER:
ASSISTANT:
```
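
For example, with the transformers library (a minimal sketch; the system and user strings are illustrative):

```python
# Minimal generation sketch using the standard transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mihaiii/Pallas-0.5-frankenmerge"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Illustrative system/user content; the format follows the template above.
system = "You are a careful assistant. Answer strictly based on the passage provided."
user = "Summarize the main argument of the passage in one sentence."
prompt = f"SYSTEM: {system}\nUSER: {user}\nASSISTANT:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```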

Merge config:

```yaml
slices:
  - sources:
    - model: "Mihaiii/Pallas-0.5"
      layer_range: [0, 60]
  - sources:
    - model: "Mihaiii/Pallas-0.5"
      layer_range: [58, 60]
  - sources:
    - model: "Mihaiii/Pallas-0.5"
      layer_range: [55, 56]
merge_method: passthrough
dtype: bfloat16
```
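
Since `merge_method: passthrough` simply stacks the listed slices, the merged model has 63 layers: all 60 layers of the base model, followed by a second copy of layers 58-59 and a second copy of layer 55. A quick sketch of that expansion (the config filename is hypothetical):

```python
import yaml

# Expand the passthrough slices into the merged model's final layer order.
# "pallas-frankenmerge.yml" is a hypothetical filename for the config above.
with open("pallas-frankenmerge.yml") as f:
    config = yaml.safe_load(f)

layers = []
for s in config["slices"]:
    start, end = s["sources"][0]["layer_range"]  # end-exclusive range
    layers.extend(range(start, end))

print(len(layers))   # 63 = 60 + 2 + 1
print(layers[-4:])   # [59, 58, 59, 55]
```

With mergekit installed, a config like this is typically built via its `mergekit-yaml` entry point, which takes the config path and an output directory.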

Quants:

- [TheBloke/Pallas-0.5-frankenmerge-GGUF](https://huggingface.co/TheBloke/Pallas-0.5-frankenmerge-GGUF)
- [TheBloke/Pallas-0.5-frankenmerge-GPTQ](https://huggingface.co/TheBloke/Pallas-0.5-frankenmerge-GPTQ)
- [TheBloke/Pallas-0.5-frankenmerge-AWQ](https://huggingface.co/TheBloke/Pallas-0.5-frankenmerge-AWQ)
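
As a sketch, a GGUF quant can be run locally with llama-cpp-python; the `.gguf` filename below is illustrative, so substitute an actual file from the GGUF repo:

```python
# Sketch: run a GGUF quant with llama-cpp-python.
# The model_path filename is illustrative; download an actual file from
# TheBloke/Pallas-0.5-frankenmerge-GGUF and point at it here.
from llama_cpp import Llama

llm = Llama(model_path="pallas-0.5-frankenmerge.Q4_K_M.gguf", n_ctx=4096)
prompt = "SYSTEM: You are a concise assistant.\nUSER: Name the capital of France.\nASSISTANT:"
out = llm(prompt, max_tokens=64, stop=["USER:"])
print(out["choices"][0]["text"].strip())
```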

Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mihaiii/Pallas-0.5-frankenmerge).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 60.95 |
| AI2 Reasoning Challenge (25-Shot) | 61.77 |
| HellaSwag (10-Shot)               | 80.36 |
| MMLU (5-Shot)                     | 67.62 |
| TruthfulQA (0-shot)               | 54.07 |
| Winogrande (5-shot)               | 77.74 |
| GSM8k (5-shot)                    | 24.11 |