---
license: cc-by-nc-4.0
library_name: transformers
tags:
  - mergekit
  - merge
base_model:
  - mlabonne/Daredevil-8B-abliterated
  - ResplendentAI/RP_Format_QuoteAsterisk_Llama3
  - mlabonne/Daredevil-8B-abliterated
  - ResplendentAI/BlueMoon_Llama3
  - mlabonne/Daredevil-8B-abliterated
  - ResplendentAI/Luna_Llama3
  - mlabonne/Daredevil-8B-abliterated
  - mlabonne/Daredevil-8B-abliterated
  - ResplendentAI/Aura_Llama3
  - mlabonne/Daredevil-8B-abliterated
  - ResplendentAI/Smarts_Llama3
model-index:
  - name: SOVLish-Devil-8B-L3
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 69.2
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saishf/SOVLish-Devil-8B-L3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 84.44
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saishf/SOVLish-Devil-8B-L3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 68.97
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saishf/SOVLish-Devil-8B-L3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 57.95
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saishf/SOVLish-Devil-8B-L3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 78.14
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saishf/SOVLish-Devil-8B-L3
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 72.48
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saishf/SOVLish-Devil-8B-L3
          name: Open LLM Leaderboard
---

# SOVLish-Devil-8B-L3

This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit).

## Merge Details

Devil >:3

This is another "SOVL"-style merge, this time using mlabonne/Daredevil-8B-abliterated.

Daredevil is the first abliterated model I've tried that feels as smart as base Llama-3-Instruct while also being willing to give instructions for all kinds of illegal things.

This model should do well in RP; I have yet to test it (waiting for GGUF files @_@).

### Merge Method

This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method, with mlabonne/Daredevil-8B-abliterated as the base.
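For intuition, here is a rough per-tensor sketch of the Model Stock idea: average the fine-tuned weights, then interpolate back toward the base using a ratio derived from the angle between fine-tuned deltas. This is a toy illustration with plain Python lists (hypothetical helper name), not mergekit's actual implementation, which operates on full model tensors:

```python
import math

def model_stock_merge(base, finetuned):
    """Toy sketch of Model Stock for one flattened weight tensor.

    Averages the fine-tuned weights, then interpolates toward the base
    with t = N*cos / ((N-1)*cos + 1), where cos is the cosine similarity
    between two fine-tuned models' deltas from the base.
    """
    n = len(finetuned)
    # Element-wise average of all fine-tuned weights
    avg = [sum(ws) / n for ws in zip(*finetuned)]
    # Deltas of the first two fine-tuned models from the base
    d0 = [w - b for w, b in zip(finetuned[0], base)]
    d1 = [w - b for w, b in zip(finetuned[1], base)]
    dot = sum(x * y for x, y in zip(d0, d1))
    norm = math.sqrt(sum(x * x for x in d0)) * math.sqrt(sum(x * x for x in d1))
    cos = dot / norm
    t = (n * cos) / ((n - 1) * cos + 1)
    # Interpolate between the average and the base
    return [t * a + (1 - t) * b for a, b in zip(avg, base)]
```

When the fine-tuned deltas agree (cos near 1), the result stays close to their average; when they point in unrelated directions (cos near 0), the result falls back toward the base weights.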

### Models Merged

The following models were included in the merge:

* mlabonne/Daredevil-8B-abliterated + ResplendentAI/Aura_Llama3
* mlabonne/Daredevil-8B-abliterated + ResplendentAI/Smarts_Llama3
* mlabonne/Daredevil-8B-abliterated + ResplendentAI/Luna_Llama3
* mlabonne/Daredevil-8B-abliterated + ResplendentAI/BlueMoon_Llama3
* mlabonne/Daredevil-8B-abliterated + ResplendentAI/RP_Format_QuoteAsterisk_Llama3

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: mlabonne/Daredevil-8B-abliterated+ResplendentAI/Aura_Llama3
  - model: mlabonne/Daredevil-8B-abliterated+ResplendentAI/Smarts_Llama3
  - model: mlabonne/Daredevil-8B-abliterated+ResplendentAI/Luna_Llama3
  - model: mlabonne/Daredevil-8B-abliterated+ResplendentAI/BlueMoon_Llama3
  - model: mlabonne/Daredevil-8B-abliterated+ResplendentAI/RP_Format_QuoteAsterisk_Llama3
merge_method: model_stock
base_model: mlabonne/Daredevil-8B-abliterated
dtype: bfloat16
```
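Note the `+` syntax in the config: mergekit applies each ResplendentAI LoRA to the Daredevil base before merging. Conceptually, applying a LoRA folds its low-rank update into the weight matrix as W' = W + (alpha / r) · BA. A toy sketch with plain Python lists (hypothetical helper name; real merges work on full bfloat16 tensors):

```python
def apply_lora(w, a, b, alpha, r):
    """Fold a LoRA update into a weight matrix: W' = W + (alpha / r) * (B @ A).

    w: base weight, rows x cols
    b: down-projection output, rows x r
    a: up-projection input, r x cols
    """
    scale = alpha / r
    rows, cols, rank = len(w), len(w[0]), len(a)
    # Compute B @ A, scale it, and add it to the base weight element-wise
    return [[w[i][j] + scale * sum(b[i][t] * a[t][j] for t in range(rank))
             for j in range(cols)] for i in range(rows)]
```

After this fold-in, each base+LoRA combination is an ordinary full-weight model, so Model Stock can average them like any other checkpoints.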

## Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=saishf/SOVLish-Devil-8B-L3).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 71.86 |
| AI2 Reasoning Challenge (25-Shot) | 69.20 |
| HellaSwag (10-Shot)               | 84.44 |
| MMLU (5-Shot)                     | 68.97 |
| TruthfulQA (0-shot)               | 57.95 |
| Winogrande (5-shot)               | 78.14 |
| GSM8k (5-shot)                    | 72.48 |
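The leaderboard average is just the mean of the six benchmark scores, rounded to two decimals:

```python
scores = {
    "ARC-Challenge (25-shot)": 69.20,
    "HellaSwag (10-shot)": 84.44,
    "MMLU (5-shot)": 68.97,
    "TruthfulQA (0-shot)": 57.95,
    "Winogrande (5-shot)": 78.14,
    "GSM8k (5-shot)": 72.48,
}
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 71.86
```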