---
language:
  - en
license: cc-by-nc-4.0
tags:
  - mixtral
datasets:
  - Open-Orca/SlimOrca
  - lemonilia/LimaRP
  - chargoddard/rpguild
  - chargoddard/summarize_from_feedback_alpaca
  - HuggingFaceH4/no_robots
  - chargoddard/coedit-reworded
base_model: mistralai/Mixtral-8x7B-v0.1
model-index:
  - name: MixtralRPChat-ZLoss
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 68.6
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 86.1
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 70.44
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 53.85
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 82
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 50.57
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
          name: Open LLM Leaderboard
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

QLoRA tuned from [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).
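As a rough sketch (not part of the original setup, so adjust to your hardware), the model can be loaded in 4-bit with `transformers` and `bitsandbytes`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "chargoddard/MixtralRPChat-ZLoss"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # spread the experts across available GPUs
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
```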

My main reason for training this model was to investigate an altered router balancing loss combined with the z-loss introduced in [ST-MoE: Designing Stable and Transferable Sparse Expert Models](https://arxiv.org/abs/2202.08906). The result is pretty decent, I think! It does a good job of respecting character information in system prompts and performed adequately on a few simple coding tasks.
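For the curious, here is a minimal sketch of the router z-loss as described in the ST-MoE paper (the actual implementation lives in the Transformers branch mentioned below and may differ in details):

```python
import torch

def router_z_loss(router_logits: torch.Tensor) -> torch.Tensor:
    # router_logits: (num_tokens, num_experts), raw pre-softmax router scores.
    # Squared log-partition function, averaged over tokens. Penalizing large
    # logits keeps the router's softmax inputs small, which ST-MoE found
    # stabilizes training without hurting quality.
    return torch.logsumexp(router_logits, dim=-1).square().mean()
```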

To train this I used a custom branch of Transformers that adds z-loss and reimplements the router balancing loss based on the version in MegaBlocks. The config used with my custom hacked-up branch of axolotl is available here.
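As a rough illustration only, a Switch-Transformer-style balancing loss looks like the sketch below; the MegaBlocks-based version in my branch may differ in details such as how top-k dispatch is counted:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor,
                        num_experts: int, top_k: int = 2) -> torch.Tensor:
    # router_logits: (num_tokens, num_experts)
    probs = F.softmax(router_logits, dim=-1)
    # Which experts each token is dispatched to under top-k routing.
    topk_idx = probs.topk(top_k, dim=-1).indices        # (tokens, top_k)
    dispatch = F.one_hot(topk_idx, num_experts).sum(1)  # (tokens, experts)
    frac_tokens = dispatch.float().mean(0)  # avg assignments per expert
    frac_probs = probs.mean(0)              # avg router probability per expert
    # Minimized when both distributions are uniform across experts.
    return num_experts * torch.sum(frac_tokens * frac_probs)
```

Mixtral routes each token to the top 2 of 8 experts, so `top_k=2` is the relevant setting here.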

Uses my favorite non-ChatML token-economic chat prompt format. Messages should be prefixed with ` ***System:`, ` ***Query:`, or ` ***Response:` for system, user, and model messages respectively. No newlines are necessary, but the space before the triple asterisk is mandatory.
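A hypothetical helper for building prompts in this format (the role prefixes come from the description above; everything else, including the space after the colon, is my assumption):

```python
def format_chat(messages):
    """Render (role, text) pairs using the ' ***Role:' prefixes.

    Roles: "system" -> ***System:, "user" -> ***Query:,
    "model" -> ***Response:. The leading space before the triple
    asterisk is mandatory; newlines between turns are optional.
    """
    prefix = {
        "system": " ***System:",
        "user": " ***Query:",
        "model": " ***Response:",
    }
    return "".join(f"{prefix[role]} {text}" for role, text in messages)

# Build a prompt, then leave an open Response prefix for the model to complete.
prompt = format_chat([
    ("system", "You are Kestrel, a laconic sky-pirate captain."),
    ("user", "Where to next, captain?"),
]) + " ***Response:"
```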

## Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 68.59 |
| AI2 Reasoning Challenge (25-Shot) | 68.60 |
| HellaSwag (10-Shot)               | 86.10 |
| MMLU (5-Shot)                     | 70.44 |
| TruthfulQA (0-shot)               | 53.85 |
| Winogrande (5-shot)               | 82.00 |
| GSM8k (5-shot)                    | 50.57 |