language:
- en
license: cc-by-nc-4.0
tags:
- mixtral
datasets:
- Open-Orca/SlimOrca
- lemonilia/LimaRP
- chargoddard/rpguild
- chargoddard/summarize_from_feedback_alpaca
- HuggingFaceH4/no_robots
- chargoddard/coedit-reworded
base_model: mistralai/Mixtral-8x7B-v0.1
model-index:
- name: MixtralRPChat-ZLoss
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 68.6
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 86.1
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 70.44
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 53.85
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 82
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 50.57
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/MixtralRPChat-ZLoss
name: Open LLM Leaderboard
QLoRA tuned from mistralai/Mixtral-8x7B-v0.1.
My main reason for training this model was to investigate using an altered router balancing loss combined with the z-loss introduced in "ST-MoE: Designing Stable and Transferable Sparse Expert Models". The result is pretty decent, I think! It does a good job of respecting character information in system prompts, and it performed adequately on a few simple coding tasks.
To train this I used a custom branch of Transformers that adds z-loss and reimplements the router balancing loss based on the version in MegaBlocks. The config used with my custom hacked-up branch of axolotl is available here.
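For readers unfamiliar with the term, the router z-loss described in ST-MoE penalizes the squared log-sum-exp of each token's router logits, averaged over tokens, which discourages very large logits going into the gating softmax. The snippet below is a minimal plain-Python sketch of that formula for illustration only. It is not the code from the custom Transformers branch, the function names are hypothetical, and the altered balancing loss is not shown because its details are not given here.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def router_z_loss(router_logits):
    """Mean squared log-sum-exp of router logits.

    router_logits: list of per-token lists, one logit per expert.
    (Hypothetical helper; illustrates the ST-MoE formula only.)
    """
    per_token = [logsumexp(row) ** 2 for row in router_logits]
    return sum(per_token) / len(per_token)


# Two tokens routed over four experts; larger logits give a larger penalty.
small = router_z_loss([[0.1, -0.2, 0.0, 0.3], [0.0, 0.0, 0.1, -0.1]])
large = router_z_loss([[8.0, -5.0, 2.0, 6.0], [7.0, 1.0, -3.0, 9.0]])
print(small < large)
```

In training, a term like this is typically scaled by a small coefficient and added to the language-modeling loss; the coefficient used for this model is not stated here.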
The model uses my favorite non-ChatML, token-economical chat prompt format. Messages should be prefixed with " ***System:", " ***Query:", or " ***Response:" for system, user, and model messages respectively. No newlines are necessary, but the space before the triple asterisk is mandatory.
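As a rough sketch of how a prompt in this format might be assembled, the helper below concatenates messages using the three prefixes given above. The function name, the role keys, and the single space placed between each prefix and its message text are assumptions made for illustration; only the prefixes themselves and the mandatory leading space come from the description above.

```python
# Prefixes as described above; the leading space is mandatory.
PREFIXES = {
    "system": " ***System:",
    "user": " ***Query:",
    "assistant": " ***Response:",
}


def build_prompt(messages, add_generation_prompt=True):
    """Join (role, text) pairs into a single prompt string.

    Hypothetical helper. Assumes one space between a prefix and its text
    and no newlines between messages.
    """
    parts = [f"{PREFIXES[role]} {text.strip()}" for role, text in messages]
    if add_generation_prompt:
        # End with the response prefix so the model continues as the assistant.
        parts.append(PREFIXES["assistant"])
    return "".join(parts)


prompt = build_prompt([
    ("system", "You are Bob."),
    ("user", "Hi"),
])
print(repr(prompt))
# ' ***System: You are Bob. ***Query: Hi ***Response:'
```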
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 68.59 |
| AI2 Reasoning Challenge (25-Shot) | 68.60 |
| HellaSwag (10-Shot) | 86.10 |
| MMLU (5-Shot) | 70.44 |
| TruthfulQA (0-shot) | 53.85 |
| Winogrande (5-shot) | 82.00 |
| GSM8k (5-shot) | 50.57 |
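The "Avg." row is consistent with an unweighted mean of the six benchmark scores, which can be checked with a few lines of Python. The dictionary below simply restates the values from the table; treating the average as a plain mean is an assumption that happens to match the reported figure.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 68.60,
    "HellaSwag (10-Shot)": 86.10,
    "MMLU (5-Shot)": 70.44,
    "TruthfulQA (0-shot)": 53.85,
    "Winogrande (5-shot)": 82.00,
    "GSM8k (5-shot)": 50.57,
}

# Unweighted mean, rounded to two decimals like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 68.59
```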