Edit model card

Full weight fine tuned on two epochs of SlimOrca. Uses Mistral Instruct's prompt format.

The base model for this came from a variation on Undi's Mistral 11B recipe. The o_proj and down_proj tensors were set to zero in the added layers, making the output exactly identical to Mistral 7B before training.

Benchmarks look good locally but still evaluating actual usefulness. Update: this turned out great! 10/10 would recommend as a training approach.

Reproducing

This mergekit config was used to produce the base model:

slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 24]
  - sources: # add middle layers with residuals scaled to zero
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [8, 24]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16

The axolotl config for fine tuning is available here.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 66.12
AI2 Reasoning Challenge (25-Shot) 64.25
HellaSwag (10-Shot) 83.81
MMLU (5-Shot) 63.66
TruthfulQA (0-shot) 54.66
Winogrande (5-shot) 77.98
GSM8k (5-shot) 52.39
Downloads last month
2,936
Safetensors
Model size
10.7B params
Tensor type
BF16
·

Finetuned from

Dataset used to train chargoddard/mistral-11b-slimorca

Evaluation results