---
language:
- en
license: apache-2.0
datasets:
- Open-Orca/SlimOrca
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: mistral-11b-slimorca
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 64.25
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/mistral-11b-slimorca
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.81
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/mistral-11b-slimorca
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 63.66
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/mistral-11b-slimorca
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 54.66
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/mistral-11b-slimorca
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 77.98
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/mistral-11b-slimorca
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 52.39
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=chargoddard/mistral-11b-slimorca
      name: Open LLM Leaderboard
---

Full weight fine tuned on two epochs of [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca). Uses Mistral Instruct's prompt format.

The base model for this came from a variation on Undi's [Mistral 11B recipe](https://huggingface.co/Undi95/Mistral-11B-v0.1). The `o_proj` and `down_proj` tensors were set to zero in the added layers, making the output exactly identical to Mistral 7B before training.

~Benchmarks look good locally but still evaluating actual usefulness.~
Update: this turned out great! 10/10 would recommend as a training approach.

### Reproducing

This [mergekit](https://github.com/cg123/mergekit) config was used to produce the base model:

```yml
slices:
  - sources:
    - model: mistralai/Mistral-7B-v0.1
      layer_range: [0, 24]
  - sources: # add middle layers with residuals scaled to zero
    - model: mistralai/Mistral-7B-v0.1
      layer_range: [8, 24]
      parameters:
        scale:
          - filter: o_proj
            value: 0.0
          - filter: down_proj
            value: 0.0
          - value: 1.0
  - sources:
    - model: mistralai/Mistral-7B-v0.1
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

The axolotl config for fine tuning is available [here](https://huggingface.co/chargoddard/mistral-11b-slimorca/blob/main/axolotl_config.yaml).

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_chargoddard__mistral-11b-slimorca)

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |66.12|
|AI2 Reasoning Challenge (25-Shot)|64.25|
|HellaSwag (10-Shot)              |83.81|
|MMLU (5-Shot)                    |63.66|
|TruthfulQA (0-shot)              |54.66|
|Winogrande (5-shot)              |77.98|
|GSM8k (5-shot)                   |52.39|
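
For intuition on why zeroing `o_proj` and `down_proj` in the added layers leaves the pre-training output unchanged, here is a rough toy sketch in plain Python. It is not the real Mistral architecture: each "layer" is a single hypothetical residual block `x + W_out @ tanh(W_in @ x)`, with `W_out` standing in for both output projections, and the names `make_layer`, `zeroed_copy`, and `forward` are made up for illustration. The slice boundaries mirror the mergekit config (32 base layers, layers 8 to 24 duplicated, 48 layers total).

```python
import math
import random


def make_layer(dim, rng):
    """Toy residual block parameters (illustrative, not Mistral's real layout)."""
    w_in = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    w_out = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    return {"w_in": w_in, "w_out": w_out}


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def apply_layer(layer, x):
    """Compute x + W_out @ tanh(W_in @ x): the residual stream plus one branch."""
    hidden = [math.tanh(h) for h in matvec(layer["w_in"], x)]
    branch = matvec(layer["w_out"], hidden)
    return [xi + bi for xi, bi in zip(x, branch)]


def zeroed_copy(layer):
    """Duplicate a layer but zero its output projection, like scale: 0.0."""
    dim = len(layer["w_out"])
    return {"w_in": layer["w_in"], "w_out": [[0.0] * dim for _ in range(dim)]}


def forward(layers, x):
    """Run x through every layer in order."""
    for layer in layers:
        x = apply_layer(layer, x)
    return x


rng = random.Random(0)
base = [make_layer(4, rng) for _ in range(32)]

# Same slicing as the config: [0, 24], zeroed copies of [8, 24], then [24, 32].
expanded = base[0:24] + [zeroed_copy(layer) for layer in base[8:24]] + base[24:32]

x = [rng.uniform(-1, 1) for _ in range(4)]
print(len(expanded))
print(forward(expanded, x) == forward(base, x))
```

Because each zeroed block adds nothing to the residual stream, the inserted layers act as the identity until fine-tuning moves their output projections away from zero.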