---
license: other
pipeline_tag: text-generation
license_name: microsoft-research-license
model-index:
- name: Orca-2-13b-Alpaca-Uncensored
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 61.09
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=athirdpath/Orca-2-13b-Alpaca-Uncensored
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 79.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=athirdpath/Orca-2-13b-Alpaca-Uncensored
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 60.13
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=athirdpath/Orca-2-13b-Alpaca-Uncensored
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 53.59
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=athirdpath/Orca-2-13b-Alpaca-Uncensored
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 77.43
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=athirdpath/Orca-2-13b-Alpaca-Uncensored
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 38.29
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=athirdpath/Orca-2-13b-Alpaca-Uncensored
      name: Open LLM Leaderboard
---

This model is a fine-tuned version of microsoft/Orca-2-13b, trained on a subset of the Vezora/Mini_Orca_Uncencored_Alpaca dataset. The subset was adjusted to demonstrate the relationship between instruction and input, and some particularly spicy prompts were added to reduce the risk of rejections.

Only the q_proj and k_proj modules were targeted and a low rank (8) was used, in hopes of containing the adjustments to the prompt format and alignment.

This is promising on paper, with the training's per-step loss averaging <0.9 for the last third of the run.

Reasoning stayed solid (for a 13b model) and I consider this a success. Performance is slightly worse than OG Orca-2 in Ooba's chat mode. In Alpaca chat-instruct mode, it is comparable to the OG in ChatML chat-instruct mode.

The model may still reject some shocking prompts, but this can easily be overcome with an author's note or character card.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_athirdpath__Orca-2-13b-Alpaca-Uncensored)

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |61.63|
|AI2 Reasoning Challenge (25-Shot)|61.09|
|HellaSwag (10-Shot)              |79.27|
|MMLU (5-Shot)                    |60.13|
|TruthfulQA (0-shot)              |53.59|
|Winogrande (5-shot)              |77.43|
|GSM8k (5-shot)                   |38.29|
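
The Avg. row in the leaderboard table is consistent with a plain unweighted mean of the six benchmark scores. The short check below recomputes it from the values listed in the table; the dictionary name `scores` is just an illustrative choice.

```python
# Benchmark scores copied from the leaderboard table in this card.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.09,
    "HellaSwag (10-Shot)": 79.27,
    "MMLU (5-Shot)": 60.13,
    "TruthfulQA (0-shot)": 53.59,
    "Winogrande (5-shot)": 77.43,
    "GSM8k (5-shot)": 38.29,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 61.63
```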
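
To give a feel for how small the adapter described above is (rank 8, q_proj and k_proj only), here is a rough parameter count. This is a sketch under assumptions that are not stated in this card: it assumes the base model follows the Llama-2-13B layout with a hidden size of 5120 and 40 transformer layers, and that both projections are square 5120 x 5120 matrices.

```python
# Assumed architecture values (Llama-2-13B layout); not stated in this card.
HIDDEN_SIZE = 5120
NUM_LAYERS = 40

# Values taken from the card.
RANK = 8
TARGET_MODULES = ("q_proj", "k_proj")


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    LoRA learns two matrices: A (rank x in_features) and
    B (out_features x rank), so the count is rank * (in + out).
    """
    return rank * (in_features + out_features)


per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total = per_module * len(TARGET_MODULES) * NUM_LAYERS

print(per_module)  # 81920
print(total)       # 6553600
```

Under these assumptions the adapter trains roughly 6.6 million parameters, a tiny fraction of the 13b base model. In a library such as Hugging Face PEFT, the same settings would correspond roughly to `LoraConfig(r=8, target_modules=["q_proj", "k_proj"])`; the card does not say which tooling was actually used.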
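
Since the fine-tune targets the Alpaca prompt format and the instruction/input relationship, a minimal prompt builder may help when calling the model outside a chat UI. The template strings below are the widely used Stanford Alpaca wording and are an assumption here; the card does not spell out the exact template used in training. The function name `build_alpaca_prompt` is hypothetical.

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style prompt (assumed template, not confirmed by the card)."""
    if input_text:
        # Variant with an Input section that gives the instruction extra context.
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Variant without an Input section.
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


prompt = build_alpaca_prompt(
    "Summarize the text in one sentence.",
    "LoRA adds small low-rank matrices to selected layers of a frozen model.",
)
print(prompt)
```

The resulting string can be passed to whatever generation backend is in use; the model's completion is expected to follow the final `### Response:` marker.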