---
language:
- en
- ko
license: cc-by-sa-4.0
tags:
- not-for-all-audiences
datasets:
- maywell/ko_wikidata_QA
- kyujinpy/OpenOrca-KO
- Anthropic/hh-rlhf
pipeline_tag: text-generation
model-index:
- name: PiVoT-0.1-Evil-a
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 59.64
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 81.48
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 58.94
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 39.23
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.3
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 40.41
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
      name: Open LLM Leaderboard
---

# PiVoT-0.1-Evil-a

![image/png](./PiVoT.png)

# **Model Details**

### Description

PiVoT is a fine-tuned model based on Mistral 7B. It is a variation of Synatra v0.3 RP, which has shown decent performance.

PiVoT-0.1-Evil-**a** is an **Evil-tuned** version of PiVoT, fine-tuned using the method illustrated below. PiVoT-0.1-Evil-**b** is additionally tuned with noisy embeddings, which should give it more varied outputs (see the sketch at the end of this card).

![image/png](./eviltune.png)

## Prompt template: Alpaca-InstructOnly2

```
### Instruction:
{prompt}

### Response:
```
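As a convenience, here is a minimal usage sketch with 🤗 Transformers that wraps a user prompt in the Alpaca-InstructOnly2 template above. The generation settings (sampling, temperature, token budget) are illustrative assumptions, not values prescribed by this card:

```python
# Minimal sketch: load the model and apply the Alpaca-InstructOnly2 template.
# Generation parameters below are illustrative, not the author's recommended settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "maywell/PiVoT-0.1-Evil-a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Wrap the raw prompt in the template shown above.
prompt = "Explain what instruction tuning is in one paragraph."
formatted = f"### Instruction:\n{prompt}\n\n### Response:\n"

inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```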
### Disclaimer

The AI model provided herein is intended for experimental purposes only. The creator of this model makes no representations or warranties of any kind, either express or implied, as to the model's accuracy, reliability, or suitability for any particular purpose. The creator shall not be held liable for any outcomes, decisions, or actions taken on the basis of the information generated by this model. Users of this model assume full responsibility for any consequences resulting from its use.

The OpenOrca dataset was used when fine-tuning the PiVoT variations. The Arcalive AI Chat Chan 7k log, [ko_wikidata_QA](https://huggingface.co/datasets/maywell/ko_wikidata_QA), [kyujinpy/OpenOrca-KO](https://huggingface.co/datasets/kyujinpy/OpenOrca-KO), and other datasets were used for the base model.

Follow me on Twitter: https://twitter.com/stablefluffy

Consider supporting me, as I make these models alone: https://www.buymeacoffee.com/mwell, or with a RunPod credit gift 💕

Contact me on Telegram: https://t.me/AlzarTakkarsen

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_maywell__PiVoT-0.1-Evil-a).

| Metric                           | Value |
|----------------------------------|------:|
| Avg.                             | 59.16 |
| AI2 Reasoning Challenge (25-Shot)| 59.64 |
| HellaSwag (10-Shot)              | 81.48 |
| MMLU (5-Shot)                    | 58.94 |
| TruthfulQA (0-shot)              | 39.23 |
| Winogrande (5-shot)              | 75.30 |
| GSM8k (5-shot)                   | 40.41 |
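A note on the noisy-embedding tuning mentioned above for PiVoT-0.1-Evil-**b**: this card does not spell out the exact method, but NEFTune-style noise injection on the input embeddings during fine-tuning is a common way to implement it. The sketch below is purely illustrative; the hook-based approach and the `alpha` value are assumptions, not the author's confirmed recipe:

```python
import torch

def neftune_hook(module, _inputs, output, alpha: float = 5.0):
    """Add NEFTune-style uniform noise to input embeddings during training.

    Noise magnitude is alpha / sqrt(seq_len * hidden_dim), following the
    NEFTune paper. Illustrative assumption; not confirmed by this card.
    """
    if module.training:
        seq_len, hidden_dim = output.shape[1], output.shape[2]
        scale = alpha / (seq_len * hidden_dim) ** 0.5
        output = output + torch.zeros_like(output).uniform_(-scale, scale)
    return output

# Hypothetical usage: attach to a causal LM's embedding layer before fine-tuning.
# model.get_input_embeddings().register_forward_hook(neftune_hook)
```

Because the hook checks `module.training`, the noise is applied only during fine-tuning and is inactive at inference time.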