---
license: other
tags:
- lora
- qlora
- adapter
license_name: yi-license
license_link: LICENSE
model-index:
- name: Yi-34b-200K-rawrr-v2-run-0902-LoRA
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 64.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34b-200K-rawrr-v2-run-0902-LoRA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 84.5
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34b-200K-rawrr-v2-run-0902-LoRA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.76
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34b-200K-rawrr-v2-run-0902-LoRA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 46.66
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34b-200K-rawrr-v2-run-0902-LoRA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.14
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34b-200K-rawrr-v2-run-0902-LoRA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 62.17
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34b-200K-rawrr-v2-run-0902-LoRA
      name: Open LLM Leaderboard
---

This is not an instruct fine-tune; instead, it's an attempt to de-contaminate the model and remove gptslop and refusals. I want the model to feel like it was trained on human data, not synthetic data.

Yi-34B-200K (llamafied) was DPO-trained for 1 epoch (about 961 steps total) on the rawrr_v2 dataset via unsloth QLoRA, with a max prompt length of 400, a max length of 700, and a learning rate of 0.000045. \
The model was initialized with max_positional_embeddings of 4096 to avoid OOM. \
Training was done on an RTX 3090 Ti in about 14 hours. \
Average memory usage was around 23.89 / 23.99 GiB, so very close to OOM at all times. \
I trained with XFCE and one 1080p monitor loaded up; on a fancier DM it would probably OOM with the same setup. \
I am not sure what the purpose of max_prompt_length being separate from max_length is, so I may have used it wrong; I should read up on it. \
The script I used for this fine-tune is in the repo. I used the ChatML prompt format.

Next, I plan to fine-tune this model on the AEZAKMI v3 dataset.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_adamo1139__Yi-34b-200K-rawrr-v2-run-0902-LoRA)

| Metric |Value|
|---------------------------------|----:|
|Avg. |69.15|
|AI2 Reasoning Challenge (25-Shot)|64.68|
|HellaSwag (10-Shot) |84.50|
|MMLU (5-Shot) |75.76|
|TruthfulQA (0-shot) |46.66|
|Winogrande (5-shot) |81.14|
|GSM8k (5-shot) |62.17|
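Since the card mentions the ChatML prompt format, here is a minimal sketch of how a ChatML turn is laid out. The `format_chatml` helper is illustrative only; it is not taken from the training script in the repo.

```python
def format_chatml(system: str, user: str, assistant: str = "") -> str:
    """Build a ChatML-style prompt string.

    Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
    The assistant turn is left open-ended when you want the model
    to generate the completion itself.
    """
    prompt = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    if assistant:
        prompt += f"{assistant}<|im_end|>\n"
    return prompt
```

For inference you would pass the open-ended form (no `assistant` argument) to the tokenizer and let the model continue from `<|im_start|>assistant`.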
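On the max_prompt_length vs. max_length question above: as I understand trl's DPOTrainer, max_prompt_length caps the prompt tokens alone, while max_length caps the combined prompt-plus-completion sequence. A simplified, list-based sketch of that truncation logic follows; `truncate_pair` is a made-up helper for illustration, not trl's actual implementation (trl additionally exposes a truncation_mode option controlling which end of an over-long prompt is kept — here we simply keep the start).

```python
def truncate_pair(prompt_tokens, completion_tokens,
                  max_prompt_length=400, max_length=700):
    """Illustrative DPO-style truncation:
    1) cap the prompt at max_prompt_length tokens,
    2) cap the completion so prompt + completion <= max_length.
    """
    if len(prompt_tokens) > max_prompt_length:
        prompt_tokens = prompt_tokens[:max_prompt_length]
    room = max_length - len(prompt_tokens)
    completion_tokens = completion_tokens[:room]
    return prompt_tokens, completion_tokens
```

With the values used here (400 / 700), a maximally long prompt still leaves up to 300 tokens of room for each chosen/rejected completion.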