THIS MODEL IS EXPERIMENTAL AND MIGHT BE BUGGY; I HAVEN'T PERFECTED THE STRENGTH OF THE DPO AND SFT STAGES YET.
I'm submitting this to the Open LLM Leaderboard with yi-34b-200k-llamafied listed as the base model, to see whether stacking one LoRA on top of another is worthwhile when both use the same lora_r, or whether it makes no difference.
This is another AEZAKMI v2 finetune, this time over Yi-34B-200K-rawrr-r3. Using Unsloth I was able to squeeze in a sequence length of 2200; the training script I used is in this repo. Training took around 18 hours on a local RTX 3090 Ti. I will be uploading fp16 and exl2 versions soon. So far it seems that de-contaminating Yi worked nicely.

This LoRA goes on top of the Yi-34B-200K-rawrr1-LORA-DPO-experimental-r3 LoRA. So to reproduce the model: first get llamafied Yi-34B-200K, merge in Yi-34B-200K-rawrr1-LORA-DPO-experimental-r3, then merge in this LoRA, as sketched below.
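A minimal sketch of that merge order using the standard transformers + peft APIs (merge_and_unload); the paths below are placeholders for the llamafied base, the rawrr DPO LoRA, and this repo's LoRA, so adjust them to your local copies or Hub IDs.

```python
# Sequential LoRA merge sketch: base -> rawrr DPO LoRA -> this AEZAKMI v2 LoRA.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "path/to/Yi-34B-200K-llamafied"                             # placeholder: llamafied Yi-34B-200K
DPO_LORA = "path/to/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r3"   # placeholder: rawrr DPO adapter
SFT_LORA = "path/to/this-repo-lora"                                # placeholder: this repo's AEZAKMI v2 adapter

# 1) Load the llamafied base model.
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, device_map="auto")

# 2) Merge the rawrr DPO LoRA into the base weights.
model = PeftModel.from_pretrained(model, DPO_LORA)
model = model.merge_and_unload()

# 3) Merge this AEZAKMI v2 LoRA on top of the result.
model = PeftModel.from_pretrained(model, SFT_LORA)
model = model.merge_and_unload()

# 4) Save the fully merged fp16 model plus the base tokenizer.
model.save_pretrained("yi-34b-200k-rawrr-aezakmi-v2-merged")
AutoTokenizer.from_pretrained(BASE).save_pretrained("yi-34b-200k-rawrr-aezakmi-v2-merged")
```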
Credits to mlabonne (I used pieces of his Mistral fine-tuning script for dataset preparation) and to Daniel Han and Michael Han of the Unsloth AI team.