Model description
Yi-34B 200K XLCTX base model fine-tuned on the adamo1139/rawrr_v2-2_stage1 (DPO), adamo1139/AEZAKMI_v3-7 (SFT) and adamo1139/toxic-dpo-natural-v5 (ORPO) datasets. Training took around 7 (DPO) + 13 (SFT) + 3 (ORPO) = 23 hours in total on an RTX 3090 Ti; all fine-tuning was done locally. This excludes failed attempts and issues I had with the merging script, which made me run the DPO and SFT stages twice because I thought my LoRAs were broken. It turned out to be a bug in the newer transformers/peft versions.
This model is tuned to use more natural language and to be very uncensored.
Say goodbye to "It's important to remember"!
The prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy, free chatbot.
The cost of this fine-tune was about $5-$10 in electricity.
The base model used for fine-tuning was the Yi-34B-200K model shared by 01.ai, specifically the newer version with improved long-context needle-in-a-haystack retrieval. They didn't give it a new name, and giving it a version number would mess up the AEZAKMI naming scheme by adding a second number, so I call it XLCTX.
You can see examples of responses to various prompts here (the model was loaded with transformers using load_in_4bit).
I had to lower max_position_embeddings in config.json and model_max_length for training to start; otherwise I was OOMing straight away. This attempt had both max_position_embeddings and model_max_length set to 4096, which worked perfectly fine. I set both back to 200000 before uploading, so I think the long-context capabilities of the base model should still be present here.
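The config change described above can be scripted. The sketch below is a hypothetical helper (`set_context_length` is not part of transformers) and assumes the usual Hugging Face layout, where `max_position_embeddings` lives in config.json and `model_max_length` lives in tokenizer_config.json; check your own checkpoint before relying on it.

```python
import json
import tempfile
from pathlib import Path


def set_context_length(model_dir: str, length: int) -> None:
    """Overwrite the context-length fields of a local checkpoint (hypothetical helper).

    Assumes max_position_embeddings is in config.json and model_max_length
    is in tokenizer_config.json; files that are missing are skipped.
    """
    targets = {
        "config.json": "max_position_embeddings",
        "tokenizer_config.json": "model_max_length",
    }
    for filename, key in targets.items():
        path = Path(model_dir) / filename
        if not path.exists():
            continue  # this checkpoint does not ship the file
        data = json.loads(path.read_text(encoding="utf-8"))
        data[key] = length  # all other keys are preserved unchanged
        path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Demo on a throwaway directory with minimal stand-in config files.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "config.json").write_text('{"max_position_embeddings": 200000}')
        (Path(tmp) / "tokenizer_config.json").write_text('{"model_max_length": 200000}')
        set_context_length(tmp, 4096)    # shrink before training
        print(json.loads((Path(tmp) / "config.json").read_text()))
        set_context_length(tmp, 200000)  # restore before uploading
```

Running it once with 4096 before training and once with 200000 before uploading mirrors the two steps described above.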
If you want to see the training scripts, let me know and I will upload them. LoRAs are uploaded here: adamo1139/yi-34b-200k-xlctx-aezakmi-raw-toxic-dpo-sft-orpo-lora-0205
Quants!
An EXL2 quant is coming soon. I plan to make and upload one at around 4.65 bpw, which should work nicely with the Q4 cache in exllamav2.
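A rough back-of-the-envelope check of why ~4.65 bpw is a sensible target: quantized weight size is approximately parameters × bits per weight / 8 bytes. This sketch assumes a nominal 34 billion parameters and a single 24 GB card like the RTX 3090 Ti used for training, and it ignores layers kept at higher precision, the KV cache and activations, so treat the numbers as an estimate only.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    weights = quantized_weight_gb(34e9, 4.65)  # nominal 34B params (assumption)
    vram = 24.0  # GB on an RTX 3090 Ti
    print(f"weights ~{weights:.1f} GB, ~{vram - weights:.1f} GB left for cache and activations")
```

This prints roughly 19.8 GB of weights, leaving a few GB of headroom, which is where a compact Q4 cache helps.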
Prompt Format
I recommend using the ChatML format, as it was used during fine-tuning.
Here's the prompt format you should use. You can set a different system message; the model was trained on the SystemChat dataset, so it should respect system prompts well.
<|im_start|>system
A chat.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
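If you build prompts by hand rather than through a chat template, a minimal sketch of a ChatML formatter looks like this. The function name is hypothetical, and the trailing newline after the final `<|im_start|>assistant` follows the common ChatML convention (an assumption; the card does not spell it out).

```python
def build_chatml_prompt(messages: list[tuple[str, str]], system_message: str = "A chat.") -> str:
    """Render (role, content) turns as ChatML, ending with an open assistant turn."""
    parts = [f"<|im_start|>system\n{system_message}<|im_end|>\n"]
    for role, content in messages:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_chatml_prompt([("user", "Tell me a cozy bedtime story.")]))
```

When generating, you would typically stop on `<|im_end|>` so the model does not continue into a new turn.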
Intended uses & limitations
Use is limited by the apache-2.0 license.
Some of the datasets used prohibit commercial use (no_robots is CC-BY-NC-4.0), so I think you should use this model non-commercially only, unless you know the law better and think it doesn't matter.
Known Issues
I haven't found any yet.
Credits
Thanks to the unsloth and huggingface teams for providing the software packages used during fine-tuning.
Thanks to Jon Durbin, abacusai, huggingface, sandex, NobodyExistsOnTheInternet, Nous-Research, lmsys and PygmalionAI for open-sourcing the datasets I included in the AEZAKMI dataset.
AEZAKMI is basically a mix of open source datasets I found on HF, so without them this would not be possible at all.