---
license: apache-2.0
datasets:
- adamo1139/rawrr_v1
tags:
- dpo
- qlora
- unsloth
---
Another QLoRA DPO training of Yi-34B-200K.
This time with sequence length 500, lora_r 16 and lora_alpha 32.
I was able to squeeze that in using Unsloth; the script I used is in this repo.
It definitely has a much stronger effect than my previous run, which used lora_r 4, lora_alpha 8 and sequence length 200, but I am not sure whether I overcooked it.
I will try to train this on AEZAKMI v2 next.
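
For a rough sense of how the two runs differ, here is a minimal sketch that compares them using the standard LoRA relations: the update is scaled by `lora_alpha / lora_r`, and each adapted weight matrix adds `r * (d_in + d_out)` trainable parameters. The `LoraRun` class and the `HIDDEN` value are illustrative assumptions, not taken from the training script in this repo, and the comparison is per adapted matrix only, since which modules were targeted is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRun:
    """Hypothetical container for the hyperparameters quoted in this card."""
    name: str
    lora_r: int
    lora_alpha: int
    max_seq_length: int

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.lora_r

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # A has shape (r, d_in) and B has shape (d_out, r).
        return self.lora_r * (d_in + d_out)


previous = LoraRun("previous", lora_r=4, lora_alpha=8, max_seq_length=200)
current = LoraRun("current", lora_r=16, lora_alpha=32, max_seq_length=500)

# Assumption: an illustrative square projection size, used only for the ratio.
HIDDEN = 7168

for run in (previous, current):
    print(
        f"{run.name}: scaling={run.scaling}, "
        f"params per {HIDDEN}x{HIDDEN} matrix={run.adapter_params(HIDDEN, HIDDEN)}, "
        f"max_seq_length={run.max_seq_length}"
    )

param_ratio = current.adapter_params(HIDDEN, HIDDEN) / previous.adapter_params(HIDDEN, HIDDEN)
seq_ratio = current.max_seq_length / previous.max_seq_length
print(f"adapter parameter ratio: {param_ratio}, sequence length ratio: {seq_ratio}")
```

Both runs keep the same alpha-to-rank ratio, so any difference in strength would more plausibly come from the larger rank and the longer sequences than from the scaling factor, though I have not isolated either.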

Credits to mlabonne (I used pieces of his Mistral fine-tuning script for dataset preparation) and to Daniel Han and Michael Han (Unsloth AI team).

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" alt="made with Unsloth" width="400" height="64"/>](https://github.com/unslothai/unsloth)