Second iteration: QLoRA with DPO on the full GSM8K preference dataset (version 2.1), 1 epoch, rank 64, beta 0.3
a6e46a4
valerielucro
committed on