Second iteration: QLoRA with DPO on the full GSM8K preference dataset version 2.1, 1 epoch, rank 64, beta 0.3
28fd9e7
verified
valerielucro
committed on