Rank-8 adapter trained with DPO on the GSM8K preference dataset with CoT, for 1 epoch
ad75933
valerielucro committed on