adapter trained with DPO on the gsm8k preference dataset and 1 epoch (commit 45537ee, verified), valerielucro committed on Jun 19