Adapter trained with DPO on the gsm8k preference dataset for 1 epoch
valerielucro committed on Jun 19 (f00f5bf, verified)