adapter trained with DPO on the gsm8k preference dataset for 1 epoch (commit f00f5bf, verified; valerielucro committed 21 days ago)