adapter trained with DPO on the gsm8k preference dataset with cot and 1 epoch 9c07365 verified valerielucro committed on Jun 26
adapter trained with DPO on the gsm8k preference dataset with cot and 1 epoch cfeb3fa verified valerielucro committed on Jun 26