---
license: mit
---

This is a test DPO finetune of Microsoft phi-2. Two DPO datasets were used, and training ran for 1 epoch as a QLoRA with rank 64 (a rough sketch of the training setup follows the evals below).

- [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
- [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)

# Initial Evals

- ARC: 63.14
- TruthfulQA: 48.47
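
# Training Sketch

For reference, a run like the one described above could be set up with TRL's `DPOTrainer` and a rank-64 QLoRA adapter roughly as below. Only the 1-epoch and rank-64 settings come from this card; every other hyperparameter (batch size, learning rate, `beta`, LoRA target modules) is an assumption, the dataset preprocessing is illustrative, and argument names vary across TRL versions.

```python
# Minimal QLoRA DPO sketch. Hyperparameters other than rank 64 and 1 epoch
# are assumptions; the card does not specify them.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

# 4-bit quantization so the adapter trains as a QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token

# Rank-64 LoRA adapter, matching the card; target modules are an assumption.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)

# Map Intel/orca_dpo_pairs into the prompt/chosen/rejected schema DPOTrainer
# expects; ultrafeedback-binarized-preferences-cleaned would need similar
# handling (its chosen/rejected fields are message lists, not strings).
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": row["question"],
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    },
    remove_columns=dataset.column_names,
)

training_args = DPOConfig(
    output_dir="phi-2-dpo",
    num_train_epochs=1,             # from the card
    per_device_train_batch_size=2,  # assumption
    gradient_accumulation_steps=8,  # assumption
    learning_rate=5e-5,             # assumption
    beta=0.1,                       # assumption (TRL default)
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```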