---
license: mit
---

# Phi-2 Orange

A two-step finetune of Phi-2.

First, using a collection of broad training data:

- [Open-Orca/SlimOrca-Dedup](https://huggingface.co/datasets/Open-Orca/SlimOrca-Dedup)
- [migtissera/Synthia-v1.3](https://huggingface.co/datasets/migtissera/Synthia-v1.3)
- [LDJnr/Verified-Camel](https://huggingface.co/datasets/LDJnr/Verified-Camel)
- [LDJnr/Pure-Dove](https://huggingface.co/datasets/LDJnr/Pure-Dove)
- [LDJnr/Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
- [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)

Then a DPO finetune using:

- [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
- [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)

# Evaluations

Evaluations were run using mlabonne's useful Colab notebook [llm-autoeval](https://github.com/mlabonne/llm-autoeval).

Also check out the alternative leaderboard at [Yet_Another_LLM_Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

| Model                                                                               | AGIEval  | GPT4All  | TruthfulQA | Bigbench | Average  |
|-------------------------------------------------------------------------------------|---------:|---------:|-----------:|---------:|---------:|
| [phi-2-orange](https://huggingface.co/rhysjones/phi-2-orange)                       | **33.29**| 71.39    | 49.9       | 37.14    | **47.93**|
| [phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo)                              | 30.39    | **71.68**| **50.75**  | 34.9     | 46.93    |
| [dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2) | 33.12    | 69.85    | 47.39      | **37.2** | 46.89    |
| [phi-2](https://huggingface.co/microsoft/phi-2)                                     | 27.98    | 70.8     | 44.43      | 35.21    | 44.61    |