---
license: mit
---

# Phi-2 Orange

A two-step finetune of Phi-2.

First, using a collection of broad training data:

- [Open-Orca/SlimOrca-Dedup](https://huggingface.co/datasets/Open-Orca/SlimOrca-Dedup)
- [migtissera/Synthia-v1.3](https://huggingface.co/datasets/migtissera/Synthia-v1.3)
- [LDJnr/Verified-Camel](https://huggingface.co/datasets/LDJnr/Verified-Camel)
- [LDJnr/Pure-Dove](https://huggingface.co/datasets/LDJnr/Pure-Dove)
- [LDJnr/Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
- [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)

Then a DPO finetune using:

- [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
- [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)

# Evaluations

Evaluations were run using mlabonne's useful Colab notebook [llm-autoeval](https://github.com/mlabonne/llm-autoeval).

Also check out the alternative leaderboard at [Yet_Another_LLM_Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

| Model                                                                               | AGIEval  | GPT4All  | TruthfulQA | Bigbench | Average  |
|-------------------------------------------------------------------------------------|---------:|---------:|-----------:|---------:|---------:|
| [phi-2-orange](https://huggingface.co/rhysjones/phi-2-orange)                       | **33.29**| 71.39    | 49.9       | 37.14    | **47.93**|
| [phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo)                              | 30.39    | **71.68**| **50.75**  | 34.9     | 46.93    |
| [dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2) | 33.12    | 69.85    | 47.39      | **37.2** | 46.89    |
| [phi-2](https://huggingface.co/microsoft/phi-2)                                     | 27.98    | 70.8     | 44.43      | 35.21    | 44.61    |