---
license: mit
datasets:
  - Open-Orca/SlimOrca-Dedup
  - migtissera/Synthia-v1.3
  - LDJnr/Verified-Camel
  - LDJnr/Pure-Dove
  - LDJnr/Capybara
  - meta-math/MetaMathQA
  - Intel/orca_dpo_pairs
  - argilla/ultrafeedback-binarized-preferences-cleaned
---

# Phi-2 Orange

A two-step finetune of Phi-2, with a bit of zest.

First, a finetune using a collection of broad training data:

- Open-Orca/SlimOrca-Dedup
- migtissera/Synthia-v1.3
- LDJnr/Verified-Camel
- LDJnr/Pure-Dove
- LDJnr/Capybara
- meta-math/MetaMathQA

And then a DPO finetune using:

- Intel/orca_dpo_pairs
- argilla/ultrafeedback-binarized-preferences-cleaned
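The DPO step optimizes the model directly on preference pairs (a chosen and a rejected response per prompt). A minimal sketch of the per-pair DPO loss is below; it is illustrative only — the variable names and the `beta` value are assumptions, not details of this model's actual training configuration:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being trained and under a frozen
    reference model. beta (an assumed value here) controls how far
    the policy may drift from the reference.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen over the rejected response, relative to the reference.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Negative log-sigmoid of the margin: loss shrinks as the
    # margin grows.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference exactly the margin is zero and the loss is `-log(0.5) ≈ 0.693`; widening the margin in favor of the chosen response drives the loss toward zero.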

## Evaluations

Evaluations were done using mlabonne's useful Colab notebook llm-autoeval. Also check out the alternative leaderboard at Yet_Another_LLM_Leaderboard.

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| phi-2-orange | 33.37 | 71.33 | 49.87 | 37.3 | **47.97** |
| phi-2-dpo | 30.39 | 71.68 | 50.75 | 34.9 | 46.93 |
| dolphin-2_6-phi-2 | 33.12 | 69.85 | 47.39 | 37.2 | 46.89 |
| phi-2 | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |
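As a quick sanity check, the Average column can be recomputed as the mean of the four benchmark scores (a small sketch; the numbers are copied from the table above):

```python
# Benchmark scores per model, copied from the evaluation table:
# [AGIEval, GPT4All, TruthfulQA, Bigbench]
scores = {
    "phi-2-orange":      [33.37, 71.33, 49.87, 37.30],
    "phi-2-dpo":         [30.39, 71.68, 50.75, 34.90],
    "dolphin-2_6-phi-2": [33.12, 69.85, 47.39, 37.20],
    "phi-2":             [27.98, 70.80, 44.43, 35.21],
}
# Recompute the mean; it matches the reported Average column to
# within reporting precision (±0.005).
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}
```

The recomputed means confirm the reported two-decimal averages, including phi-2-orange's table-leading 47.97.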