---
license: mit
---

# Phi-2 Orange

A two-step finetune of Phi-2.

First, a finetune using a collection of broad training data:

And then a DPO finetune using:
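
## Usage

A minimal inference sketch, not from the original card: the repo id `rhysjones/phi-2-orange` is inferred from this card's path, and `trust_remote_code=True` is assumed to be required because the model ships custom phi-msft modelling code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rhysjones/phi-2-orange"  # repo id inferred from this card's path

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; fits on a single consumer GPU
    device_map="auto",
    trust_remote_code=True,     # needed for the custom phi-msft model code
)

# Illustrative prompt only; the card does not specify a prompt format.
prompt = "Explain the difference between supervised finetuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```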

## Evaluations

Evaluations were performed using mlabonne's useful Colab notebook llm-autoeval. Also check out the alternative leaderboard at Yet_Another_LLM_Leaderboard.

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-------|--------:|--------:|-----------:|---------:|--------:|
| phi-2-orange | 33.29 | 71.39 | 49.9 | 37.14 | 47.93 |
| phi-2-dpo | 30.39 | 71.68 | 50.75 | 34.9 | 46.93 |
| dolphin-2_6-phi-2 | 33.12 | 69.85 | 47.39 | 37.2 | 46.89 |
| phi-2 | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |
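
As a quick sanity check (not part of the original card), the Average column is the arithmetic mean of the four benchmark scores; the sketch below recomputes it from the values in the table:

```python
# Recompute the Average column from the four benchmark scores above.
scores = {
    "phi-2-orange":      [33.29, 71.39, 49.90, 37.14],
    "phi-2-dpo":         [30.39, 71.68, 50.75, 34.90],
    "dolphin-2_6-phi-2": [33.12, 69.85, 47.39, 37.20],
    "phi-2":             [27.98, 70.80, 44.43, 35.21],
}
table_average = {
    "phi-2-orange": 47.93,
    "phi-2-dpo": 46.93,
    "dolphin-2_6-phi-2": 46.89,
    "phi-2": 44.61,
}
for name, vals in scores.items():
    mean = sum(vals) / len(vals)
    # Each mean matches the table's Average to within two-decimal rounding.
    assert abs(mean - table_average[name]) < 0.01
    print(f"{name}: {mean:.3f}")
```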