Phi-2 Orange

A two-step finetune of Phi-2, with a bit of zest.

An updated model, rhysjones/phi-2-orange-v2, scores higher on the evaluations and is available if you wish to test it.

Training details

A first finetune using a collection of broad training data:

And then a DPO finetune using:

Run within Ollama

If you're using Ollama, you can download and run using:

ollama run rhysjones/phi-2-orange

Prompt Format

Phi-2 Orange uses ChatML as the prompt format, with or without the system instruction.

To prompt with a system instruction (use whatever system prompt you like):

<|im_start|>system
You are a helpful assistant for Python which outputs in Markdown format.<|im_end|>
<|im_start|>user
Write a function to calculate the Fibonacci sequence<|im_end|>
<|im_start|>assistant

You can also omit the system prompt if you wish:

<|im_start|>user
Why is the sky blue?<|im_end|>
<|im_start|>assistant
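As a minimal sketch of how such a prompt string could be assembled in code, the helper below follows the standard ChatML layout (`<|im_start|>` role markers, `<|im_end|>` terminators, and a trailing open assistant turn). The function name `build_chatml_prompt` and its parameters are hypothetical and not part of this model's tooling.

```python
from typing import Optional


def build_chatml_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Assemble a ChatML prompt that ends with an open assistant turn.

    Hypothetical helper for illustration; assumes the standard ChatML layout.
    """
    parts = []
    # The system turn is optional, matching the two prompt variants above.
    if system_message is not None:
        parts.append(f"<|im_start|>system\n{system_message}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user_message}<|im_end|>\n")
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_chatml_prompt(
        "Write a function to calculate the Fibonacci sequence",
        system_message="You are a helpful assistant for Python which outputs in Markdown format.",
    ))
```

Passing `system_message=None` (the default) produces the shorter variant without a system turn.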


Evaluations

Evaluations were run using mlabonne's useful Colab notebook, llm-autoeval. Also check out the alternative leaderboard at Yet_Another_LLM_Leaderboard.

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| phi-2-orange | 33.37 | 71.33 | 49.87 | 37.3 | 47.97 |
| phi-2-dpo | 30.39 | 71.68 | 50.75 | 34.9 | 46.93 |
| dolphin-2_6-phi-2 | 33.12 | 69.85 | 47.39 | 37.2 | 46.89 |
| phi-2 | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |
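
The Average column appears to be the arithmetic mean of the four benchmark scores; the source does not state this, so treat it as an assumption. The short sketch below checks that reading against the reported values, using a small tolerance to allow for rounding. The names `mean_score` and `matches_reported` are hypothetical.

```python
# Scores copied from the table: ([AGIEval, GPT4All, TruthfulQA, Bigbench], reported Average)
SCORES = {
    "phi-2-orange": ([33.37, 71.33, 49.87, 37.3], 47.97),
    "phi-2-dpo": ([30.39, 71.68, 50.75, 34.9], 46.93),
    "dolphin-2_6-phi-2": ([33.12, 69.85, 47.39, 37.2], 46.89),
    "phi-2": ([27.98, 70.8, 44.43, 35.21], 44.61),
}


def mean_score(values):
    """Arithmetic mean of a list of benchmark scores."""
    return sum(values) / len(values)


def matches_reported(values, reported, tol=0.01):
    """True if the mean is within `tol` of the reported average (rounding slack)."""
    return abs(mean_score(values) - reported) < tol


if __name__ == "__main__":
    for model, (values, reported) in SCORES.items():
        print(model, f"{mean_score(values):.4f}", reported, matches_reported(values, reported))
```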
