	---
	license: mit
	---

	# Phi-2 Orange

	A two-step finetune of Phi-2.

	The first step is a finetune on a broad collection of training data:

	- [Open-Orca/SlimOrca-Dedup](https://huggingface.co/datasets/Open-Orca/SlimOrca-Dedup)
	- [migtissera/Synthia-v1.3](https://huggingface.co/datasets/migtissera/Synthia-v1.3)
	- [LDJnr/Verified-Camel](https://huggingface.co/datasets/LDJnr/Verified-Camel)
	- [LDJnr/Pure-Dove](https://huggingface.co/datasets/LDJnr/Pure-Dove)
	- [LDJnr/Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
	- [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)

	The second step is a DPO finetune using:

	- [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
	- [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)
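
For readers unfamiliar with DPO (Direct Preference Optimization), the per-pair objective can be sketched as below. This is only an illustration of the standard DPO loss, not the training code used for this model: the function name `dpo_loss`, the `beta=0.1` default, and the example log-probabilities are assumptions, and this card does not state which library or hyperparameters were used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed token log-probability of a response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative value, not one taken from this model card.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probabilities: the policy favours the chosen response
# more strongly than the reference does, so the loss falls below log(2).
example = dpo_loss(-1.0, -5.0, -3.0, -3.0)
```

When the policy and the reference agree exactly, the loss is `log(2)`; it shrinks as the policy raises the chosen response relative to the rejected one.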

	# Evaluations
	Evaluations were run using mlabonne's useful [Colab notebook llm-autoeval](https://github.com/mlabonne/llm-autoeval).
	Also check out the alternative leaderboard at [Yet_Another_LLM_Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

	| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
	|-------|--------:|--------:|-----------:|---------:|--------:|
	| [phi-2-orange](https://huggingface.co/rhysjones/phi-2-orange) | 33.29 | 71.39 | 49.9 | 37.14 | 47.93 |
	| [phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo) | 30.39 | 71.68 | 50.75 | 34.9 | 46.93 |
	| [dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2) | 33.12 | 69.85 | 47.39 | 37.2 | 46.89 |
	| [phi-2](https://huggingface.co/microsoft/phi-2) | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |
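
The Average column appears to be the unweighted mean of the four benchmark scores, rounded to two decimals. The sketch below recomputes it from the table values under that assumption; the half-up rounding mode and the names `SCORES` and `mean_score` are illustrative choices, not something the evaluation notebook is known to use.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table, in the order
# AGIEval, GPT4All, TruthfulQA, Bigbench. Kept as strings so that
# Decimal parses them exactly.
SCORES = {
    "phi-2-orange": ["33.29", "71.39", "49.9", "37.14"],
    "phi-2-dpo": ["30.39", "71.68", "50.75", "34.9"],
    "dolphin-2_6-phi-2": ["33.12", "69.85", "47.39", "37.2"],
    "phi-2": ["27.98", "70.8", "44.43", "35.21"],
}


def mean_score(values):
    """Unweighted mean of the scores, rounded half-up to two decimals."""
    total = sum(Decimal(v) for v in values)
    mean = total / len(values)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {model: mean_score(vals) for model, vals in SCORES.items()}
for model, avg in averages.items():
    print(f"{model}: {avg}")
```

Under these assumptions the computed values match the Average column for all four models.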