rhysjones committed fb8b52a (parent: b182c14)

Update README.md

Files changed (1): README.md (+9, -4)
README.md CHANGED
@@ -20,7 +20,12 @@ And then a DPO finetune using:
 - [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
 - [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)
 
-# Initial Evals
-
-- ARC: 62.29
-- TruthfulQA: 49.85
+# Evaluations
+Evaluations done using mlabonne's useful [Colab notebook llm-autoeval](https://github.com/mlabonne/llm-autoeval).
+Also check out the alternative leaderboard at [Yet_Another_LLM_Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
+| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
+|---|------:|------:|---------:|-------:|------:|
+| [phi-2-orange](https://huggingface.co/rhysjones/phi-2-orange) | **33.29** | 71.39 | 49.9 | 37.14 | **47.93** |
+| [phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo) | 30.39 | **71.68** | **50.75** | 34.9 | 46.93 |
+| [dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2) | 33.12 | 69.85 | 47.39 | **37.2** | 46.89 |
+| [phi-2](https://huggingface.co/microsoft/phi-2) | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |
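As a quick sanity check on the table added above: the Average column appears to be the plain arithmetic mean of the four benchmark scores, rounded to two decimal places. A minimal sketch that verifies this against the reported values (the score dictionary below is transcribed from the table, not an official artifact):

```python
# Verify that each model's reported Average is the mean of its
# AGIEval, GPT4All, TruthfulQA, and Bigbench scores.
scores = {
    "phi-2-orange":      ([33.29, 71.39, 49.90, 37.14], 47.93),
    "phi-2-dpo":         ([30.39, 71.68, 50.75, 34.90], 46.93),
    "dolphin-2_6-phi-2": ([33.12, 69.85, 47.39, 37.20], 46.89),
    "phi-2":             ([27.98, 70.80, 44.43, 35.21], 44.61),
}

for model, (benchmarks, reported_avg) in scores.items():
    mean = sum(benchmarks) / len(benchmarks)
    # Allow for rounding to two decimal places in the table.
    assert abs(mean - reported_avg) < 0.01, (model, mean, reported_avg)
    print(f"{model}: {mean:.2f}")
```

All four rows check out, so the rankings in the Average column follow directly from the per-benchmark numbers.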