rhysjones committed on
Commit
fb8b52a
1 Parent(s): b182c14

Update README.md

Files changed (1)
  1. README.md +9 -4
README.md CHANGED
@@ -20,7 +20,12 @@ And then a DPO finetune using:
  - [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)
  - [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)
 
- # Initial Evals
-
- - ARC: 62.29
- - TruthfulQA: 49.85
+ # Evaluations
+ Evaluations were run using mlabonne's useful [Colab notebook llm-autoeval](https://github.com/mlabonne/llm-autoeval).
+ Also check out the alternative leaderboard at [Yet_Another_LLM_Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
+ | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+ |----------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+ |[phi-2-orange](https://huggingface.co/rhysjones/phi-2-orange)| **33.29**| 71.39| 49.9| 37.14| **47.93**|
+ |[phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo)| 30.39| **71.68**| **50.75**| 34.9| 46.93|
+ |[dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2)| 33.12| 69.85| 47.39| **37.2**| 46.89|
+ |[phi-2](https://huggingface.co/microsoft/phi-2)| 27.98| 70.8| 44.43| 35.21| 44.61|
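
The Average column in the added table appears to be the unweighted mean of the four benchmark scores. The README does not state this, so treat it as an assumption. A minimal Python sketch that recomputes it from the values in the table (the names `scores` and `mean_score` are hypothetical, chosen for illustration):

```python
# Scores copied from the table above, in the order:
# (AGIEval, GPT4All, TruthfulQA, Bigbench, reported Average)
scores = {
    "phi-2-orange": (33.29, 71.39, 49.9, 37.14, 47.93),
    "phi-2-dpo": (30.39, 71.68, 50.75, 34.9, 46.93),
    "dolphin-2_6-phi-2": (33.12, 69.85, 47.39, 37.2, 46.89),
    "phi-2": (27.98, 70.8, 44.43, 35.21, 44.61),
}


def mean_score(row):
    """Return the unweighted mean of the four benchmark scores.

    Assumption: this is how the Average column was derived.
    """
    *benchmarks, _reported = row  # drop the reported average
    return sum(benchmarks) / len(benchmarks)


for model, row in scores.items():
    computed = mean_score(row)
    print(f"{model}: computed {computed:.2f}, reported {row[-1]:.2f}")
```

If the assumption holds, each computed value should agree with the reported average to within rounding.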