Update README.md
README.md
CHANGED
@@ -34,6 +34,8 @@ https://AlignmentLab.ai
 
 # Evaluation
 
+## HuggingFace Leaderboard Performance
+
 ![HF Leaderboard](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BHFLeaderboard.webp)
 
 | Metric | Value |
@@ -47,6 +49,19 @@ https://AlignmentLab.ai
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
 
 
+## AGIEval Performance
+
+We compare our results to our base Preview2 model, and find **112%** of the base model's performance on AGI Eval, averaging **0.463**.
+
+![OpenOrca-Platypus2-13B AGIEval Performance](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BAGIEval.webp "AGIEval Performance")
+
+## BigBench-Hard Performance
+
+We compare our results to our base Preview2 model, and find **105%** of the base model's performance on BigBench-Hard, averaging **0.442**.
+
+![OpenOrca-Platypus2-13B BigBench-Hard Performance](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BBigBenchHard.webp "BigBench-Hard Performance")
+
+
 # Model Details
 
 * **Trained by**: **Platypus2-13B** trained by Cole Hunter & Ariel Lee; **OpenOrcaxOpenChat-Preview2-13B** trained by Open-Orca
@@ -87,7 +102,7 @@ Please see our [paper](https://platypus-llm.github.io/Platypus.pdf) and [project
 For training details and inference instructions please see the [Platypus](https://github.com/arielnlee/Platypus) GitHub repo.
 
 
-# Reproducing Evaluation Results
+# Reproducing Evaluation Results (for HuggingFace Leaderboard Eval)
 
 Install LM Evaluation Harness:
 ```