Text Generation · Transformers · PyTorch · English · llama · Inference Endpoints · text-generation-inference
bleysg committed
Commit 66cd022 · 1 Parent(s): 9724828

Update README.md

Files changed (1): README.md (+16 −1)
README.md CHANGED

@@ -34,6 +34,8 @@ https://AlignmentLab.ai
 
 # Evaluation
 
+## HuggingFace Leaderboard Performance
+
 ![HF Leaderboard](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BHFLeaderboard.webp)
 
 | Metric | Value |
@@ -47,6 +49,19 @@ https://AlignmentLab.ai
 We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, using the same version as the HuggingFace LLM Leaderboard. Please see below for detailed instructions on reproducing benchmark results.
 
 
+## AGIEval Performance
+
+We compare our results to our base Preview2 model, and find **112%** of the base model's performance on AGI Eval, averaging **0.463**.
+
+![OpenOrca-Platypus2-13B AGIEval Performance](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BAGIEval.webp "AGIEval Performance")
+
+## BigBench-Hard Performance
+
+We compare our results to our base Preview2 model, and find **105%** of the base model's performance on BigBench-Hard, averaging **0.442**.
+
+![OpenOrca-Platypus2-13B BigBench-Hard Performance](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B/resolve/main/Images/OrcaPlatypus13BBigBenchHard.webp "BigBench-Hard Performance")
+
+
 # Model Details
 
 * **Trained by**: **Platypus2-13B** trained by Cole Hunter & Ariel Lee; **OpenOrcaxOpenChat-Preview2-13B** trained by Open-Orca
@@ -87,7 +102,7 @@ Please see our [paper](https://platypus-llm.github.io/Platypus.pdf) and [project
 For training details and inference instructions please see the [Platypus](https://github.com/arielnlee/Platypus) GitHub repo.
 
 
-# Reproducing Evaluation Results
+# Reproducing Evaluation Results (for HuggingFace Leaderboard Eval)
 
 Install LM Evaluation Harness:
 ```
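
The diff is cut off at the opening of the README's install code block, so the actual commands are not shown here. For orientation only, below is a minimal sketch of how one leaderboard-style benchmark could be reproduced through the harness's Python entry point. The task choice, few-shot count, and batch size are illustrative assumptions (the 25-shot ARC-Challenge setting matches the HuggingFace LLM Leaderboard's published configuration), not the commit's own instructions.

```python
# Minimal sketch, assuming lm-evaluation-harness (pre-0.4 API) is
# installed from the EleutherAI repo; not the commit's own commands.
import json

from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",  # HF transformers causal-LM backend
    model_args="pretrained=Open-Orca/OpenOrca-Platypus2-13B",
    tasks=["arc_challenge"],
    num_fewshot=25,     # leaderboard setting for ARC-Challenge
    batch_size=1,
)

# Per-task accuracy and normalized accuracy, as reported on the card.
print(json.dumps(results["results"], indent=2))
```

For exact parity with the numbers in the table above, the harness would need to be pinned to the same revision the leaderboard uses, as the README notes.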
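
The README defers inference instructions to the Platypus repo. As a rough illustration of loading the merged checkpoint with standard transformers APIs, a sketch follows; the prompt string is a placeholder and not necessarily the model's documented template, and fp16 memory needs (~26 GB for 13B weights) are an estimate.

```python
# Rough illustration only; see the Platypus repo for the model's
# actual inference instructions and prompt format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Open-Orca/OpenOrca-Platypus2-13B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on GPU
    device_map="auto",          # requires the accelerate package
)

prompt = "### Instruction:\n\nExplain what AGIEval measures.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```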