leaderboard-pr-bot committed on
Commit
a1315d5
1 Parent(s): 90c02cc

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -137,3 +137,17 @@ Current evals out of the Metharme-13b model: <br>
  The intended use-case for this model is fictional writing for entertainment purposes. Any other sort of usage is out of scope.
 
  As such, it was **not** fine-tuned to be safe and harmless: the base model _and_ this fine-tune have been trained on data known to contain profanity and texts that are lewd or otherwise offensive. It may produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. Outputs might often be factually wrong or misleading.
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_TehVenom__Metharme-13b-Merged)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 49.33 |
+ | ARC (25-shot) | 59.9 |
+ | HellaSwag (10-shot) | 81.12 |
+ | MMLU (5-shot) | 47.18 |
+ | TruthfulQA (0-shot) | 51.18 |
+ | Winogrande (5-shot) | 76.8 |
+ | GSM8K (5-shot) | 8.72 |
+ | DROP (3-shot) | 20.4 |
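
The PR does not say how the `Avg.` row is derived. The reported value is consistent with an unweighted arithmetic mean of the seven benchmark rows, so the following minimal Python sketch checks the table under that assumption. The names `scores` and `leaderboard_average` are hypothetical and are not part of the leaderboard tooling.

```python
# Benchmark scores copied from the table added by this PR.
scores = {
    "ARC (25-shot)": 59.9,
    "HellaSwag (10-shot)": 81.12,
    "MMLU (5-shot)": 47.18,
    "TruthfulQA (0-shot)": 51.18,
    "Winogrande (5-shot)": 76.8,
    "GSM8K (5-shot)": 8.72,
    "DROP (3-shot)": 20.4,
}


def leaderboard_average(benchmark_scores: dict[str, float]) -> float:
    """Unweighted mean of the scores, rounded to two decimals.

    Assumption: every benchmark carries equal weight. The PR itself
    does not state the formula.
    """
    return round(sum(benchmark_scores.values()) / len(benchmark_scores), 2)


print(leaderboard_average(scores))  # 49.33, matching the Avg. row
```

A result that did not match `49.33` would suggest either a transcription error in the table or a different weighting than assumed here.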