Commit e67c396 by leaderboard-pr-bot
1 parent: 7caa5fc
Adding Evaluation Results
This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr
The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.
If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
README.md
CHANGED
@@ -93,4 +93,17 @@ BigBenchHard results:
 |**Average**| |**0.3754**
 
 # Ethical Considerations and Limitations
-Tulpar is a technology with potential risks and limitations. This model is finetuned only in English and all language-related scenarios are not covered. As HyperbeeAI, we neither guarantee ethical, accurate, unbiased, objective responses nor endorse its outputs. Before deploying this model, you are advised to make safety tests for your use case.
+Tulpar is a technology with potential risks and limitations. This model is finetuned only in English and all language-related scenarios are not covered. As HyperbeeAI, we neither guarantee ethical, accurate, unbiased, objective responses nor endorse its outputs. Before deploying this model, you are advised to make safety tests for your use case.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HyperbeeAI__Tulpar-7b-v0)
+
+| Metric | Value |
+|-----------------------|---------------------------|
+| Avg. | 50.84 |
+| ARC (25-shot) | 56.31 |
+| HellaSwag (10-shot) | 79.01 |
+| MMLU (5-shot) | 52.55 |
+| TruthfulQA (0-shot) | 51.68 |
+| Winogrande (5-shot) | 73.88 |
+| GSM8K (5-shot) | 2.73 |
+| DROP (3-shot) | 39.75 |
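
Review note: the PR does not say how the `Avg.` row is computed. A minimal sketch, assuming it is the unweighted mean of the seven benchmark scores in the added table (the `scores` dictionary and variable names are illustrative, not part of the PR), reproduces the reported value:

```python
# Scores copied from the table added by this PR.
scores = {
    "ARC (25-shot)": 56.31,
    "HellaSwag (10-shot)": 79.01,
    "MMLU (5-shot)": 52.55,
    "TruthfulQA (0-shot)": 51.68,
    "Winogrande (5-shot)": 73.88,
    "GSM8K (5-shot)": 2.73,
    "DROP (3-shot)": 39.75,
}

# Assumption: "Avg." is a plain arithmetic mean over all seven metrics.
mean_score = sum(scores.values()) / len(scores)

print(f"{mean_score:.2f}")  # prints 50.84, matching the Avg. row
```

Under that assumption the table is internally consistent; the very low GSM8K score pulls the average well below the other benchmark scores.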