Adding Evaluation Results (#2)
opened by leaderboard-pr-bot

README.md (CHANGED)
```diff
@@ -93,4 +93,17 @@ BigBenchHard results:
 |**Average**| |**0.3754**
 
 # Ethical Considerations and Limitations
-Tulpar is a technology with potential risks and limitations. This model is finetuned only in English and all language-related scenarios are not covered. As HyperbeeAI, we neither guarantee ethical, accurate, unbiased, objective responses nor endorse its outputs. Before deploying this model, you are advised to make safety tests for your use case.
+Tulpar is a technology with potential risks and limitations. This model is finetuned only in English and all language-related scenarios are not covered. As HyperbeeAI, we neither guarantee ethical, accurate, unbiased, objective responses nor endorse its outputs. Before deploying this model, you are advised to make safety tests for your use case.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HyperbeeAI__Tulpar-7b-v0)
+
+| Metric                | Value |
+|-----------------------|-------|
+| Avg.                  | 50.84 |
+| ARC (25-shot)         | 56.31 |
+| HellaSwag (10-shot)   | 79.01 |
+| MMLU (5-shot)         | 52.55 |
+| TruthfulQA (0-shot)   | 51.68 |
+| Winogrande (5-shot)   | 73.88 |
+| GSM8K (5-shot)        | 2.73  |
+| DROP (3-shot)         | 39.75 |
```
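As a quick sanity check (not part of the PR itself), the reported "Avg." is consistent with the arithmetic mean of the seven benchmark scores in the added table:

```python
# Recompute the leaderboard "Avg." from the seven per-benchmark
# scores listed in the PR's results table.
scores = {
    "ARC (25-shot)": 56.31,
    "HellaSwag (10-shot)": 79.01,
    "MMLU (5-shot)": 52.55,
    "TruthfulQA (0-shot)": 51.68,
    "Winogrande (5-shot)": 73.88,
    "GSM8K (5-shot)": 2.73,
    "DROP (3-shot)": 39.75,
}

# Unweighted mean, rounded to two decimals as the leaderboard displays it.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 50.84, matching the "Avg." row
```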