anthracite-org/magnum-v1-72b

lucyknada

leaderboard-pr-bot committed about 12 hours ago

Commit c4a7165 • 1 Parent(s): f8f8502

Adding Evaluation Results (#12)

- Adding Evaluation Results (e25e9ea0a20f73d95ebc79af910be83924ace781)

Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)

README.md +33 -1

@@ -3,9 +3,9 @@ language:
 - en
 - zh
 license: other
-base_model: Qwen/Qwen2-72B-Instruct
 tags:
 - chat
+base_model: Qwen/Qwen2-72B-Instruct
 license_name: tongyi-qianwen
 license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
 pipeline_tag: text-generation
@@ -21,6 +21,9 @@ model-index:
       args:
         num_few_shot: 0
     metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 76.06
+      name: strict accuracy
     - type: inst_level_strict_acc and prompt_level_strict_acc
       value: 76.06
       name: strict accuracy
@@ -36,6 +39,9 @@ model-index:
       args:
         num_few_shot: 3
     metrics:
+    - type: acc_norm
+      value: 57.65
+      name: normalized accuracy
     - type: acc_norm
       value: 57.65
       name: normalized accuracy
@@ -51,6 +57,9 @@ model-index:
       args:
         num_few_shot: 4
     metrics:
+    - type: exact_match
+      value: 35.27
+      name: exact match
     - type: exact_match
       value: 35.27
       name: exact match
@@ -66,6 +75,9 @@ model-index:
       args:
         num_few_shot: 0
     metrics:
+    - type: acc_norm
+      value: 18.79
+      name: acc_norm
     - type: acc_norm
       value: 18.79
       name: acc_norm
@@ -81,6 +93,9 @@ model-index:
       args:
         num_few_shot: 0
     metrics:
+    - type: acc_norm
+      value: 15.62
+      name: acc_norm
     - type: acc_norm
       value: 15.62
       name: acc_norm
@@ -101,6 +116,9 @@ model-index:
     - type: acc
       value: 49.64
       name: accuracy
+    - type: acc
+      value: 49.85
+      name: accuracy
     source:
       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=alpindale/magnum-72b-v1
       name: Open LLM Leaderboard
@@ -152,3 +170,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |MuSR (0-shot)      |15.62|
 |MMLU-PRO (5-shot)  |49.64|
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_anthracite-org__magnum-v1-72b)
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |42.21|
+|IFEval (0-Shot)    |76.06|
+|BBH (3-Shot)       |57.65|
+|MATH Lvl 5 (4-Shot)|35.27|
+|GPQA (0-shot)      |18.79|
+|MuSR (0-shot)      |15.62|
+|MMLU-PRO (5-shot)  |49.85|
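
Reviewer note: the commit does not say how `Avg.` is derived. A minimal sketch, assuming it is the unweighted mean of the six benchmark scores rounded to two decimals, reproduces the reported 42.21 when the newly added MMLU-PRO value (49.85) is used, and gives 42.17 with the value already present in the README (49.64). The helper name `unweighted_mean` is hypothetical and is not part of the leaderboard tooling.

```python
# Scores copied from the results table added in this commit.
scores = {
    "IFEval (0-Shot)": 76.06,
    "BBH (3-Shot)": 57.65,
    "MATH Lvl 5 (4-Shot)": 35.27,
    "GPQA (0-shot)": 18.79,
    "MuSR (0-shot)": 15.62,
    "MMLU-PRO (5-shot)": 49.85,
}


def unweighted_mean(values):
    """Plain arithmetic mean (assumed, not confirmed, to be how Avg. is computed)."""
    values = list(values)
    return sum(values) / len(values)


# Average using the MMLU-PRO score from the newly added table.
average = round(unweighted_mean(scores.values()), 2)

# Same calculation with the MMLU-PRO score already present in the README (49.64).
older_scores = dict(scores, **{"MMLU-PRO (5-shot)": 49.64})
older_average = round(unweighted_mean(older_scores.values()), 2)

print(average)        # 42.21
print(older_average)  # 42.17
```

Under that assumption, the two MMLU-PRO figures left in the README after this commit (49.64 and 49.85) are not interchangeable: only 49.85 is consistent with the reported average.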