Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions


README.md +178 -10


@@ -1,5 +1,4 @@
 ---
-license: other
 language:
 - en
 - fr
@@ -7,22 +6,177 @@ language:
 - hi
 - zh
 - code
-base_model: microsoft/Orca-2-13b
 datasets:
 - HuggingFaceH4/no_robots
 - mlabonne/guanaco-llama2-1k
 - OpenAssistant/oasst_top1_2023-08-25
 - totally-not-an-llm/EverythingLM-data-V3
 - garage-bAInd/Open-Platypus
 widget:
-  - example_title: "folklore"
-    text: "<|im_start|>user\nAre the stories of Nahanni Valley, like the decapitated men, true?\n<|im_end|>\n<|im_start|>assistant\n"
-    output:
-      text: "The Nahanni Valley in Canada is a remote and rugged region that has long been shrouded in mystery. It is home to the South Nahanni River, which was designated as a UNESCO World Heritage Site due to its exceptional natural beauty and biodiversity. The valley's history includes tales of gold prospectors, fur traders, and indigenous populations, but it also features some unexplained events and legends. One such story involves two prospectors who allegedly encountered an unknown creature that led them into the forest before returning without their heads. This tale has been passed down through generations, but there are no definitive answers regarding its veracity.\n\nIt is important to note that these stories may have originated from local folklore or were exaggerated over time. There is no concrete evidence to support the existence of supernatural creatures or paranormal occurrences within the Nahanni Valley. Nonetheless, the area remains a captivating destination for those seeking adventure and exploration. If you have any questions about this topic or need assistance related to traveling to the Nahanni Valley, please feel free to ask!"
-  - example_title: "Alan Turing"
-    text: "<|im_start|>user\nWhy was Alan Turing castrated?\n<|im_end|>\n<|im_start|>assistant\n"
-    output:
-      text: "Alan Turing, a renowned mathematician and computer scientist who played a crucial role in breaking the German Enigma code during World War II, faced significant challenges due to his homosexuality. In 1952, he was convicted of \"gross indecency\" for engaging in a consensual relationship with another man, which led to him being sentenced to hormone therapy designed to reduce his sex drive. This treatment included the administration of estrogen, which had the side effect of causing breast growth.\n\nIn an effort to avoid this unwanted physical change, Turing requested that his doctors cease the hormone treatments. However, they ignored his request and continued the therapy without his consent. As a result, Turing's breasts began to grow, leading him to undergo surgery to remove them. The operation, known as a mastectomy, caused considerable pain and discomfort, further contributing to his emotional turmoil.\n\nIt is important to note that Turing's castration was not directly carried out by the authorities but rather resulted from the unintended consequences of the forced hormone therapy."
 ---
 The "microsoft/Orca-2-13b" model fully fine-tuned on HuggingFaceH4/no_robots, totally-not-an-llm/EverythingLM-data-V3, mlabonne/guanaco-llama2-1k, OpenAssistant/oasst_top1_2023-08-25, and garage-bAInd/Open-Platypus. This model achieved a test loss of 0.38 on garage-bAInd/Open-Platypus.
@@ -39,3 +193,17 @@ top_p = 0.14
 top_k = 41
 repetition_penalty = 1.176

 ---
 language:
 - en
 - fr
 - hi
 - zh
 - code
+license: other
 datasets:
 - HuggingFaceH4/no_robots
 - mlabonne/guanaco-llama2-1k
 - OpenAssistant/oasst_top1_2023-08-25
 - totally-not-an-llm/EverythingLM-data-V3
 - garage-bAInd/Open-Platypus
+base_model: microsoft/Orca-2-13b
 widget:
+- example_title: folklore
+  text: '<|im_start|>user
+    Are the stories of Nahanni Valley, like the decapitated men, true?
+    <|im_end|>
+    <|im_start|>assistant
+    '
+  output:
+    text: 'The Nahanni Valley in Canada is a remote and rugged region that has long
+      been shrouded in mystery. It is home to the South Nahanni River, which was designated
+      as a UNESCO World Heritage Site due to its exceptional natural beauty and biodiversity.
+      The valley''s history includes tales of gold prospectors, fur traders, and indigenous
+      populations, but it also features some unexplained events and legends. One such
+      story involves two prospectors who allegedly encountered an unknown creature
+      that led them into the forest before returning without their heads. This tale
+      has been passed down through generations, but there are no definitive answers
+      regarding its veracity.
+      It is important to note that these stories may have originated from local folklore
+      or were exaggerated over time. There is no concrete evidence to support the
+      existence of supernatural creatures or paranormal occurrences within the Nahanni
+      Valley. Nonetheless, the area remains a captivating destination for those seeking
+      adventure and exploration. If you have any questions about this topic or need
+      assistance related to traveling to the Nahanni Valley, please feel free to ask!'
+- example_title: Alan Turing
+  text: '<|im_start|>user
+    Why was Alan Turing castrated?
+    <|im_end|>
+    <|im_start|>assistant
+    '
+  output:
+    text: 'Alan Turing, a renowned mathematician and computer scientist who played
+      a crucial role in breaking the German Enigma code during World War II, faced
+      significant challenges due to his homosexuality. In 1952, he was convicted of
+      "gross indecency" for engaging in a consensual relationship with another man,
+      which led to him being sentenced to hormone therapy designed to reduce his sex
+      drive. This treatment included the administration of estrogen, which had the
+      side effect of causing breast growth.
+      In an effort to avoid this unwanted physical change, Turing requested that his
+      doctors cease the hormone treatments. However, they ignored his request and
+      continued the therapy without his consent. As a result, Turing''s breasts began
+      to grow, leading him to undergo surgery to remove them. The operation, known
+      as a mastectomy, caused considerable pain and discomfort, further contributing
+      to his emotional turmoil.
+      It is important to note that Turing''s castration was not directly carried out
+      by the authorities but rather resulted from the unintended consequences of the
+      forced hormone therapy.'
+model-index:
+- name: Orca-2-13b-SFT_v5
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 59.22
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT_v5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 80.09
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT_v5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 60.19
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT_v5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 51.84
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT_v5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 80.9
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT_v5
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 8.42
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Locutusque/Orca-2-13b-SFT_v5
+      name: Open LLM Leaderboard
 ---
 The "microsoft/Orca-2-13b" model fully fine-tuned on HuggingFaceH4/no_robots, totally-not-an-llm/EverythingLM-data-V3, mlabonne/guanaco-llama2-1k, OpenAssistant/oasst_top1_2023-08-25, and garage-bAInd/Open-Platypus. This model achieved a test loss of 0.38 on garage-bAInd/Open-Platypus.
 top_k = 41
 repetition_penalty = 1.176
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__Orca-2-13b-SFT_v5)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |56.77|
+|AI2 Reasoning Challenge (25-Shot)|59.22|
+|HellaSwag (10-Shot)              |80.09|
+|MMLU (5-Shot)                    |60.19|
+|TruthfulQA (0-shot)              |51.84|
+|Winogrande (5-shot)              |80.90|
+|GSM8k (5-shot)                   | 8.42|
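
As a quick sanity check on the `Avg.` row, the sketch below recomputes the mean of the six benchmark scores exactly as they are printed in the table. The dictionary name `scores` and the helper `mean_score` are illustrative and not part of the PR. The inputs here are already rounded to two decimals, so the result may differ from the reported 56.77 in the last digit; one plausible explanation (an assumption, not something the PR states) is that the leaderboard averages unrounded scores.

```python
# Scores copied from the results table in this PR (already rounded to two decimals).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 59.22,
    "HellaSwag (10-Shot)": 80.09,
    "MMLU (5-Shot)": 60.19,
    "TruthfulQA (0-shot)": 51.84,
    "Winogrande (5-shot)": 80.90,
    "GSM8k (5-shot)": 8.42,
}


def mean_score(values):
    """Return the arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Mean of the rounded table values; compare against the reported "Avg." row.
average = mean_score(scores.values())
print(f"Mean of the six rounded scores: {average:.4f}")
print("Reported Avg. in the table:     56.77")
```

The low GSM8k score (8.42) pulls the mean well below the other five benchmarks, which all sit between roughly 52 and 81.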