Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)

README.md +124 -22

@@ -1,9 +1,13 @@
 ---
 license: apache-2.0
-base_model: BEE-spoke-data/smol_llama-220M-GQA
 tags:
 - edu
 - continual pretraining
 metrics:
 - accuracy
 inference:
@@ -20,43 +24,128 @@ widget:
   example_title: El Microondas
 - text: Kennesaw State University is a public
   example_title: Kennesaw State University
-- text: >-
-    Bungie Studios is an American video game developer. They are most famous for
-    developing the award winning Halo series of video games. They also made
-    Destiny. The studio was founded
   example_title: Bungie
 - text: The Mona Lisa is a world-renowned painting created by
   example_title: Mona Lisa
-- text: >-
-    The Harry Potter series, written by J.K. Rowling, begins with the book
-    titled
   example_title: Harry Potter Series
-- text: >-
-    Question: I have cities, but no houses. I have mountains, but no trees. I
     have water, but no fish. What am I?
-    Answer:
   example_title: Riddle
 - text: The process of photosynthesis involves the conversion of
   example_title: Photosynthesis
-- text: >-
-    Jane went to the store to buy some groceries. She picked up apples, oranges,
     and a loaf of bread. When she got home, she realized she forgot
   example_title: Story Continuation
-- text: >-
-    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
-    another train leaves Station B at 10:00 AM and travels at 80 mph, when will
     they meet if the distance between the stations is 300 miles?
-    To determine
   example_title: Math Problem
 - text: In the context of computer programming, an algorithm is
   example_title: Algorithm Definition
 pipeline_tag: text-generation
-datasets:
-- HuggingFaceFW/fineweb-edu
-language:
-- en
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -167,4 +256,17 @@ The following hyperparameters were used during training:
 - Transformers 4.41.1
 - Pytorch 2.3.1+cu118
 - Datasets 2.19.1
-- Tokenizers 0.19.1

 ---
+language:
+- en
 license: apache-2.0
 tags:
 - edu
 - continual pretraining
+base_model: BEE-spoke-data/smol_llama-220M-GQA
+datasets:
+- HuggingFaceFW/fineweb-edu
 metrics:
 - accuracy
 inference:
   example_title: El Microondas
 - text: Kennesaw State University is a public
   example_title: Kennesaw State University
+- text: Bungie Studios is an American video game developer. They are most famous for
+    developing the award winning Halo series of video games. They also made Destiny.
+    The studio was founded
   example_title: Bungie
 - text: The Mona Lisa is a world-renowned painting created by
   example_title: Mona Lisa
+- text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
   example_title: Harry Potter Series
+- text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
     have water, but no fish. What am I?
+    Answer:'
   example_title: Riddle
 - text: The process of photosynthesis involves the conversion of
   example_title: Photosynthesis
+- text: Jane went to the store to buy some groceries. She picked up apples, oranges,
     and a loaf of bread. When she got home, she realized she forgot
   example_title: Story Continuation
+- text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
+    and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
     they meet if the distance between the stations is 300 miles?
+    To determine'
   example_title: Math Problem
 - text: In the context of computer programming, an algorithm is
   example_title: Algorithm Definition
 pipeline_tag: text-generation
+model-index:
+- name: smol_llama-220M-GQA-fineweb_edu
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 19.88
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 2.31
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.0
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.23
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 14.26
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 1.41
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
+      name: Open LLM Leaderboard
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 - Transformers 4.41.1
 - Pytorch 2.3.1+cu118
 - Datasets 2.19.1
+- Tokenizers 0.19.1
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__smol_llama-220M-GQA-fineweb_edu)
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               | 6.52|
+|IFEval (0-Shot)    |19.88|
+|BBH (3-Shot)       | 2.31|
+|MATH Lvl 5 (4-Shot)| 0.00|
+|GPQA (0-shot)      | 1.23|
+|MuSR (0-shot)      |14.26|
+|MMLU-PRO (5-shot)  | 1.41|
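
The `Avg.` row in the table above can be sanity-checked against the six benchmark values. The following is a minimal sketch, not the leaderboard's own code: it assumes the average is the plain arithmetic mean of the six displayed scores, rounded half-up to two decimals. The `scores` mapping and the `mean_score` helper are hypothetical names introduced here for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the results table in this PR.
scores = {
    "IFEval (0-Shot)": Decimal("19.88"),
    "BBH (3-Shot)": Decimal("2.31"),
    "MATH Lvl 5 (4-Shot)": Decimal("0.00"),
    "GPQA (0-shot)": Decimal("1.23"),
    "MuSR (0-shot)": Decimal("14.26"),
    "MMLU-PRO (5-shot)": Decimal("1.41"),
}


def mean_score(values):
    """Arithmetic mean, rounded half-up to two decimals.

    The rounding mode is an assumption; the leaderboard may average
    unrounded scores instead.
    """
    values = list(values)
    total = sum(values, Decimal("0"))
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


average = mean_score(scores.values())
print(f"Avg. = {average}")  # prints "Avg. = 6.52"
```

Under these assumptions the six values sum to 39.09, and dividing by 6 gives 6.515, which rounds to the 6.52 shown in the table. `Decimal` is used instead of floats so that the half-way case rounds predictably.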