add evaluation on Open LLM Leaderboard
README.md CHANGED

@@ -9,8 +9,114 @@ tags:
 - orpo
 - generated_from_trainer
 model-index:
-- name: gemma-2b-orpo
-  results: []
+- name: gemma-2b-orpo
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 49.15
+      name: normalized accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 73.72
+      name: normalized accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 38.52
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 44.53
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 64.33
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 13.87
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
 datasets:
 - alvarobartt/dpo-mix-7k-simplified
 language:
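
The model-index block above is the metadata the Hub parses to surface these scores on the model page. The same metadata can also be read programmatically; here is a minimal sketch using `huggingface_hub`'s `ModelCard` API (iterating over the parsed `eval_results` is my assumption about how the card data is exposed, not something shown in this commit):

```python
# pip install huggingface_hub
from huggingface_hub import ModelCard

# Load the card from the Hub; card.data holds the parsed YAML front matter,
# including the model-index results added in this commit.
card = ModelCard.load("anakin87/gemma-2b-orpo")

# Each model-index entry is exposed as an EvalResult object.
for result in card.data.eval_results:
    print(result.dataset_name, result.metric_type, result.metric_value)
```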

@@ -47,6 +153,20 @@ gemma-2b-orpo performs well for its size on Nous' benchmark suite.
 | [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) [π](https://gist.github.com/mlabonne/db0761e74175573292acf497da9e5d95) | 36.1 | 23.76 | 43.6 | 47.64 | 29.41 |
 | [google/gemma-2b](https://huggingface.co/google/gemma-2b) [π](https://gist.github.com/mlabonne/7df1f238c515a5f63a750c8792cef59e) | 34.26 | 22.7 | 43.35 | 39.96 | 31.03 |
 
+### [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_anakin87__gemma-2b-orpo).
+
+| Metric                            | Value |
+|-----------------------------------|------:|
+| Avg.                              | 47.35 |
+| AI2 Reasoning Challenge (25-Shot) | 49.15 |
+| HellaSwag (10-Shot)               | 73.72 |
+| MMLU (5-Shot)                     | 38.52 |
+| TruthfulQA (0-shot)               | 44.53 |
+| Winogrande (5-shot)               | 64.33 |
+| GSM8k (5-shot)                    | 13.87 |
+
+For comparison, [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) scores an average of 42.75 on the Open LLM Leaderboard.
 
 ## π Dataset
 [`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified)
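
The details dataset linked in the added section stores per-task outputs from the leaderboard's evaluation run. A minimal sketch of pulling one task's results with `datasets`; the config name follows the leaderboard's usual `harness_<task>_<num_fewshot>` convention and is an assumption, not taken from this commit:

```python
# pip install datasets
from datasets import load_dataset

# Each evaluated task is a config of the details dataset;
# the "latest" split points at the most recent evaluation run.
details = load_dataset(
    "open-llm-leaderboard/details_anakin87__gemma-2b-orpo",
    "harness_gsm8k_5",  # assumed config name, per the leaderboard's naming scheme
    split="latest",
)
print(details[0])
```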

@@ -57,7 +177,7 @@ You can find more information [in the dataset card](https://huggingface.co/datas
 ### Usage notebook
 [π Chat and RAG using Haystack](./notebooks/usage.ipynb)
 ### Simple text generation with Transformers
-The model is small, so runs smoothly on Colab. *It is also fine to load the model using quantization*.
+The model is small, so it runs smoothly on Colab. *It is also fine to load the model using quantization*.
 ```python
 # pip install transformers accelerate
 import torch
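# The hunk truncates the snippet after `import torch`. Below is a minimal
# sketch of one plausible continuation, loading the model with the optional
# quantization the card mentions (4-bit via bitsandbytes is my assumption,
# as are the prompt and generation settings; only the model id is from the card).
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "anakin87/gemma-2b-orpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Optional 4-bit quantization: keeps memory low enough for a free Colab GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Format the prompt with the tokenizer's chat template (Gemma-style roles).
messages = [{"role": "user", "content": "Write a haiku about open-source AI."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))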