anakin87 committed on
Commit 7569a46 • 1 Parent(s): 76e5b9c

add evaluation on Open LLM Leaderboard

Files changed (1):
  README.md (+123 −3)
README.md CHANGED
@@ -9,8 +9,114 @@ tags:
 - orpo
 - generated_from_trainer
 model-index:
-- name: gemma-2b-orpo
-  results: []
+- name: gemma-2b-orpo
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 49.15
+      name: normalized accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 73.72
+      name: normalized accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 38.52
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 44.53
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 64.33
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 13.87
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
+      name: Open LLM Leaderboard
 datasets:
 - alvarobartt/dpo-mix-7k-simplified
 language:
@@ -47,6 +153,20 @@ gemma-2b-orpo performs well for its size on Nous' benchmark suite.
 | [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) [📄](https://gist.github.com/mlabonne/db0761e74175573292acf497da9e5d95) | 36.1 | 23.76 | 43.6 | 47.64 | 29.41 |
 | [google/gemma-2b](https://huggingface.co/google/gemma-2b) [📄](https://gist.github.com/mlabonne/7df1f238c515a5f63a750c8792cef59e) | 34.26 | 22.7 | 43.35 | 39.96 | 31.03 |
 
+### [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_anakin87__gemma-2b-orpo)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |47.35|
+|AI2 Reasoning Challenge (25-Shot)|49.15|
+|HellaSwag (10-Shot)              |73.72|
+|MMLU (5-Shot)                    |38.52|
+|TruthfulQA (0-shot)              |44.53|
+|Winogrande (5-shot)              |64.33|
+|GSM8k (5-shot)                   |13.87|
+
+By comparison, on the Open LLM Leaderboard, google/gemma-2b-it has an average of 42.75.
 
 ## 🙏 Dataset
 [`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified)
@@ -57,7 +177,7 @@ You can find more information [in the dataset card](https://huggingface.co/datas
 ### Usage notebook
 [📓 Chat and RAG using Haystack](./notebooks/usage.ipynb)
 ### Simple text generation with Transformers
-The model is small, so runs smoothly on Colab. *It is also fine to load the model using quantization*.
+The model is small, so it runs smoothly on Colab. *It is also fine to load the model using quantization*.
 ```python
 # pip install transformers accelerate
 import torch
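The "Avg." figure in the new leaderboard table is simply the arithmetic mean of the six benchmark scores the commit adds, so it can be double-checked in a couple of lines (scores taken from the README table above; nothing else assumed):

```python
# Open LLM Leaderboard scores added in this commit (from the README table)
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 49.15,
    "HellaSwag (10-Shot)": 73.72,
    "MMLU (5-Shot)": 38.52,
    "TruthfulQA (0-shot)": 44.53,
    "Winogrande (5-shot)": 64.33,
    "GSM8k (5-shot)": 13.87,
}

# "Avg." is the plain mean of the six benchmarks
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 47.35

# Margin over google/gemma-2b-it's reported leaderboard average of 42.75
margin = round(average - 42.75, 2)
print(margin)  # 4.6
```

This confirms the 47.35 average reported in the table and quantifies the gap over gemma-2b-it mentioned in the README.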
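The `model-index` block added in the YAML front matter follows the Hugging Face model card metadata schema (the format the leaderboard's results bot generates). As a quick sanity check that such a block is well-formed YAML and the metric values are machine-readable, here is a minimal sketch parsing a shortened fragment of it, assuming PyYAML is installed:

```python
import yaml  # assumes PyYAML is available (pip install pyyaml)

# Shortened fragment of the model-index block added in this commit
snippet = """
model-index:
- name: gemma-2b-orpo
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
    metrics:
    - type: acc_norm
      value: 49.15
      name: normalized accuracy
"""

card = yaml.safe_load(snippet)
result = card["model-index"][0]["results"][0]
print(result["dataset"]["name"])      # AI2 Reasoning Challenge (25-Shot)
print(result["metrics"][0]["value"])  # 49.15
```

If the indentation in the front matter drifts (e.g. `results:` not nested under the list item), `yaml.safe_load` raises an error or returns a different shape, which is the usual cause of the Hub rejecting model-index metadata.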