---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
library_name: transformers
base_model: google/gemma-2b
tags:
- trl
- orpo
- generated_from_trainer
model-index:
  - name: gemma-2b-orpo
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 49.15
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 73.72
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 38.52
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 44.53
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.33
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 13.87
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
datasets:
- alvarobartt/dpo-mix-7k-simplified
language:
- en
---

<img src="./assets/gemma-2b-orpo.png" width="450"/>
# gemma-2b-orpo

This is an ORPO fine-tune of [google/gemma-2b](https://huggingface.co/google/gemma-2b) with
[`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified).

**⚡ Quantized version (GGUF)**: https://huggingface.co/anakin87/gemma-2b-orpo-GGUF

## ORPO
[ORPO (Odds Ratio Preference Optimization)](https://arxiv.org/abs/2403.07691) is a new training paradigm that combines the traditionally separate phases
of SFT (Supervised Fine-Tuning) and Preference Alignment (usually performed with RLHF or simpler methods like DPO) into a single step. Compared with the two-stage recipe, it offers:
- Faster training
- Less memory usage (no reference model needed)
- Good results!
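
For intuition, here is a toy sketch of the objective as I read the paper (not the exact training code): the usual SFT negative log-likelihood on the chosen answer, plus a weighted odds-ratio penalty that rewards the model for assigning higher odds to the chosen completion than to the rejected one.

```python
import torch
import torch.nn.functional as F

def orpo_loss(nll_chosen, avg_logp_chosen, avg_logp_rejected, lam=0.1):
    """Toy ORPO objective: SFT loss + lam * odds-ratio penalty.

    avg_logp_* are length-averaged token log-probabilities of the chosen /
    rejected completions under the model being trained (negative values).
    """
    # odds(y|x) = p / (1 - p), computed in log space for numerical stability
    log_odds_chosen = avg_logp_chosen - torch.log1p(-torch.exp(avg_logp_chosen))
    log_odds_rejected = avg_logp_rejected - torch.log1p(-torch.exp(avg_logp_rejected))
    # -log sigmoid(log odds ratio): shrinks as the chosen answer becomes more likely
    l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return nll_chosen + lam * l_or
```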

## ๐Ÿ† Evaluation

### Nous

gemma-2b-orpo performs well for its size on Nous' benchmark suite (evaluation conducted using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval)).

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| [**anakin87/gemma-2b-orpo**](https://huggingface.co/anakin87/gemma-2b-orpo) [📄](./assets/gemma-2b-orpo-Nous.md) | **39.45** | 23.76 | 58.25 | 44.47 | 31.32 |
| [mlabonne/Gemmalpaca-2B](https://huggingface.co/mlabonne/Gemmalpaca-2B) [📄](https://gist.github.com/mlabonne/4b638752fc3227df566f9562064cb864) | 38.39 | 24.48 | 51.22 | 47.02 | 30.85 |
| [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) [📄](https://gist.github.com/mlabonne/db0761e74175573292acf497da9e5d95) | 36.1 | 23.76 | 43.6 | 47.64 | 29.41 |
| [google/gemma-2b](https://huggingface.co/google/gemma-2b) [📄](https://gist.github.com/mlabonne/7df1f238c515a5f63a750c8792cef59e) | 34.26 | 22.7 | 43.35 | 39.96 | 31.03 |

### [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_anakin87__gemma-2b-orpo).

For comparison, google/gemma-2b-it scores an average of 42.75 on the same leaderboard.

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |47.35|
|AI2 Reasoning Challenge (25-Shot)|49.15|
|HellaSwag (10-Shot)              |73.72|
|MMLU (5-Shot)                    |38.52|
|TruthfulQA (0-shot)              |44.53|
|Winogrande (5-shot)              |64.33|
|GSM8k (5-shot)                   |13.87|


## ๐Ÿ™ Dataset
[`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified)
is a simplified version of [`argilla/dpo-mix-7k`](https://huggingface.co/datasets/argilla/dpo-mix-7k).
You can find more information [in the dataset card](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified).

## 🎮 Model in action
### Usage notebook
[📓 Chat and RAG using Haystack](./notebooks/usage.ipynb)
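
If you just want a quick taste of the Haystack integration without opening the notebook, a rough sketch along these lines should work (this assumes Haystack 2.x and its `HuggingFaceLocalChatGenerator` component; the notebook remains the reference for the full Chat and RAG setup):

```python
# pip install haystack-ai transformers accelerate
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

# Local chat generator backed by gemma-2b-orpo (component assumed from Haystack 2.x)
generator = HuggingFaceLocalChatGenerator(model="anakin87/gemma-2b-orpo")
generator.warm_up()

result = generator.run(messages=[ChatMessage.from_user("Explain ORPO in one sentence.")])
print(result["replies"][0])
```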
### Simple text generation with Transformers
The model is small, so it runs smoothly on Colab. *You can also load it with quantization*; a sketch of 4-bit loading follows the example below.
```python
# pip install transformers accelerate
import torch
from transformers import pipeline

# Load the model in bfloat16 with automatic device placement
pipe = pipeline("text-generation", model="anakin87/gemma-2b-orpo", torch_dtype=torch.bfloat16, device_map="auto")

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Write a rap song on Vim vs VSCode."}]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)

# Sample a completion
outputs = pipe(prompt, max_new_tokens=500, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
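
As mentioned above, the model can also be loaded in 4-bit to save memory. A minimal sketch, assuming `bitsandbytes` is installed (the quantization settings are illustrative):

```python
# pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

# 4-bit quantization settings (illustrative, not the only sensible choice)
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    "anakin87/gemma-2b-orpo",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("anakin87/gemma-2b-orpo")

# The quantized model drops into the same pipeline-based generation shown above
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```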
## Training
The model was trained using HF TRL.
[📓 Training notebook](./notebooks/training.ipynb)
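
The notebook is the reference for the exact setup; the skeleton below is only a rough sketch of what ORPO training with TRL looks like (hyperparameters are illustrative, and recent TRL versions may expect `processing_class` instead of `tokenizer`):

```python
# pip install trl transformers datasets accelerate
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "google/gemma-2b"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference data with "prompt", "chosen" and "rejected" columns
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

# Illustrative hyperparameters, not the values used for this model
config = ORPOConfig(
    output_dir="gemma-2b-orpo",
    beta=0.1,                 # weight of the odds-ratio term
    max_length=1024,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```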

### Framework versions

- Transformers 4.39.1
- Pytorch 2.2.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2