Update README.md
README.md
@@ -140,6 +140,21 @@ I ran these evaluations using [SmolLM2's evaluation code](https://github.com/hug
| MMLU-Pro (MCF) | 17.4 | 17.3 | 19.3 | 12.7 | **24.2** | 11.7 |
| PIQA | 72.2 | 72.1 | **74.4** | 72.3 | 73.2 | 71.6 |

## Training Details

The model was trained using Direct Preference Optimization (DPO) with the following configuration (the loss is sketched after the list):

- Base model: SmolLM2-1.7B, fine-tuned with AllenAI's SFT pipeline
- Mixed precision: bfloat16
- Learning rate: 8e-7 with a linear scheduler
- Warmup ratio: 0.1
- Training epochs: 1
- Effective batch size: 12
- Sequence length: 4096 tokens
- DPO loss: length-normalized DPO
- DPO beta: 5.0
- Gradient checkpointing enabled
- DeepSpeed ZeRO Stage 3 for memory optimization
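
Length-normalized DPO divides each sequence's log-probability by its token count before applying the standard DPO sigmoid loss, which is part of why a comparatively large beta such as 5.0 is workable here. Below is a minimal PyTorch sketch of that loss; the function name and the pre-computed log-prob/length inputs are illustrative assumptions, not the actual training code:

```python
import torch
import torch.nn.functional as F

def length_normalized_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # summed log-probs of chosen responses, shape (batch,)
    policy_rejected_logps: torch.Tensor,  # summed log-probs of rejected responses
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    chosen_lengths: torch.Tensor,         # response lengths in tokens, shape (batch,)
    rejected_lengths: torch.Tensor,
    beta: float = 5.0,                    # the DPO beta listed above
) -> torch.Tensor:
    # Length normalization: average per-token log-ratio instead of the sequence sum.
    chosen_ratio = (policy_chosen_logps - ref_chosen_logps) / chosen_lengths
    rejected_ratio = (policy_rejected_logps - ref_rejected_logps) / rejected_lengths

    # Standard DPO objective applied to the normalized log-ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```
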
## Usage

Just like any Hugging Face model, you can run it using the `transformers` library:
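
For example (a minimal sketch: the model ID below is a placeholder for this repository's actual ID, and it assumes the tokenizer ships a chat template):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "user/SmolLM2-1.7B-dpo"  # placeholder: substitute this repo's model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Build a chat-formatted prompt and generate a completion.
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```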