SultanR committed on
Commit 038ba8a
1 Parent(s): e29498c

Update README.md

Files changed (1):
  1. README.md +15 -0
README.md CHANGED
@@ -140,6 +140,21 @@ I ran these evaluations using [SmolLM2's evaluation code](https://github.com/hug
  | MMLU-Pro (MCF) | 17.4 | 17.3 | 19.3 | 12.7 | **24.2** | 11.7 |
  | PIQA | 72.2 | 72.1 | **74.4** | 72.3 | 73.2 | 71.6 |
 
+ ## Training Details
+
+ The model was trained using Direct Preference Optimization (DPO) with the following configuration:
+ - Base model: SmolLM2-1.7B, after running AllenAI's SFT pipeline
+ - Mixed precision: bfloat16
+ - Learning rate: 8e-7 with a linear scheduler
+ - Warmup ratio: 0.1
+ - Training epochs: 1
+ - Effective batch size: 12
+ - Sequence length: 4096 tokens
+ - DPO loss: length-normalized DPO
+ - DPO beta: 5.0
+ - Gradient checkpointing enabled
+ - DeepSpeed ZeRO Stage 3 for memory optimization
+
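The "length-normalized DPO" loss listed above divides each response's summed log-probability by its token count before applying the usual DPO objective, so the comparison is between average per-token log-ratios rather than raw sums. A minimal sketch of one common formulation for a single preference pair (the function name and signature are illustrative, not the actual training code's API):

```python
import math

def length_normalized_dpo_loss(policy_chosen_logp, policy_rejected_logp,
                               ref_chosen_logp, ref_rejected_logp,
                               chosen_len, rejected_len, beta=5.0):
    """Length-normalized DPO loss for one (chosen, rejected) pair.

    Each *_logp argument is the summed log-probability of the full
    response under the policy or the frozen reference model. Dividing
    by token count keeps the loss from simply rewarding length.
    """
    # Average per-token log-ratio of policy vs. reference, per response
    chosen_ratio = (policy_chosen_logp - ref_chosen_logp) / chosen_len
    rejected_ratio = (policy_rejected_logp - ref_rejected_logp) / rejected_len
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))
```

With beta = 5.0 as in the configuration above, the loss shrinks as the policy's per-token preference margin for the chosen response over the rejected one grows.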
158
  ## Usage
159
 
160
  Just like any Huggingface model, just run it using the transformers library: