Update README.md
README.md
@@ -140,6 +140,21 @@ I ran these evaluations using [SmolLM2's evaluation code](https://github.com/hug
| MMLU-Pro (MCF) | 17.4 | 17.3 | 19.3 | 12.7 | **24.2** | 11.7 |
| PIQA | 72.2 | 72.1 | **74.4** | 72.3 | 73.2 | 71.6 |

## Training Details

The model was trained using Direct Preference Optimization (DPO) with the following configuration (the loss is sketched after the list):

- Base model: SmolLM2-1.7B, fine-tuned with AllenAI's SFT pipeline
- Mixed precision: bfloat16
- Learning rate: 8e-7 with a linear scheduler
- Warmup ratio: 0.1
- Training epochs: 1
- Effective batch size: 12
- Sequence length: 4096 tokens
- DPO loss: length-normalized DPO
- DPO beta: 5.0
- Gradient checkpointing enabled
- DeepSpeed ZeRO Stage 3 for memory optimization
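
Length-normalized DPO divides each sequence's log-probability by its token count before applying the standard DPO sigmoid loss, which is part of why a comparatively large beta such as 5.0 is workable here. Below is a minimal PyTorch sketch of that loss; the function name and the pre-computed log-prob/length inputs are illustrative assumptions, not the actual training code:

```python
import torch
import torch.nn.functional as F

def length_normalized_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # summed log-probs of chosen responses, shape (batch,)
    policy_rejected_logps: torch.Tensor,  # summed log-probs of rejected responses
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    chosen_lengths: torch.Tensor,         # response lengths in tokens, shape (batch,)
    rejected_lengths: torch.Tensor,
    beta: float = 5.0,                    # the DPO beta listed above
) -> torch.Tensor:
    # Length normalization: average per-token log-ratio instead of the sequence sum.
    chosen_ratio = (policy_chosen_logps - ref_chosen_logps) / chosen_lengths
    rejected_ratio = (policy_rejected_logps - ref_rejected_logps) / rejected_lengths

    # Standard DPO objective applied to the normalized log-ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```
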
## Usage

Just like any Hugging Face model, you can run it using the `transformers` library:
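
For example (a minimal sketch: the model ID below is a placeholder for this repository's actual ID, and it assumes the tokenizer ships a chat template):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "user/SmolLM2-1.7B-dpo"  # placeholder: substitute this repo's model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Build a chat-formatted prompt and generate a completion.
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```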