---
library_name: transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
- jondurbin/gutenberg-dpo-v0.1
---
|
# Orpo-GutenLlama-3-8B-v2
|
|
|
## Training Params
|
|
|
+ Learning rate: 8e-6
+ Batch size: 1
+ Eval batch size: 1
+ Gradient accumulation steps: 4
+ Epochs: 3
+ Training loss: 0.88
|
|
|
Training time: 4 hours on 1x RTX 4090. This is a small 1,800-sample fine-tune to get comfortable with ORPO fine-tuning before scaling up.
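For reference, the hyperparameters above map directly onto TRL's `ORPOConfig`. Below is a minimal sketch of how such a run could be set up; the base model ID (`meta-llama/Meta-Llama-3-8B`), the way the 1,800-example subset is drawn, and the exact `ORPOTrainer` arguments are assumptions, not the exact script used for this model.

```python
# Minimal ORPO fine-tuning sketch with the hyperparameters listed above.
# Assumes TRL's ORPOTrainer; depending on your TRL version the tokenizer
# may need to be passed as `processing_class` instead of `tokenizer`.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Take a small ~1,800-example subset for a quick test run (sampling strategy
# is an assumption); the dataset must expose prompt/chosen/rejected columns
# in the format ORPOTrainer expects.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
dataset = dataset.shuffle(seed=42).select(range(1800))

config = ORPOConfig(
    output_dir="Orpo-GutenLlama-3-8B-v2",
    learning_rate=8e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```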
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/q5Okh82tXKgaonwPrT7Gg.png)