---
library_name: transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
- jondurbin/gutenberg-dpo-v0.1
---
# Orpo-GutenLlama-3-8B-v2

## Training Params
- Learning Rate: 8e-6
- Batch Size: 1
- Eval batch size: 1
- Gradient accumulation steps: 4
- Epochs: 3
- Training Loss: 0.88
Training time: 4 hours on 1x RTX 4090. This is a small 1,800-sample fine-tune to get comfortable with ORPO fine-tuning before scaling up.
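Below is a minimal sketch of how the hyperparameters above map onto an ORPO run with `trl`'s `ORPOTrainer`. The base checkpoint, the way the ~1,800 samples are drawn from the listed datasets, and the absence of any PEFT/quantization settings are assumptions, not the exact recipe used for this model.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# Assumed base checkpoint; swap in your own Llama-3-8B variant.
base_model = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Assumption: a ~1,800-sample slice of one of the listed preference mixes.
# ORPOTrainer expects prompt/chosen/rejected style preference data.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train[:1800]")

config = ORPOConfig(
    output_dir="Orpo-GutenLlama-3-8B-v2",
    learning_rate=8e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer trl versions use processing_class= instead
)
trainer.train()
```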