---
library_name: transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
- jondurbin/gutenberg-dpo-v0.1
---

# Orpo-GutenLlama-3-8B-v2

## Training Params

+ Learning rate: 8e-6
+ Batch size: 1
+ Eval batch size: 1
+ Gradient accumulation steps: 4
+ Epochs: 3
+ Training loss: 0.88

Training time: 4 hours on a single RTX 4090.

This is a small 1,800-sample fine-tune meant to get comfortable with ORPO fine-tuning before scaling up.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/q5Okh82tXKgaonwPrT7Gg.png)
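
For reference, below is a minimal sketch of how an ORPO run with the hyperparameters above could be set up using TRL's `ORPOTrainer`. The base model name, the 1,800-sample subset selection, and the dataset column handling are assumptions for illustration, not the exact script used for this model.

```python
# Hedged sketch of an ORPO fine-tune with the hyperparameters listed above.
# Assumptions: Llama-3-8B as the base model, a 1,800-example subset of
# mlabonne/orpo-dpo-mix-40k, and a prompt/chosen/rejected preference format.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Draw a small ~1,800-sample subset from one of the preference datasets above
dataset = (
    load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
    .shuffle(seed=42)
    .select(range(1800))
    .train_test_split(test_size=0.05)
)

config = ORPOConfig(
    output_dir="Orpo-GutenLlama-3-8B-v2",
    learning_rate=8e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # newer TRL versions use processing_class= instead
)
trainer.train()
```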