TinyLlama-1.1B DPO

Part of a collection applying SFT and DPO to TinyLlama 1.1B.
This model is a fine-tuned version of PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.4638
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
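Since the card does not include usage code, here is a minimal inference sketch using the standard Transformers causal-LM API. The repo id below is a placeholder, as this model's actual Hugging Face id is not stated in the card:

```python
# Minimal inference sketch for a TinyLlama-style causal LM.
# "your-username/tinyllama-1.1b-dpo" is a hypothetical placeholder;
# substitute this model's actual repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/tinyllama-1.1b-dpo"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What does DPO training change about a model?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```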
Training hyperparameters

More information needed

Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.459         | 0.7   | 285  | 1.4638          |
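The card does not report how the DPO stage was run. One common way to reproduce this kind of setup is trl's DPOTrainer; the sketch below makes several assumptions: the dataset (trl-lib/ultrafeedback_binarized), beta, learning rate, and batch size are illustrative values, not the ones used for this model, and argument names vary slightly across trl versions:

```python
# Sketch of a DPO stage on the TinyLlama base checkpoint, assuming trl.
# All hyperparameters here are illustrative; the actual values used for
# this model are not reported in the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "PY007/TinyLlama-1.1B-intermediate-step-715k-1.5T"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# A preference dataset with "prompt"/"chosen"/"rejected" columns;
# trl-lib/ultrafeedback_binarized is one public example.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = DPOConfig(
    output_dir="tinyllama-1.1b-dpo",
    beta=0.1,                       # illustrative DPO temperature
    learning_rate=5e-7,             # illustrative
    per_device_train_batch_size=2,  # illustrative
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                 # reference model is cloned internally
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # "tokenizer=" in older trl releases
)
trainer.train()
```

When no explicit reference model is passed, DPOTrainer copies the policy's initial weights to serve as the frozen reference for the DPO loss.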