May I ask how this differs from the 10ep version?
#2 by bdsqlsz - opened
Thank you for your great work.
SPO works very well, but I just discovered this model. Is this the version trained for more epochs?
If possible, please add a LoRA version.
I rechecked GitHub, and it turns out that this is for training the preference model used.
bdsqlsz changed discussion status to closed