May I ask how this differs from the 10ep version?

#2
by bdsqlsz - opened

Thank you for your great work.
SPO works very well, but I only just discovered this model. Is this the version trained for more epochs?
If possible, please add a LoRA version.

I rechecked GitHub, and it turns out that this is for training the preference model used by SPO.

bdsqlsz changed discussion status to closed
