DPO ruined Bagel's versatility

#2
by Henk717

When the model is used to generate stories, it very rapidly derails from the story or refuses outright, merely claiming that it is writing one.
Given that Gutenberg is included in the training data, this is highly undesirable behavior, and it appears to stem directly from the DPO phase biasing the model toward an instruct dataset. The original Bagel without DPO performs these tasks well.

Thanks for the feedback @Henk717. The DPO process for yi-34b with QLoRA is still a bit of trial and error. I suspect the learning rate was too high, even though it was orders of magnitude lower than in the SFT phase and I stopped training early, about a third of the way through.

Will work on improving the process over time.
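For anyone who wants to experiment with a gentler preference-tuning pass themselves, here is a minimal sketch of a QLoRA DPO run using Hugging Face TRL and PEFT. This is not the actual Bagel training recipe: the base checkpoint, dataset file, and every hyperparameter below are illustrative assumptions, with the learning rate deliberately set very low in the spirit of the fix discussed above.

```python
# Minimal QLoRA DPO sketch (TRL >= 0.9). All names and values are illustrative.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

# Assumed SFT checkpoint to tune; substitute your own model here.
model_name = "jondurbin/bagel-34b-v0.2"

# 4-bit NF4 quantization so the adapter training fits in memory (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA adapters on the attention projections; ranks/targets are assumptions.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hypothetical preference file with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

training_args = DPOConfig(
    output_dir="bagel-dpo-experiment",
    learning_rate=5e-7,  # orders of magnitude below a typical SFT LR (~2e-5)
    beta=0.1,            # KL penalty; larger values stay closer to the SFT policy
    max_steps=500,       # stop well short of a full epoch rather than overshooting
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)

# With peft_config set, no separate ref_model is needed: TRL computes the
# reference log-probs by disabling the adapters on the same base model.
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```

Besides lowering the learning rate and stopping early, raising `beta` is another lever worth trying: it penalizes divergence from the SFT policy more strongly, which should help preserve behaviors like story writing that the preference data does not cover.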
