
DPO ruined Bagel's versatility

#2 by Henk717

When used to generate stories, the model very rapidly derails from story writing, or refuses outright and merely claims to be writing a story.
Given that Gutenberg is in the training mix, this is highly undesirable behavior, and it seems to stem directly from the DPO phase biasing the model toward an instruct dataset. The original Bagel without DPO performs these tasks well.

Thanks for the feedback @Henk717. The DPO process for yi-34b with QLoRA is still a bit of trial and error. I think, perhaps, the learning rate was too high (even though it was orders of magnitude lower than in the SFT phase, and I stopped it early, a third of the way through).
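
For reference, here is a minimal sketch of what a more conservative DPO + QLoRA pass might look like with TRL. This is not the actual Bagel training setup: the model name, dataset name, and every hyperparameter below are illustrative assumptions, and TRL argument names have shifted across versions (this follows a recent API).

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

# Placeholder names; substitute your own SFT checkpoint and preference data.
base_model = "your-org/bagel-sft-checkpoint"

# 4-bit NF4 quantization so a 34B policy model fits in memory (the QLoRA part).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Low-rank adapters on the attention projections; with ref_model=None,
# TRL uses the model with adapters disabled as the frozen DPO reference.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# DPO expects "prompt" / "chosen" / "rejected" columns.
preference_data = load_dataset("your-org/preference-pairs", split="train")

args = DPOConfig(
    output_dir="bagel-dpo-qlora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-7,   # far below typical SFT rates; the knob in question here
    beta=0.1,             # KL penalty: raising it keeps the policy closer to the SFT model
    max_steps=500,        # cap steps up front rather than eyeballing an early stop
    logging_steps=10,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,
    args=args,
    train_dataset=preference_data,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Besides lowering the learning rate, `beta` is the other lever relevant to the complaint above: a larger KL penalty constrains the DPO policy to stay nearer the SFT model, which should preserve more of the original Bagel's story-writing behavior.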

Will work on improving the process over time.
