Question about cDPO

#2
by athirdpath - opened

Hello, and thank you for all your work on MergeKit.

I'm currently using my Iambe model to produce an uncensored role-playing dataset of DPO pairs, and I'm up to ~3k examples so far. When you say cDPO, I assume you're referring to this mini-paper? If so, is there an open-source repo out there that supports it? I understand the broad strokes and like what I see, but I couldn't implement it myself.

Hi! Glad you're finding it useful - your experiments with 20b models are quite interesting.

Yep, that's the mini-paper in question. TRL added support for the cDPO loss function in commit c84e591. You can enable it by passing the label_smoothing argument to DPOTrainer.
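
For a rough idea of what label_smoothing changes, here is a minimal sketch of the per-pair loss in plain Python. It assumes the usual cDPO form, where the standard DPO term is weighted by (1 - label_smoothing) and the flipped-preference term by label_smoothing. The function and argument names (cdpo_loss, log_sigmoid, and so on) are made up for illustration and are not TRL's API; the trainer computes the same kind of quantity on batched tensors.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for a single float.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    label_smoothing: float = 0.0,
) -> float:
    # Hypothetical helper: per-pair conservative DPO loss.
    # Inputs are summed log-probabilities of each completion under the
    # policy being trained and under the frozen reference model.
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # label_smoothing is the assumed probability that a preference label
    # is flipped; 0.0 recovers the ordinary DPO loss.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * logits)
        - label_smoothing * log_sigmoid(-beta * logits)
    )


# Same pair, with and without smoothing (illustrative numbers only).
plain = cdpo_loss(-1.0, -5.0, -3.0, -3.0, beta=1.0)
smoothed = cdpo_loss(-1.0, -5.0, -3.0, -3.0, beta=1.0, label_smoothing=0.1)
print(plain, smoothed)
```

With a nonzero label_smoothing, pairs the policy already separates well still carry some loss, so the model is not pushed toward an ever larger margin on labels that might be noisy. Where exactly the argument is passed may differ between TRL versions, so check the one you have installed.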

Thank you!

athirdpath changed discussion status to closed
