ariG23498 posted an update about 13 hours ago
Tried my hand at simplifying the derivations of Direct Preference Optimization.

I cover how one can reformulate the RLHF objective into DPO. The idea of implicit reward modeling is chef's kiss.
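
For reference (this sketch is not from the blog itself, just the standard formulation from the DPO paper by Rafailov et al., 2023): the implicit reward is the β-scaled log-ratio of the policy to the reference model, up to a prompt-dependent constant that cancels in the pairwise loss, so the RLHF objective collapses into a simple classification loss:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \Big[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \Big]
$$

Here $y_w$ and $y_l$ are the preferred and dispreferred completions and $\sigma$ is the sigmoid; no explicit reward model or RL loop is needed.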

Blog: https://huggingface.co/blog/ariG23498/rlhf-to-dpo