matlok's Collections
LMM

Papers - Fine-tuning - RLHF - Direct Nash Optimization (DNO)

Rewards are expressed as win rates under general preferences
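
A minimal sketch of what a win-rate reward could look like, assuming a pairwise preference function that returns the probability that one response beats another. The names `win_rate_reward`, `prefer`, and `toy_prefer` are hypothetical and chosen for illustration; they do not come from the DNO paper or any library. The reward of a response is estimated as its average win probability against a set of comparison responses.

```python
from typing import Callable, Sequence

# Hypothetical signature: prefer(prompt, a, b) -> probability in [0, 1] that a beats b.
PreferenceFn = Callable[[str, str, str], float]


def win_rate_reward(
    prompt: str,
    response: str,
    opponents: Sequence[str],
    prefer: PreferenceFn,
) -> float:
    """Average probability that `response` is preferred over each opponent."""
    if not opponents:
        raise ValueError("need at least one opponent response")
    total = sum(prefer(prompt, response, other) for other in opponents)
    return total / len(opponents)


def toy_prefer(prompt: str, a: str, b: str) -> float:
    """Toy stand-in for a preference model (assumption): the longer response wins."""
    if len(a) == len(b):
        return 0.5
    return 1.0 if len(a) > len(b) else 0.0


if __name__ == "__main__":
    reward = win_rate_reward("q", "abc", ["a", "abcd", "abc"], toy_prefer)
    print(reward)
```

In practice the comparison responses would typically be sampled from a policy and `prefer` would be a learned or annotator-backed preference model; the toy function here only makes the sketch runnable.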