ehzoah's Collections

Efficient Exact Optimization

Supervised fine-tuned (SFT) and reward models used in the experiments of the ICML 2024 paper "Towards Efficient Exact Optimization of Language Model Alignment".