M-RewardBench: Evaluating Reward Models in Multilingual Settings Paper • 2410.15522 • Published Oct 20, 2024 • 11
Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences Paper • 2403.07230 • Published Mar 12, 2024
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models Paper • 2406.16783 • Published Jun 24, 2024 • 4