matlok's Collections
LMM

Papers - Reward Model - Cross-Lingual

We propose to perform reward optimization using a reward model (RM) trained for a different language, assuming model generation quality transfers cross-lingually.
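As a rough illustration of the idea, the sketch below applies best-of-n reranking, one simple form of reward optimization, to target-language candidates using a reward function that stands in for an RM trained on another language. Everything here is an assumption made for illustration: the source does not say which optimization method is used, `best_of_n` and `toy_source_language_rm` are hypothetical names, and the toy scorer (a capped word count) is a placeholder rather than a real reward model.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    candidates: List[str],
    reward_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate that reward_fn scores highest for the prompt."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    # max() keeps the first candidate in case of a tie.
    return max(candidates, key=lambda c: reward_fn(prompt, c))


def toy_source_language_rm(prompt: str, response: str) -> float:
    """Hypothetical stand-in for an RM trained on a different language.

    A real RM would be a learned model; this placeholder just rewards
    longer responses, capped at 20 words, so the example runs anywhere.
    """
    return float(min(len(response.split()), 20))


# Target-language (German) prompt and sampled candidates, scored by the
# stand-in "source-language" RM.
prompt = "Was ist die Hauptstadt von Deutschland?"
candidates = [
    "Ja.",
    "Berlin ist die Hauptstadt von Deutschland.",
]
best = best_of_n(prompt, candidates, toy_source_language_rm)
print(best)
```

In practice the reward function would wrap a trained RM, and the candidates would be sampled from the target-language policy. The reranking step itself does not depend on which language the RM was trained on, which is what makes the cross-lingual reuse possible to try.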