Cross-lingual Transfer of Reward Models in Multilingual Alignment • Paper • 2410.18027 • Published about 1 month ago
Cross-lingual Transfer of Reward Models • Collection • The synthetic preference data and trained reward models from "Cross-lingual Transfer of Reward Models in Multilingual Alignment". • 5 items • Updated 23 days ago
iqwiki-kor/uf-g4o_translated-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed8049 Viewer • Updated 24 days ago • 56.8k • 40
iqwiki-kor/khs-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed6247 Viewer • Updated 25 days ago • 10.2k • 33
iqwiki-kor/khs-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed1903 Viewer • Updated 25 days ago • 10.2k • 37
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.1-op-samp4-seed6247 Viewer • Updated 25 days ago • 10.2k • 34
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.1-op-samp4-seed42 Viewer • Updated 25 days ago • 10.2k • 29