iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.01-Iter1-v2-Self-seed42
Stable Language Model Pre-training by Reducing Embedding Variability Paper • 2409.07787 • Published Sep 12, 2024
Cross-lingual Transfer of Reward Models in Multilingual Alignment Paper • 2410.18027 • Published Oct 23, 2024
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference Paper • 2406.06424 • Published Jun 10, 2024 • 12
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Paper • 2406.05761 • Published Jun 9, 2024 • 2
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 64
Can Large Language Models Infer and Disagree Like Humans? Paper • 2305.13788 • Published May 23, 2023