RegMix: Data Mixture as Regression for Language Model Pre-training • Paper • 2407.01492
Bootstrapping Language Models with DPO Implicit Rewards • Paper • 2406.09760