- **Paper:** Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences — arXiv 2404.03715, published Apr 4, 2024
- **Collection:** Awesome feedback datasets — a curated list of datasets with human or AI feedback, useful for training reward models or applying techniques like DPO (19 items)
- **Collection:** Awesome SFT datasets — a curated list of interesting datasets for fine-tuning language models (43 items)
- **Paper:** Stack More Layers Differently: High-Rank Training Through Low-Rank Updates — arXiv 2307.05695, published Jul 11, 2023