MaziyarPanahi/Mistral-11B-Instruct-v0.2-Mistral-7B-Instruct-v0.2-slerp • Text Generation • Updated Jan 10 • 56 • 2
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences • Paper • 2404.03715 • Published Apr 4 • 57