Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 77
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25, 2024 • 78
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 65