MiniPLM: Knowledge Distillation for Pre-Training Language Models Paper • 2410.17215 • Published Oct 22 • 12
LOGO -- Long cOntext aliGnment via efficient preference Optimization Paper • 2410.18533 • Published Oct 24 • 42
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22 • 88
LongReward: Improving Long-context Large Language Models with AI Feedback Paper • 2410.21252 • Published Oct 28 • 16
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25 • 18