Instruction Pre-Training: Language Models are Supervised Multitask Learners (arXiv:2406.14491, published Jun 20, 2024)
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective (arXiv:2410.23743, published Oct 31, 2024)
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks (arXiv:2410.20650, published Oct 28, 2024)