Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published 9 days ago • 102
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published 17 days ago • 26
Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 60
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 20
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published Nov 25, 2024 • 19
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 93
Memory-Efficient LLM Training with Online Subspace Descent Paper • 2408.12857 • Published Aug 23, 2024 • 14
Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Article • Apr 15, 2024 • 174
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19, 2024 • 18
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 90
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6, 2024 • 14
Towards Modular LLMs by Building and Reusing a Library of LoRAs Paper • 2405.11157 • Published May 18, 2024 • 29
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts Paper • 2405.07518 • Published May 13, 2024 • 27
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 256
Efficiently Adapting Pretrained Language Models To New Languages Paper • 2311.05741 • Published Nov 9, 2023 • 11
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method Paper • 2402.17193 • Published Feb 27, 2024 • 24