Training Large Language Models to Reason in a Continuous Latent Space • Paper • 2412.06769 • Published 7 days ago • 52 • 6
Hymba: A Hybrid-head Architecture for Small Language Models • Paper • 2411.13676 • Published 26 days ago • 38 • 3
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models • Paper • 2411.09595 • Published Nov 14 • 71 • 4
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters • Paper • 2410.23168 • Published Oct 30 • 23 • 4
Writing in the Margins: Better Inference Pattern for Long Context Retrieval • Paper • 2408.14906 • Published Aug 27 • 138 • 11
Mixture of Nested Experts: Adaptive Processing of Visual Tokens • Paper • 2407.19985 • Published Jul 29 • 36 • 4
Mixture-of-Agents Enhances Large Language Model Capabilities • Paper • 2406.04692 • Published Jun 7 • 55 • 3
Transformers Can Do Arithmetic with the Right Embeddings • Paper • 2405.17399 • Published May 27 • 52 • 2