view article Article Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques By jmamou and 8 others • Mar 24 • 18
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models Paper • 2502.09390 • Published Feb 13 • 16
view article Article Faster Assisted Generation with Dynamic Speculation By jmamou and 6 others • Oct 8, 2024 • 46
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5, 2024 • 38
Accelerating Speculative Decoding using Dynamic Speculation Length Paper • 2405.04304 • Published May 7, 2024 • 2
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published May 23, 2024 • 19