The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published 12 days ago • 53
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile Paper • 2502.06155 • Published 15 days ago • 7
Process Reinforcement through Implicit Rewards Article • By ganqu and 1 other • Jan 3 • 24
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published Dec 30, 2024 • 36
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference Paper • 2406.10774 • Published Jun 16, 2024 • 2
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 570
NanoFlow: Towards Optimal Large Language Model Serving Throughput Paper • 2408.12757 • Published Aug 22, 2024 • 18
Hydragen: High-Throughput LLM Inference with Shared Prefixes Paper • 2402.05099 • Published Feb 7, 2024 • 20
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving Paper • 2310.19102 • Published Oct 29, 2023 • 11
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 36
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning Paper • 2311.02103 • Published Nov 1, 2023 • 20