Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published 2 days ago • 21 • 3
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines Paper • 2408.01050 • Published Aug 2 • 8 • 4
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 40 • 4
Investigating Answerability of LLMs for Long-Form Question Answering Paper • 2309.08210 • Published Sep 15, 2023 • 12 • 1