Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems Paper • 2312.15234 • Published Dec 23, 2023 • 3
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models Paper • 2407.11062 • Published Jul 10 • 8
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 47