Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing Paper • 2306.12929 • Published Jun 22, 2023 • 12
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models Paper • 2309.02784 • Published Sep 6, 2023 • 1
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models Paper • 2310.08041 • Published Oct 12, 2023 • 1
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 603