LLM-FP4: 4-Bit Floating-Point Quantized Transformers • arXiv:2310.16836 • Published Oct 25, 2023
TEQ: Trainable Equivalent Transformation for Quantization of LLMs • arXiv:2310.10944 • Published Oct 17, 2023
ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers • arXiv:2309.16119 • Published Sep 28, 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration • arXiv:2306.00978 • Published Jun 1, 2023
LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning • arXiv:2305.18403 • Published May 28, 2023
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models • arXiv:2211.10438 • Published Nov 18, 2022
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers • arXiv:2210.17323 • Published Oct 31, 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale • arXiv:2208.07339 • Published Aug 15, 2022
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs • arXiv:2309.05516 • Published Sep 11, 2023
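For context on what these papers improve upon, here is a minimal sketch of plain round-to-nearest (RTN) per-channel weight quantization, the common baseline that methods such as GPTQ, AWQ, SmoothQuant, and SignRound build on or replace. The function names, the 4-bit setting, and the random-weight usage example are illustrative assumptions, not code from any of the listed papers.

```python
import torch

def rtn_quantize_per_channel(weight: torch.Tensor, n_bits: int = 4):
    """Round-to-nearest symmetric quantization, one scale per output channel.

    Baseline sketch only; the listed papers each add their own machinery
    (Hessian-aware rounding, activation-aware scaling, learned rounding, ...).
    """
    qmax = 2 ** (n_bits - 1) - 1  # e.g. 7 for signed 4-bit
    # One scale per row (output channel) of the weight matrix.
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale  # int8 storage is wide enough for 4-bit codes

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float weight for matrix multiplication."""
    return q.float() * scale

# Usage: quantize a random "linear layer" weight and measure the error.
w = torch.randn(256, 512)
q, s = rtn_quantize_per_channel(w, n_bits=4)
w_hat = dequantize(q, s)
print("mean abs error:", (w - w_hat).abs().mean().item())
```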