"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4 • 46
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence Paper • 2405.15593 • Published May 24 • 1
Panza: A Personalized Text Writing Assistant via Data Playback and Local Fine-Tuning Paper • 2407.10994 • Published Jun 24
EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search Paper • 2410.14649 • Published Oct 18 • 8
daslab-testing/Llama-3.1-70B-Instruct-gptq4-128-True-seed1_mse1_staticFalse_clipFalse_fineweb Updated Oct 18 • 3
FP8 LLMs for vLLM Collection Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 44 items • Updated Oct 17 • 60
daslab-testing/Meta-Llama-3.1-8B-Instruct-gptq4-128-True-seed1_mse1_staticFalse_clipTrue_fineweb Updated Oct 3 • 46
daslab-testing/Meta-Llama-3.1-8B-Instruct-gptq4-128-True-seed1_mse1_staticTrue_clipTrue_orca_fineweb Updated Oct 3 • 39