Morgan Funtowicz
mfuntowicz
AI & ML interests
Low-level model-inference optimization, hardware affinity, and large-scale distributed training.
Collections
Tracks papers and links about neural network compression and quantization techniques
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
  Paper • 2208.07339 • Published • 4
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  Paper • 2210.17323 • Published • 5
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
  Paper • 2211.10438 • Published • 2
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 5
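The papers in this collection all build on the same basic primitive: mapping floating-point tensors to a low-bit integer grid plus a scale factor. A minimal sketch of row-wise symmetric (absmax) int8 quantization, the scheme LLM.int8() applies per vector; the function names here are illustrative, not taken from any of the papers' codebases:

```python
import numpy as np

def quantize_absmax_int8(w: np.ndarray):
    """Row-wise symmetric (absmax) int8 quantization.

    Each row is scaled so its largest magnitude maps to 127,
    then rounded to int8; the per-row scale is kept for dequantization.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map int8 codes back to float32 using the stored per-row scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, scale = quantize_absmax_int8(w)
w_hat = dequantize_int8(q, scale)
# rounding error per element is at most half a quantization step (scale / 2)
print(np.abs(w - w_hat).max())
```

GPTQ, SmoothQuant, and AWQ each refine this primitive: GPTQ corrects rounding error layer by layer, SmoothQuant migrates activation outliers into the weights, and AWQ scales salient channels before quantizing.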