TEQ: Trainable Equivalent Transformation for Quantization of LLMs (arXiv:2310.10944, published Oct 17, 2023)
Efficient Post-training Quantization with FP8 Formats (arXiv:2309.14592, published Sep 26, 2023)
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs (arXiv:2309.05516, published Sep 11, 2023)
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs (arXiv:2306.16601, published Jun 28, 2023)
Prune Once for All: Sparse Pre-Trained Language Models (arXiv:2111.05754, published Nov 10, 2021)