neuralmagic's Collections
FP8 LLMs for vLLM
Sparse Foundational Llama 2 Models
Compressed LLMs for nm-vllm
Compression Papers
DeepSparse Sparse LLMs
Sparse Finetuning MPT
FP8 LLMs for vLLM
Updated about 24 hours ago
Accurate FP8-quantized models by Neural Magic, ready for use with vLLM!
neuralmagic/Meta-Llama-3-8B-Instruct-FP8 • Text Generation • Updated 10 days ago • 742 downloads • 3 likes
neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV • Text Generation • Updated about 24 hours ago • 3.59k downloads • 1 like
neuralmagic/Meta-Llama-3-70B-Instruct-FP8 • Text Generation • Updated 10 days ago • 129 downloads
neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8 • Text Generation • Updated 8 days ago • 88 downloads • 1 like
neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8 • Text Generation • Updated 1 day ago • 16 downloads
neuralmagic/Qwen2-72B-Instruct-FP8 • Text Generation • Updated 9 days ago • 26 downloads • 3 likes
neuralmagic/Qwen2-7B-Instruct-FP8 • Text Generation • Updated 6 days ago • 37 downloads
neuralmagic/Qwen2-1.5B-Instruct-FP8 • Text Generation • Updated 6 days ago
neuralmagic/Qwen2-0.5B-Instruct-FP8 • Text Generation • Updated 6 days ago • 14 downloads
neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8-dynamic • Text Generation • Updated 8 days ago • 11 downloads
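The collection describes these checkpoints as ready for use with vLLM. A minimal offline-inference sketch, assuming vLLM is installed (`pip install vllm`) and an FP8-capable CUDA GPU (e.g. Hopper or Ada Lovelace) is available; the model name comes from the collection above, the prompt and sampling settings are illustrative:

```python
# Minimal vLLM offline-inference sketch for an FP8 checkpoint from this
# collection. Requires `pip install vllm` and a GPU with FP8 support;
# the prompt and sampling parameters below are placeholder choices.
from vllm import LLM, SamplingParams

# vLLM reads the quantization config from the checkpoint itself, so the
# FP8 model loads like any other Hugging Face model id.
llm = LLM(model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is FP8 quantization?"], params)

# Each request yields a RequestOutput; print the first completion.
print(outputs[0].outputs[0].text)
```

The larger checkpoints (e.g. the 70B and 8x22B models) typically need multiple GPUs; vLLM's `tensor_parallel_size` argument to `LLM(...)` shards them across devices.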