Neural Magic

company

Verified

https://neuralmagic.com/

neuralmagic


AI & ML interests

LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV

Recent Activity

dsikka updated a model about 2 months ago

neuralmagic/Llama-3.2-3B-Instruct-quantized.w8a8

dsikka published a model about 2 months ago

neuralmagic/Llama-3.2-3B-Instruct-quantized.w8a8

jfinks25 updated a Space 4 months ago

neuralmagic/README


Organization Card


The Future of AI is Open

If you are looking for compressed models to run with vLLM, they have moved to the RedHatAI organization. We look forward to continuing to publish optimized models for open source use!

Neural Magic (acquired by Red Hat) helps developers accelerate deep learning performance with automated model compression technologies and inference engines. Download our compression-aware inference engines and open source tools for fast model inference.

vLLM: A high-throughput and memory-efficient inference engine for at-scale deployment of performant open-source LLMs
LLM Compressor: HF-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM

This profile provides accurate model checkpoints, compressed with SOTA methods and ready to run in vLLM, in quantization schemes such as W4A16, W8A16, W8A8 (int8 and fp8), and many more! If you would like help quantizing a model or want to request a checkpoint, please open an issue at https://github.com/vllm-project/llm-compressor.
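
The scheme names above follow a "W<bits>A<bits>" pattern, where the first number is the weight precision and the second is the activation precision. As a rough illustration, the sketch below parses such names and estimates weight storage for a hypothetical 3B-parameter model. The helper names (parse_scheme, weight_gib) and the parameter count are assumptions made for this example and are not part of vLLM or LLM Compressor; the estimate ignores scales, zero points, and any layers left unquantized.

```python
import re

# Assumed naming convention: "W<bits>A<bits>",
# e.g. W4A16 = 4-bit weights, 16-bit activations.
SCHEME_PATTERN = re.compile(r"^W(\d+)A(\d+)$", re.IGNORECASE)


def parse_scheme(name: str) -> tuple[int, int]:
    """Return (weight_bits, activation_bits) for a name such as 'W8A8'."""
    match = SCHEME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized scheme name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def weight_gib(num_params: int, weight_bits: int) -> float:
    """Approximate weight storage in GiB.

    Ignores quantization scales, zero points, and unquantized layers,
    so real checkpoints will be somewhat larger.
    """
    return num_params * weight_bits / 8 / 2**30


if __name__ == "__main__":
    num_params = 3_000_000_000  # hypothetical 3B-parameter model
    for scheme in ("W4A16", "W8A16", "W8A8"):
        w_bits, a_bits = parse_scheme(scheme)
        size = weight_gib(num_params, w_bits)
        print(f"{scheme}: {w_bits}-bit weights, {a_bits}-bit activations, "
              f"~{size:.2f} GiB of weights")
```

Under these assumptions, W8A16 and W8A8 give the same weight footprint; they differ in activation precision, which affects compute rather than checkpoint size.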

Collections 14


Spaces 3

Quant Llms Text Generation

Quantized vs. Unquantized LLM: Text Generation Comparison

Sparse Llama Gsm8k

Solve math problems with chat-based guidance

Models 1

neuralmagic/Llama-3.2-3B-Instruct-quantized.w8a8

Text Generation • 4B • Updated Jul 9 • 641

Datasets 13

neuralmagic/calibration

Viewer • Updated Mar 26 • 20k • 332 • 2

neuralmagic/mmlu_it

Viewer • Updated Oct 23, 2024 • 14k • 244

neuralmagic/mmlu_fr

Viewer • Updated Oct 23, 2024 • 14k • 267

neuralmagic/mmlu_th

Viewer • Updated Oct 23, 2024 • 14k • 83

neuralmagic/mmlu_de

Viewer • Updated Oct 23, 2024 • 14k • 210

neuralmagic/mmlu_es

Viewer • Updated Oct 23, 2024 • 14k • 219

neuralmagic/mmlu_hi

Viewer • Updated Oct 23, 2024 • 14k • 185

neuralmagic/mmlu_pt

Viewer • Updated Oct 23, 2024 • 14k • 256

neuralmagic/quantized-llama-3.1-leaderboard-v2-evals

Viewer • Updated Oct 10, 2024 • 247k • 448

neuralmagic/quantized-llama-3.1-humaneval-evals

Viewer • Updated Oct 10, 2024 • 73.8k • 84
