Spaces: neuralmagic/README
mgoin committed on Nov 5, 2024

Commit 7101434 (verified) · 1 parent: 7c8dafe

Update README.md

Files changed (1)

README.md CHANGED

@@ -15,6 +15,6 @@ Download our compression-aware inference engines and open source tools for fast
 * [LLM Compressor](https://github.com/vllm-project/llm-compressor/): HF-native library for applying quantization and sparsity algorithms to llms for optimized deployment with vLLM
 * [DeepSparse](https://github.com/neuralmagic/deepsparse): Inference runtime offering accelerated performance on CPUs and APIs to integrate ML into your application
-![NM Workflow](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/oFtTSqKjDLwd095gtYHlc.png)
+![NM Workflow](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/QacT1zAnoidTKqRTY4NxH.png)
 In this profile we provide accurate model checkpoints compressed with SOTA methods ready to run in vLLM such as W4A16, W8A16, W8A8 (int8 and fp8), and many more! If you would like help quantizing a model or have a request for us to add a checkpoint, please open an issue in https://github.com/vllm-project/llm-compressor.