Update README.md
* <h3 style="display: inline;">Release Date:</h3> June 14, 2024
* <h3 style="display: inline;">Model Developers:</h3> Neural Magic

Qwen2-1.5B-Instruct quantized to FP8 weights and activations using per-tensor quantization through the [AutoFP8 repository](https://github.com/neuralmagic/AutoFP8), ready for inference with vLLM >= 0.5.0.
Calibrated with 512 UltraChat samples to achieve 99% performance recovery on the Open LLM Benchmark evaluations.
Reduces space on disk by ~40%.
Part of the [FP8 LLMs for vLLM collection](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127).

## Usage and Creation