robertgshaw2 committed
Commit 99a5e49 • 1 Parent(s): a037bbe
Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ tags:
 - int4
 ---

-##
+## zephyr-7b-beta-marlin
 This repo contains model files for [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) optimized for [nm-vllm](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.

 This model was quantized with [GPTQ](https://arxiv.org/abs/2210.17323) and saved in the Marlin format for efficient 4-bit inference. Marlin is a highly optimized inference kernel for 4-bit models.
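For context on the README text above, here is a minimal sketch of serving this checkpoint through nm-vllm's vLLM-compatible Python API. The model id `neuralmagic/zephyr-7b-beta-marlin`, the prompt, and the sampling settings are assumptions for illustration, not taken from this commit.

```python
# Minimal sketch: loading a Marlin-format GPTQ checkpoint with nm-vllm's
# vLLM-compatible Python API. The model id is an assumption based on this
# repo's name; point it at the actual Hugging Face repo you are serving.
from vllm import LLM, SamplingParams

model_id = "neuralmagic/zephyr-7b-beta-marlin"  # assumed repo id

# The engine reads the quantization config stored in the checkpoint and uses
# the Marlin 4-bit kernels for inference.
llm = LLM(model=model_id)

# zephyr-7b-beta expects a chat-style prompt template.
prompts = [
    "<|system|>\nYou are a helpful assistant.</s>\n"
    "<|user|>\nWhat is 4-bit quantization?</s>\n"
    "<|assistant|>\n"
]
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```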