robertgshaw2 committed
Commit 99a5e49 • 1 Parent(s): a037bbe
Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ tags:
 - int4
 ---

-##
+## zephyr-7b-beta-marlin
 This repo contains model files for [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) optimized for [nm-vllm](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.

 This model was quantized with [GPTQ](https://arxiv.org/abs/2210.17323) and saved in the Marlin format for efficient 4-bit inference. Marlin is a highly optimized inference kernel for 4-bit models.
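For context on the README text above, here is a minimal sketch of serving this checkpoint through nm-vllm's vLLM-compatible Python API. The model id `neuralmagic/zephyr-7b-beta-marlin`, the prompt, and the sampling settings are assumptions for illustration, not taken from this commit.

```python
# Minimal sketch: loading a Marlin-format GPTQ checkpoint with nm-vllm's
# vLLM-compatible Python API. The model id is an assumption based on this
# repo's name; point it at the actual Hugging Face repo you are serving.
from vllm import LLM, SamplingParams

model_id = "neuralmagic/zephyr-7b-beta-marlin"  # assumed repo id

# The engine reads the quantization config stored in the checkpoint and uses
# the Marlin 4-bit kernels for inference.
llm = LLM(model=model_id)

# zephyr-7b-beta expects a chat-style prompt template.
prompts = [
    "<|system|>\nYou are a helpful assistant.</s>\n"
    "<|user|>\nWhat is 4-bit quantization?</s>\n"
    "<|assistant|>\n"
]
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```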