Update README.md
This repo contains 8-bit quantized GPTQ model files for [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).

This model can be loaded with just over 10 GB of VRAM (compared to roughly 16.07 GB for the original model) and can be served quickly on inexpensive Nvidia GPUs (Nvidia T4, Nvidia K80, RTX 4070, etc.).

The 8-bit GPTQ quant shows minimal quality degradation relative to the original `bfloat16` model, since its 8-bit width preserves more precision than the more common 4-bit GPTQ quants.
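As a quick illustration, here is a minimal loading sketch using `transformers` (which reads the GPTQ quantization config automatically when `optimum` and `auto-gptq` are installed). The `model_id` below is a placeholder, not the actual repo id, and the generation settings are only example values.

```python
# Minimal sketch of loading and running this GPTQ quant.
# Requires: transformers, optimum, auto-gptq (and a CUDA GPU with ~10 GB free VRAM).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-repo-id>"  # placeholder: substitute the actual id of this quantized repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # places the quantized weights on the available GPU
    torch_dtype=torch.float16,  # activations in fp16; weights stay 8-bit GPTQ
)

# Llama 3 Instruct expects its chat template, so build the prompt via the tokenizer.
messages = [{"role": "user", "content": "Explain GPTQ quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```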