Commit 24ea3d8 by aifeifei798
1 Parent(s): a89991c
Update README.md
README.md
CHANGED
@@ -13,6 +13,8 @@ tags:
 # Special Thanks:
 - Lewdiculous's superb gguf version, thank you for your conscientious and responsible dedication.
 - https://huggingface.co/LWDCLS/llama3-8B-DarkIdol-2.1-Uncensored-32K-GGUF-IQ-Imatrix-Request
+
+# fast quantizations
 - The difference from normal quantizations is that I quantize the output and embed tensors to f16 and the other tensors to q5_k, q6_k or q8_0. This creates models that are degraded little or not at all and have a smaller size. They run at about 3-6 t/sec on CPU only using llama.cpp, and obviously faster on computers with potent GPUs.
 - ZeroWw/llama3-8B-DarkIdol-2.1-Uncensored-32K-GGUF
 - More models here: https://huggingface.co/RobertSinclair
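
The mixed-precision scheme described under "fast quantizations" (output and embedding tensors kept at f16, everything else at q5_k, q6_k or q8_0) can be sketched as a small Python helper that only assembles a llama.cpp `llama-quantize` command line. This is an illustration under assumptions, not the exact command used for the linked models: the helper name `build_quantize_cmd`, the file names, and the default binary name are hypothetical, and the `--output-tensor-type` / `--token-embedding-type` flags should be checked against the installed llama.cpp build.

```python
# Quantization types the README names for the "other" tensors.
BODY_TYPES = ("q5_k", "q6_k", "q8_0")


def build_quantize_cmd(src, dst, body_type, binary="llama-quantize"):
    """Build the argv list for one mixed-precision quantization run.

    src       -- path to the full-precision (f16) GGUF input (assumed name)
    dst       -- path for the quantized GGUF output (assumed name)
    body_type -- quantization type for all tensors except output/embeddings
    binary    -- name of the llama.cpp quantize executable (assumption)
    """
    if body_type not in BODY_TYPES:
        raise ValueError(f"unsupported body type: {body_type}")
    return [
        binary,
        # Keep the output tensor at f16, as the README describes.
        "--output-tensor-type", "f16",
        # Keep the token embedding tensor at f16 as well.
        "--token-embedding-type", "f16",
        str(src),
        str(dst),
        body_type,
    ]


if __name__ == "__main__":
    # Print one command per body type; nothing is executed here.
    for t in BODY_TYPES:
        print(" ".join(build_quantize_cmd("model.f16.gguf", f"model.{t}.gguf", t)))
```

The helper only returns the argument list, so it can be inspected or logged first; to run it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where llama.cpp is installed.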