aifeifei798/llama3-8B-DarkIdol-2.1-Uncensored-32K

aifeifei798 committed on Jun 28

Commit 24ea3d8 • 1 parent: a89991c

Update README.md

Files changed (1)

README.md +2 -0

@@ -13,6 +13,8 @@ tags:
 # Special Thanks:
  - Lewdiculous's superb gguf version, thank you for your conscientious and responsible dedication.
  - https://huggingface.co/LWDCLS/llama3-8B-DarkIdol-2.1-Uncensored-32K-GGUF-IQ-Imatrix-Request
+# fast quantizations
  - The difference from normal quantizations is that I quantize the output and embed tensors to f16, and the other tensors to q5_k, q6_k or q8_0. This creates models that show little or no degradation and have a smaller size. They run at about 3-6 t/sec on CPU only using llama.cpp, and obviously faster on computers with powerful GPUs.
  - ZeroWw/llama3-8B-DarkIdol-2.1-Uncensored-32K-GGUF
  - More models here: https://huggingface.co/RobertSinclair
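
The recipe described in the list above (output and embedding tensors kept at f16, the remaining tensors at q5_k, q6_k or q8_0) can be sketched as a llama.cpp quantize invocation. This is only an illustration, not the command the quantizer actually ran: it assumes a llama.cpp build whose `llama-quantize` tool accepts `--output-tensor-type` and `--token-embedding-type`, and the helper name and file names are hypothetical. The function only builds the argument list; it does not run anything.

```python
import shlex

# Types the README names for the non-output, non-embedding tensors.
ALLOWED_BODY_TYPES = {"q5_k", "q6_k", "q8_0"}


def build_quantize_command(src_gguf, dst_gguf, body_type="q6_k",
                           binary="llama-quantize"):
    """Build (but do not run) a llama.cpp quantize command.

    Assumption: the binary accepts --output-tensor-type and
    --token-embedding-type, as recent llama.cpp builds do.
    """
    body_type = body_type.lower()
    if body_type not in ALLOWED_BODY_TYPES:
        raise ValueError(f"unsupported body type: {body_type}")
    return [
        binary,
        "--output-tensor-type", "f16",    # keep the output tensor at f16
        "--token-embedding-type", "f16",  # keep the token embeddings at f16
        src_gguf,
        dst_gguf,
        body_type.upper(),                # type for all remaining tensors
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_quantize_command("model.f16.gguf", "model.q6_k.gguf", "q6_k")
    print(shlex.join(cmd))
```

To perform the quantization, the returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where llama.cpp is installed and the source GGUF file exists.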