metadata
inference: false
license: other

Monero's WizardLM-Uncensored-SuperCOT-Storytelling-30B GGML

These files are GGML format model files for Monero's WizardLM-Uncensored-SuperCOT-Storytelling-30B.

Works with the latest llama.cpp version (05/06/23, build 622).

Prompt template

Optional instruction ("You are a helpful assistant" etc)
USER: prompt
ASSISTANT:
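
As a rough illustration of the template above, here is a minimal Python sketch that assembles a single-turn prompt string. The function name `build_prompt` and its parameters are made up for this example and are not part of llama.cpp or of the model. Joining the parts with newlines is an assumption based on how the template is laid out above.

```python
def build_prompt(user_message: str, system: str = "") -> str:
    """Build a single-turn prompt in the USER:/ASSISTANT: layout.

    `system` is the optional instruction line; it is left out when empty.
    Newline separators are an assumption taken from the template layout.
    """
    lines = []
    if system:
        lines.append(system)
    lines.append(f"USER: {user_message}")
    # The prompt ends with the bare "ASSISTANT:" tag so the model continues from there.
    lines.append("ASSISTANT:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Write a short story about a lighthouse.",
                       system="You are a helpful assistant"))
```

The resulting string can be passed to whatever front end you use for llama.cpp as the prompt text.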

The quality of the 3-bit model is higher than that of the 2-bit model, but inference is slower. The 3-bit model (type q3_K_S) barely fits into 16 GB of RAM, but it works.

llama_model_load_internal: mem required  = 15716.00 MB (+ 3124.00 MB per state) 

On my old Xeon E3-1225 v3 (4/8) CPU, it runs at ~740 ms per token.
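
To put these numbers in perspective, the small Python sketch below converts the per-token latency into throughput and the reported memory requirement into GiB. The helper names are invented for this example. It assumes that the "MB" in the llama.cpp log line means 1024 * 1024 bytes, and it only covers the base "mem required" figure, not the per-state amount.

```python
MS_PER_TOKEN = 740.0        # from the timing quoted above
MEM_REQUIRED_MB = 15716.0   # from the llama_model_load_internal log line


def tokens_per_second(ms_per_token: float) -> float:
    """Convert latency per token (milliseconds) into tokens per second."""
    return 1000.0 / ms_per_token


def generation_seconds(n_tokens: int, ms_per_token: float) -> float:
    """Estimate wall-clock seconds to generate n_tokens at a fixed latency."""
    return n_tokens * ms_per_token / 1000.0


def mb_to_gib(mb: float) -> float:
    """Convert MB to GiB, assuming 1 MB = 1024 * 1024 bytes."""
    return mb / 1024.0


if __name__ == "__main__":
    print(f"throughput: {tokens_per_second(MS_PER_TOKEN):.2f} tokens/s")
    print(f"200 tokens: {generation_seconds(200, MS_PER_TOKEN):.0f} s")
    print(f"model memory: {mb_to_gib(MEM_REQUIRED_MB):.2f} GiB")
```

At this speed a 200-token reply takes roughly two and a half minutes, so the 3-bit file is better suited to patient, offline use than to interactive chat on comparable hardware.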