--- inference: false license: other --- # Monero's WizardLM-Uncensored-SuperCOT-Storytelling-30B GGML These files are GGML format model files for [Monero's WizardLM-Uncensored-SuperCOT-Storytelling-30B](https://huggingface.co/Monero/WizardLM-Uncensored-SuperCOT-StoryTelling-30b). # Works only with PR 1684: https://github.com/ggerganov/llama.cpp/pull/1684 ## Prompt template ``` Optional instruction ("You are a helpful assistant" etc) USER: prompt ASSISTANT: ``` *The quality of the 3-bit model is higher than the 2-bit model, but the interface is slower. The 3-bit model (type q3_K_S) barely fits into 16 GB of RAM, but it works.* ``` llama_model_load_internal: mem required = 15716.00 MB (+ 3124.00 MB per state) ``` *On my Xeon E3-1225 v3 4/8 old cpu, it runs with ~715 ms per token.*