Thanks for this!
I'm running ggml-ehartford-WizardLM-Uncensored-Falcon-40b-Q3_K_S.gguf (it's the largest quant I can fit on a 24GB card without also having to occupy my secondary card), and it's running great!
I had previously run a task on TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ. I had to modify the prompt a good bit, as that model tends to want to "elaborate on its input using its knowledge of the topic" when I wanted it to act only on what the input text said. There were a couple of other issues too, like it tending to focus only on the end of the input text, but I was largely able to fix them via prompt adjustments. Now it's producing nice, neatly formatted, consistent responses, and overall the output is better than what I got from TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ. Very nice!
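In case anyone hits the same behavior, here's roughly the shape of the prompt adjustment that worked for me (a minimal sketch using llama-cpp-python; the model path, context size, and exact instruction wording are placeholders for illustration, not my exact setup):

```python
from llama_cpp import Llama

# Load the GGUF quant (path is a placeholder).
llm = Llama(
    model_path="ggml-ehartford-WizardLM-Uncensored-Falcon-40b-Q3_K_S.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers to the 24GB card
)

input_text = "..."  # the text the model should act on

# Explicitly forbid outside knowledge and force attention to the whole input,
# which is what curbed the "elaborating" and end-of-text-only habits for me.
prompt = (
    "Below is some input text. Respond using ONLY the information contained "
    "in the input text itself. Do not add facts from your own knowledge, "
    "and consider the ENTIRE text, not just the final paragraph.\n\n"
    f"### Input:\n{input_text}\n\n### Response:\n"
)

out = llm(prompt, max_tokens=512, temperature=0.2)
print(out["choices"][0]["text"])
```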
BTW, if you ever see fit to do any "lightweight" models (e.g. just a couple billion parameters), that would be appreciated. :) They're nice to use as a base for training new, highly specialized models. You could even just take a heavyweight model, strip out some of the deeper layers, and then retrain for a bit (rough sketch below) - I've seen this done with a (non-open) model and it worked well.
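For illustration, the layer-stripping idea looks something like this (a minimal sketch with Hugging Face transformers; the checkpoint name and the choice of which/how many blocks to keep are assumptions on my part, not what that non-open model actually did):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any decoder-only HF model with a block list works similarly.
name = "tiiuae/falcon-7b"
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(name)

# Falcon exposes its decoder blocks at model.transformer.h (a ModuleList).
layers = model.transformer.h
keep = 16  # keep the first 16 of 32 blocks -- an arbitrary illustrative choice

# Drop the deeper blocks and update the config so generation still works.
model.transformer.h = torch.nn.ModuleList(list(layers)[:keep])
model.config.num_hidden_layers = keep

# The truncated model then needs a short retraining pass to "heal";
# from here you'd hand it to your usual finetuning setup.
model.save_pretrained("falcon-7b-pruned-16L")
tokenizer.save_pretrained("falcon-7b-pruned-16L")
```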
I asked for support of the Falcon 1B and 7B RW variants in llama.cpp, but they seem to be too different from the standard Falcon architecture to be easily converted to GGUF right now. I know there are other "small" models and will have a look soon.
:)