
## Notes

There is no prompt template; prompt with just BOS + text.

It can also start from an empty prompt.

Temperature, repetition penalty, and other sampling parameters should all be left at their defaults.

It will not go lewd immediately; it will try to form a coherent story first.

It's best to generate 1–3 paragraphs at a time; it loses coherence if you try to make it fill the full context in one pass (see the sketch below).
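A minimal usage sketch with llama-cpp-python, assuming a local copy of one of the GGUF files listed below. The file name, prompt, and chunk length are illustrative, not part of this card, and sampling is left at the library defaults as recommended above.

```python
# Sketch only: raw text prompting and chunked generation with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3-8b-lewd-stories-v6-16k.Q5_K_M.gguf",  # illustrative path
    n_ctx=16384,  # the LLaMA-3 variant is RoPE-scaled to 16k
)

# No chat template: the prompt is plain text; the library prepends BOS by default.
story = "The storm had been building over the harbour all evening when"

# Generate a few hundred tokens (roughly 1-3 paragraphs) at a time instead of
# asking for the whole context in one pass.
for _ in range(3):
    out = llm(story, max_tokens=300)  # temperature, repetition penalty etc. stay at defaults
    story += out["choices"][0]["text"]

print(story)
```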

## LLaMA-3-8B base

RoPE-scaled to 16k context.

| Name | Quant | Size (GB) | VRAM with FA (GB) | VRAM without FA (GB) |
|------|-------|-----------|-------------------|----------------------|
| llama-3-8b-lewd-stories-v6-16k.F16 | F16 | 14.9 | 16.6 | 17.4 |
| llama-3-8b-lewd-stories-v6-16k.Q8_0 | Q8_0 | 8.0 | 10.1 | 10.5 |
| llama-3-8b-lewd-stories-v6-16k.Q6_K | Q6_K | 6.1 | 8.4 | 9.2 |
| llama-3-8b-lewd-stories-v6-16k.Q5_K_M | Q5_K_M | 5.3 | 7.6 | 8.1 |
| llama-3-8b-lewd-stories-v6-16k.Q4_K_M | Q4_K_M | 4.6 | 6.9 | 7.8 |
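The "with FA" column corresponds to running with flash attention enabled. A hedged sketch of that configuration in llama-cpp-python follows; the `flash_attn` and `n_gpu_layers` options exist in recent builds, and the file name is again illustrative.

```python
# Sketch only: full GPU offload with flash attention, matching the "with FA" column.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3-8b-lewd-stories-v6-16k.Q4_K_M.gguf",  # ~6.9 GB with FA per the table above
    n_ctx=16384,
    n_gpu_layers=-1,   # offload every layer to the GPU
    flash_attn=True,   # drop this to match the "without FA" column instead
)
```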

## Yi-1.5-9B-32K

Native 32k context

| Name | Quant | Size (GB) | VRAM with FA (GB) | VRAM without FA (GB) |
|------|-------|-----------|-------------------|----------------------|
| yi-lewd-stories-32k.F16 | F16 | 16.4 | | |
| yi-lewd-stories-32k.Q8_0 | Q8_0 | 8.7 | | |
| yi-lewd-stories-32k.Q6_K | Q6_K | 6.7 | | |
| yi-lewd-stories-32k.Q5_K_M | Q5_K_M | 5.8 | | |
| yi-lewd-stories-32k.Q4_K_M | Q4_K_M | 5.0 | | |

## Mistral-7B-v0.3

Native 32k context

| Name | Quant | Size (GB) | VRAM with FA (GB) | VRAM without FA (GB) |
|------|-------|-----------|-------------------|----------------------|
| mistral-lewd-stories-32k.F16 | F16 | 13.5 | | |
| mistral-lewd-stories-32k.Q8_0 | Q8_0 | 7.2 | | |
| mistral-lewd-stories-32k.Q6_K | Q6_K | 5.5 | | |
| mistral-lewd-stories-32k.Q5_K_M | Q5_K_M | 4.8 | | |
| mistral-lewd-stories-32k.Q4_K_M | Q4_K_M | 4.0 | | |