Yi-1.5-6B?

#2
by saishf - opened

Yi 1.5 is impressive, fast, and its KV cache is pretty small.
There's a lack of Yi-1.5 tunes & it seems like it would be fun to tune, and it feels less robotic than llama-3 instruct 🐥

maybe! although, at that parameter size, it does feel like we're getting a bit too close to 7/8b to be worth bothering with a slightly worse pretrain :p

Qwen1.5-32B for some reason has a huge KV cache, and that curse extended to this model too.

Tested with 8K ctx & Flash Attention on KoboldCPP

| Model | Size on disk | VRAM @ 8K ctx + FA | Difference (VRAM minus disk) |
| --- | --- | --- | --- |
| duloxetine-4b-v1 @ Q5_K_M | 2.8GB | 5.9GB | 2.1GB |
| Yi-1.5-6B @ Q5_K_M | 4.3GB | 4.6GB | 0.3GB |
| WizardLM-2 @ Q5_K_M | 5.1GB | 5.9GB | 0.8GB |
| Qwen2-7B @ Q4_K_M | 4.7GB | 4.9GB | 0.2GB |
| Yi-1.5-9B @ Q4_K_M | 5.3GB | 5.8GB | 0.5GB |
| Llama-3-8B @ Q4_K_M | 4.9GB | 5.7GB | 0.8GB |
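
The VRAM-minus-disk gap is roughly the KV cache plus some runtime overhead, and you can ballpark the cache from a model's config. A minimal sketch (the head counts below are made-up for illustration; real values are in each model's config.json):

```python
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
#                  * context length * bytes per element (fp16 = 2).
def kv_cache_gb(layers, kv_heads, head_dim, ctx_len, dtype_bytes=2):
    """Estimated KV cache size in GiB for a single sequence."""
    return 2 * layers * kv_heads * head_dim * ctx_len * dtype_bytes / 1024**3

# Illustrative 8B-class shapes at 8K context: GQA (8 KV heads) vs. full MHA.
print(kv_cache_gb(layers=32, kv_heads=8,  head_dim=128, ctx_len=8192))  # ~1.0 GiB
print(kv_cache_gb(layers=32, kv_heads=32, head_dim=128, ctx_len=8192))  # ~4.0 GiB
```

The measured gap also includes compute buffers, so it won't match the formula exactly.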

[Screenshots of KoboldCPP VRAM usage for each model: duloxetine-4b-v1 @ Q5_K_M, Yi-1.5-6B @ Q5_K_M, WizardLM-2 @ Q5_K_M, Qwen2-7B @ Q4_K_M, Yi-1.5-9B @ Q4_K_M, Llama-3-8B @ Q4_K_M]

Edit - Added Llama-3

i wonder why qwen's kv cache is so big
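
one way to dig into that is to compare num_key_value_heads against num_attention_heads in each config: fewer KV heads means grouped-query attention and a proportionally smaller cache. quick sketch (model IDs are just examples):

```python
from transformers import AutoConfig

# Fewer KV heads than attention heads => grouped-query attention,
# which shrinks the KV cache by the same ratio.
for model_id in ["Qwen/Qwen2-7B", "01-ai/Yi-1.5-9B"]:  # example repo IDs
    cfg = AutoConfig.from_pretrained(model_id)
    kv_heads = getattr(cfg, "num_key_value_heads", cfg.num_attention_heads)
    print(f"{model_id}: {cfg.num_attention_heads} attn heads, {kv_heads} kv heads")
```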

regardless maybe yi would be something fun to look into! alternatively i do endorse bribery

@saishf
decided to train the 9b instead so it makes a bit more sense over 7b/8b trains, also it has an extended native ctxlen. but it's going
