Yi-1.5-6B?

#2
by saishf - opened

Yi 1.5 is impressive, fast, and its KV cache is pretty small.
There's a lack of Yi-1.5 tunes & it seems like it would be fun to tune, and it feels less robotic than llama-3 instruct 🐥

maybe! although, at that parameter size, it does feel like we're getting a bit too close to 7/8b to be worth bothering with a slightly worse pretrain :p

Qwen1.5-32B for some reason has a huge KV cache, and that curse extended to this model too.

Tested with 8K ctx & Flash Attention on KoboldCPP

| Model | Size on disk | VRAM @ 8K ctx + FA | Difference (VRAM minus disk) |
| --- | --- | --- | --- |
| duloxetine-4b-v1 @ Q5_K_M | 2.8GB | 5.9GB | 2.1GB |
| Yi-1.5-6B @ Q5_K_M | 4.3GB | 4.6GB | 0.3GB |
| WizardLM-2 @ Q5_K_M | 5.1GB | 5.9GB | 0.8GB |
| Qwen2-7B @ Q4_K_M | 4.7GB | 4.9GB | 0.2GB |
| Yi-1.5-9B @ Q4_K_M | 5.3GB | 5.8GB | 0.5GB |
| Llama-3-8B @ Q4_K_M | 4.9GB | 5.7GB | 0.8GB |
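
The VRAM-minus-disk gap is roughly the KV cache plus some runtime overhead, and you can ballpark the cache from a model's config. A minimal sketch (the head counts below are made-up for illustration; real values are in each model's config.json):

```python
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
#                  * context length * bytes per element (fp16 = 2).
def kv_cache_gb(layers, kv_heads, head_dim, ctx_len, dtype_bytes=2):
    """Estimated KV cache size in GiB for a single sequence."""
    return 2 * layers * kv_heads * head_dim * ctx_len * dtype_bytes / 1024**3

# Illustrative 8B-class shapes at 8K context: GQA (8 KV heads) vs. full MHA.
print(kv_cache_gb(layers=32, kv_heads=8,  head_dim=128, ctx_len=8192))  # ~1.0 GiB
print(kv_cache_gb(layers=32, kv_heads=32, head_dim=128, ctx_len=8192))  # ~4.0 GiB
```

The measured gap also includes compute buffers, so it won't match the formula exactly.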

[Screenshots of KoboldCPP VRAM usage for each model: duloxetine-4b-v1 @ Q5_K_M, Yi-1.5-6B @ Q5_K_M, WizardLM-2 @ Q5_K_M, Qwen2-7B @ Q4_K_M, Yi-1.5-9B @ Q4_K_M, Llama-3-8B @ Q4_K_M]

Edit - Added Llama-3

i wonder why qwen's kv cache is so big
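
one way to dig into that is to compare num_key_value_heads against num_attention_heads in each config: fewer KV heads means grouped-query attention and a proportionally smaller cache. quick sketch (model IDs are just examples):

```python
from transformers import AutoConfig

# Fewer KV heads than attention heads => grouped-query attention,
# which shrinks the KV cache by the same ratio.
for model_id in ["Qwen/Qwen2-7B", "01-ai/Yi-1.5-9B"]:  # example repo IDs
    cfg = AutoConfig.from_pretrained(model_id)
    kv_heads = getattr(cfg, "num_key_value_heads", cfg.num_attention_heads)
    print(f"{model_id}: {cfg.num_attention_heads} attn heads, {kv_heads} kv heads")
```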

regardless maybe yi would be something fun to look into! alternatively i do endorse bribery

@saishf
decided to train the 9b instead so it makes a bit more sense over 7b/8b trains, also it has an extended native ctxlen. but it's going
