KoboldCpp version 1.64?

#3
by SolidSnacke - opened

Are you from the future?

Not even talking about the fork, just saying that version 1.64 will have the fix merged, haha. So it's safe to recommend 1.64 already, for when it's out.

You can try this fork if they already have all the fixes merged, but I personally stick to the official releases.

Lewdiculous changed discussion status to closed


I don't believe the fork has them; its latest release is from 2 days prior to the BPE fix. I personally use the fork and it's not any faster for me, just slightly lower VRAM usage. There's no point in switching unless you desperately need an extra 0.2 GB of VRAM.

We shall wait then.


Nexesenex is so quick 😭

https://github.com/Nexesenex/kobold.cpp/releases

The latest exe supports BPE tokenization and Flash Attention.
Don't know which version of FA though; my old 20-series is useless for FA2.
Support for Turing GPUs has been "coming soon" since July 2023 🥲

Update on Flash Attention: it works on Turing, and it saves a fxck ton of VRAM even on 8 GB!
With FA: [screenshot: Screenshot 2024-05-01 235353.png]
Without FA: [screenshot: Screenshot 2024-05-01 235536.png]
Both Q4_K_M.
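
For anyone wanting to try the same comparison, here's a minimal sketch of launching KoboldCpp with Flash Attention on. The --flashattention flag is the one shipped in the 1.64-era builds; the model path and --gpulayers count are placeholders for your own setup, not recommendations from this thread:

```python
# Minimal sketch: launch KoboldCpp with Flash Attention enabled.
# The model filename and layer count below are placeholders.
import subprocess

subprocess.run([
    "python", "koboldcpp.py",
    "--model", "Llama-3-8B-Instruct.Q4_K_M.gguf",  # placeholder path
    "--contextsize", "8192",
    "--flashattention",   # the VRAM saver compared above
    "--gpulayers", "33",  # placeholder; tune for an 8 GB card
])
```

Drop --flashattention to reproduce the "without FA" case.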

Q4_K_S makes 24k Llama 3 context possible :3 (napkin math below)
Or makes room for a GGUF-converted SD checkpoint
Or a vision model
But SOVL explodes at 24k
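
Rough napkin math for that 24k figure: the KV cache alone runs about 3 GiB. A minimal sketch, assuming Llama-3-8B's published config (32 layers, 8 KV heads via GQA, head dim 128) and an fp16 cache; these numbers are mine, not from the thread:

```python
# Back-of-the-envelope KV-cache sizing for Llama-3-8B at 24k context.
# Assumptions (not from the thread): 32 layers, 8 KV heads (GQA),
# head dim 128, fp16 K/V entries -- matches the published 8B config.
n_layers   = 32
n_kv_heads = 8
head_dim   = 128
n_ctx      = 24_576   # "24k" context
bytes_per  = 2        # fp16

# Factor of 2 covers both the K and the V tensors per layer.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per
print(f"KV cache: {kv_bytes / 2**30:.2f} GiB")  # -> 3.00 GiB
```

So shaving the weights from Q4_K_M down to Q4_K_S frees roughly the headroom that the extra context (or an SD checkpoint / vision model) eats.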



The wait for the official repo is over!
As of 12 minutes ago :3
https://github.com/LostRuins/koboldcpp/releases/tag/v1.64

nice!
