
Anyone able to get this working on koboldcpp?

#2 by lemon07r - opened

Crashes when I try to load the model; same issue with QuantFactory's quant of this model too. Maybe koboldcpp doesn't have the required upstream merges from llama.cpp yet? Wondering if someone can confirm. I tested LostRuins' koboldcpp with both OpenBLAS and Vulkan, and YellowRoseCx's hipBLAS fork; neither can load this model. Tested with Q4_K_M.
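For anyone who wants to reproduce the test, here's a rough sketch of the kind of load check I mean, assuming koboldcpp's usual launcher flags (the model path, timeout, and flags are placeholders, not verbatim from my setup):

```python
# Rough repro sketch: try loading the same GGUF under each backend.
# The koboldcpp flags used here are assumptions; check --help for your build.
import subprocess

MODEL = "model-Q4_K_M.gguf"  # placeholder path to the quant being tested

for flags in ([], ["--usevulkan"], ["--useclblast", "0", "0"]):
    cmd = ["python", "koboldcpp.py", "--model", MODEL, "--skiplauncher", *flags]
    print("testing:", " ".join(cmd))
    try:
        # A crash during load exits early; a successful load keeps the
        # server running until the timeout fires.
        proc = subprocess.run(cmd, timeout=120)
        print("exited early with code", proc.returncode)
    except subprocess.TimeoutExpired:
        print("still running after 120s, model loaded OK")
```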

Yes, I'm getting the same crash. We'll have to wait for a KoboldCPP release that includes this change:

https://github.com/LostRuins/koboldcpp/commit/889bdd76866ea31a7625ec2dcea63ff469f3e981

If you build it from source, you can use the "concedo_experimental" branch. As of now, it includes PR #7063 from upstream, which brings in the new tokenizer.
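If it helps, a rough sketch of that source build, assuming the LLAMA_* switches in koboldcpp's Makefile (check the README for the ones your backend needs):

```python
# Build sketch: clone, switch to the experimental branch, and compile with
# OpenBLAS and Vulkan enabled. The make switches are assumptions based on
# koboldcpp's Makefile, not verbatim build instructions.
import subprocess

subprocess.run(["git", "clone", "https://github.com/LostRuins/koboldcpp"], check=True)
subprocess.run(["git", "checkout", "concedo_experimental"], cwd="koboldcpp", check=True)
subprocess.run(["make", "LLAMA_OPENBLAS=1", "LLAMA_VULKAN=1", "-j"], cwd="koboldcpp", check=True)
```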

Thanks for looking into this. Yeah, another one of those "update your backend" changes.


Hey, just tested with the latest Kobold release, and it's working great:

https://github.com/LostRuins/koboldcpp/releases


Working great in the latest kcpp release. The i-quant versions will work for CPU-only inference, but they won't work for me when I do any sort of GPU offloading (CLBlast, Vulkan, or otherwise) on my 6900 XT. I tried IQ4 and IQ3 quants; they work with CLBlast but not with Vulkan when I try to offload any amount.

EDIT - Here's the last message I see on screen before it crashes:

`GGML_ASSERT: ggml-vulkan.cpp:2940: !qx_needs_dequant || to_fp16_vk_0 != nullptr`

That's expected; the Vulkan backend doesn't support the i-quants yet. You can see the support table here:

https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix
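If you want to check which quant types a file actually contains before cross-referencing that matrix, a sketch along these lines should work, assuming the `gguf` package's GGUFReader API (`pip install gguf`; the path is a placeholder):

```python
# Sketch: count the tensor quantization types inside a GGUF file so you can
# look each one up in the llama.cpp feature matrix. The GGUFReader API is
# an assumption based on the gguf-py package; the path is a placeholder.
from collections import Counter
from gguf import GGUFReader

reader = GGUFReader("model-IQ4_XS.gguf")
type_counts = Counter(t.tensor_type.name for t in reader.tensors)
for quant_type, count in type_counts.most_common():
    print(f"{quant_type}: {count} tensors")
```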
