Llama.cpp error

#3
by dillfrescott - opened
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 3200.00 MB
llama_new_context_with_model: kv self size  = 3200.00 MB
llama_new_context_with_model: compute buffer total size = 364.13 MB
llama_new_context_with_model: VRAM scratch buffer: 358.00 MB
llama_new_context_with_model: total VRAM used: 17125.05 MB (model: 13567.05 MB, context: 3558.00 MB)
<|endoftext|>

CUDA error 9 at ggml-cuda.cu:6863: invalid configuration argument
current device: 0

Every other model I've tried works fine except this one. :/

Yeah, this model should be considered experimental. It required a special fix to llama.cpp to work in the first place, because of its unusual vocab.

I've had several reports of this CUDA error. It's possible that the fix mentioned above only works on CPU, and there's a bug with GPU processing.
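As a quick test, it might be worth running with GPU offload disabled so everything stays on the CPU. This is just a sketch, and the filename below is a placeholder for whichever quant you downloaded:

./main -m ./model.Q4_K_M.gguf -ngl 0 -c 4096 -p "Hello"

If that runs cleanly but the same command crashes once layers are offloaded with -ngl, that would support the theory that the bug is in the CUDA path rather than in the GGUF itself.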

Could you please report it on the llama.cpp GitHub? It's not something I can do anything about, unless/until there's a fix in llama.cpp that requires me to re-make the GGUFs. But it may well be that only the client needs to change, and the GGUFs are already fine.

Feel free to ping me in the GitHub issue you raise so I can keep track of it, or link it here.

Gotcha. Thanks! Will do!