Program terminates while serving multiple requests at a time
GGML_ASSERT: /tmp/pip-install-3q_fwex4/llama-cpp-python_520e3a5b95cc4b339cb4759635dc8a44/vendor/llama.cpp/ggml-cuda.cu:6741: ptr == (void *) (g_cuda_pool_addr[device] + g_cuda_pool_used[device])
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
The above error occurs when I try to process multiple requests at a time. It happens in a chatbot I run locally with the Llama-2-7b chat GGUF file via llama-cpp-python.
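For what it's worth, this kind of CUDA pool assertion often shows up when the same model context is driven from several threads at once, and a llama-cpp-python `Llama` instance is not thread-safe. A minimal sketch of a workaround, assuming the bot shares one `Llama` object across request threads (the model path and the `chat` helper below are hypothetical, not from my actual code):

```python
import threading

from llama_cpp import Llama

# Hypothetical model path; adjust to your local GGUF file.
MODEL_PATH = "llama-2-7b-chat.Q4_K_M.gguf"

llm = Llama(model_path=MODEL_PATH, n_ctx=2048, n_gpu_layers=-1)

# A single Llama instance must not run inference from multiple threads
# concurrently, so serialize access with a lock.
_llm_lock = threading.Lock()

def chat(user_message: str) -> str:
    # Only one request performs inference at a time; the others wait here.
    with _llm_lock:
        result = llm.create_chat_completion(
            messages=[{"role": "user", "content": user_message}],
            max_tokens=256,
        )
    return result["choices"][0]["message"]["content"]
```

This serializes requests rather than running them in parallel; true concurrent serving would need one context per worker or a server built for batching.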
@krishnapiya Well, I don't know a solution, but you should post this issue on the GGUF model's repo, not this one. This is GGML and completely outdated.