llama-cpp-python raises ValueError: Failed to create llama_context

#32
by zhouzr - opened

CUDA 12.3
llama-cpp-python 0.2.62

....................................................................................................
llama_new_context_with_model: n_ctx = 32000
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 1875.00 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 1750.00 MiB
llama_kv_cache_init: CUDA2 KV buffer size = 1750.00 MiB
llama_kv_cache_init: CUDA3 KV buffer size = 1625.00 MiB
llama_new_context_with_model: KV self size = 7000.00 MiB, K (f16): 3500.00 MiB, V (f16): 3500.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 3366.04 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 3529547776
llama_new_context_with_model: failed to allocate compute buffers
Traceback (most recent call last):
  File "/home/t2_game2_fire.py", line 147, in <module>
    model = Llama(f"/home/Mixtral-8x22B-Instruct-v0.1.Q4_K_M-00001-of-00002.gguf", n_gpu_layers=-1, max_tokens=32000, n_ctx=32000)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/llama_cpp/llama.py", line 336, in __init__
    self._ctx = _LlamaContext(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/llama_cpp/_internals.py", line 265, in __init__
    raise ValueError("Failed to create llama_context")
ValueError: Failed to create llama_context
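For context on why n_ctx=32000 is so expensive here: the "KV self size = 7000.00 MiB" line in the log is consistent with the usual f16 KV-cache formula, and the separate 3366 MiB compute-buffer allocation that actually failed comes on top of that. A minimal sketch of the arithmetic, assuming Mixtral-8x22B's published dimensions (56 layers, 8 KV heads, head dim 128):

```python
def kv_cache_mib(n_ctx, n_layer, n_head_kv, head_dim, bytes_per_elt=2):
    """Estimate the f16 KV-cache size in MiB for a given context length."""
    # K and V each store: n_ctx * n_layer * n_head_kv * head_dim elements
    per_tensor = n_ctx * n_layer * n_head_kv * head_dim * bytes_per_elt
    return 2 * per_tensor / (1024 ** 2)  # K + V

# Assumed Mixtral-8x22B shape: 56 layers, 8 KV heads, head_dim 128
print(kv_cache_mib(32000, 56, 8, 128))  # -> 7000.0, matching the log
print(kv_cache_mib(8192, 56, 8, 128))   # -> 1792.0 at a smaller context
```

Since the cache scales linearly with n_ctx, lowering n_ctx (and/or n_batch, which shrinks the compute buffers) is the usual first workaround when cudaMalloc fails at context creation.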