Error -- ggml_allocr_alloc: not enough space in the buffer (needed 136059008, largest block available 16891904)

#1
by RajeshkumarV - opened

When I try to embed or vectorize a PDF document, or to retrieve a vector, I get the error below. I was able to vectorize the data with the earlier GGML format, but that stopped working with a few errors. The community then moved to the new GGUF format, and I have been facing this issue ever since.

ggml_allocr_alloc: not enough space in the buffer (needed 136059008, largest block available 16891904)
GGML_ASSERT: C:\Users\Rajesh_Kumar_V1\AppData\Local\Temp\pip-install-0ohg_aj6\llama-cpp-python_29c4846b4af1471bbb28a41659b32aa3\vendor\llama.cpp\ggml-alloc.c:144: !"not enough space in the buffer"
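
To put the two numbers in the message into perspective, here is a small sketch that parses the line and converts the sizes. It assumes the values are byte counts (the message itself does not state the unit), and `parse_alloc_error` is a hypothetical helper name, not part of llama.cpp or llama-cpp-python.

```python
import re

MIB = 1024 * 1024  # bytes per mebibyte


def parse_alloc_error(line):
    """Return (needed, available) from a ggml_allocr_alloc message, or None."""
    m = re.search(r"needed (\d+), largest block available (\d+)", line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


msg = ("ggml_allocr_alloc: not enough space in the buffer "
       "(needed 136059008, largest block available 16891904)")
needed, available = parse_alloc_error(msg)

# Assumption: both values are in bytes.
print(f"needed {needed / MIB:.1f} MiB, available {available / MIB:.1f} MiB")
# prints: needed 129.8 MiB, available 16.1 MiB
```

Under that assumption, the request is roughly eight times larger than the biggest free block, so a small tweak to the settings is unlikely to be enough.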

I've run into the same error. I can't give you an exact root cause for why it exceeds the allocated VRAM, and I don't remember exactly what I did to avoid it, but you should be able to work around it by reducing whichever dimension makes VRAM usage grow beyond the allocation (ctx size etc.). Maybe reducing the batch size alone is enough?
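
A rough sketch of that workaround with llama-cpp-python, in case it helps. The starting values, the lower bounds, and the helper names `shrinking_settings` and `load_model` are all made up for illustration; only `Llama` and its `model_path`, `n_ctx`, `n_batch`, and `embedding` arguments come from the library. Whether smaller values actually avoid the assert on your setup is untested.

```python
from typing import Any, Dict, Iterator


def shrinking_settings(n_ctx: int = 2048, n_batch: int = 512,
                       min_ctx: int = 256,
                       min_batch: int = 8) -> Iterator[Dict[str, int]]:
    """Yield progressively smaller n_ctx / n_batch combinations to try."""
    while True:
        yield {"n_ctx": n_ctx, "n_batch": n_batch}
        if n_batch > min_batch:
            # Shrink the batch size first, as suggested above.
            n_batch = max(min_batch, n_batch // 2)
        elif n_ctx > min_ctx:
            # Once the batch size is at its floor, shrink the context.
            n_ctx = max(min_ctx, n_ctx // 2)
        else:
            return


def load_model(model_path: str, settings: Dict[str, int]) -> Any:
    """Load a GGUF model for embeddings with the given settings."""
    # Imported lazily so the helper above works without the package.
    from llama_cpp import Llama
    return Llama(model_path=model_path, embedding=True, **settings)


if __name__ == "__main__":
    # Print the candidates; pick one and pass it to load_model().
    for candidate in shrinking_settings():
        print(candidate)
```

As far as I can tell, the GGML_ASSERT kills the whole process rather than raising a Python exception, so wrapping the load in try/except probably won't help. Try one candidate per run and move to the next one if it still crashes.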

I am also having the same problem. Any solutions?
