Error when running pipe: temp_state buffer is too small

#35
by StefanStroescu - opened

Hello,

I am trying to use the model to generate an answer from a context I provide, but when I get to text generation I get this error: temp_state buffer is too small.
I think it is because my prompt is quite large in terms of tokens, since the model works when I prompt it without the context.
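For reference, this is roughly how I check the prompt length. A minimal sketch, assuming the standard tokenizer from the TheBloke/Llama-2-70B-chat-GPTQ repo and a placeholder prompt:

```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with the quantized repo (assumption: same tokenizer as the base model).
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-70B-chat-GPTQ")

prompt_with_context = "..."  # the full prompt, context included

# Count how many tokens the prompt occupies; the error only shows up with long prompts.
n_tokens = len(tokenizer(prompt_with_context)["input_ids"])
print(f"Prompt length: {n_tokens} tokens")
```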

I checked and it is not a resource issue (GPU or RAM), and Llama-2-13B-chat-GPTQ worked when prompted with the same context.

Does anyone have any suggestions on how to solve this?

Thanks,

Thanks, Komposter43,

I don't know if it is related to this (https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ/discussions/29), but I noticed that the model only accepts inference requests under 2048 tokens.
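In case it helps anyone hitting the same error: a minimal sketch of one possible workaround, assuming the model is loaded through AutoGPTQ with the ExLlama kernels (whose temporary buffers are sized for 2048 input tokens by default). I have not confirmed this matches the setup used here.

```python
from auto_gptq import AutoGPTQForCausalLM, exllama_set_max_input_length

# Load the quantized model (assumption: AutoGPTQ with the ExLlama backend on a single GPU).
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-70B-chat-GPTQ",
    device="cuda:0",
    use_safetensors=True,
)

# Resize the ExLlama buffers so prompts longer than the default 2048 tokens fit;
# 4096 here is just an example value.
model = exllama_set_max_input_length(model, 4096)
```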
