Number of tokens (525) exceeded maximum context length (512).

#7
by ashubi

I'm chatting with documents using TheBloke/Llama-2-7B-GGML, but when I ask a question it says, "Number of tokens (525) exceeded maximum context length (512)." The count keeps climbing (526, 527, and so on), and eventually the model responds in an unstructured manner. I am running the model on CPU.
Note: The response is good for any query where the warning "Number of tokens (525) exceeded maximum context length (512)" does not appear.
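
For reference, here is a minimal sketch of how I'm loading the model. It assumes llama-cpp-python as the loader; the model file name and prompt are illustrative, and my actual setup may differ:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b.ggmlv3.q4_0.bin",  # GGML file from TheBloke/Llama-2-7B-GGML (illustrative name)
    n_ctx=512,  # llama.cpp's historical default context window
)

# The prompt template, retrieved document chunks, and the question all
# count against n_ctx; once they exceed 512 tokens the warning appears.
output = llm("Q: What does the document say about X? A:", max_tokens=128)
print(output["choices"][0]["text"])
```

Would raising n_ctx (Llama 2 supports a context window of up to 4096 tokens) or trimming the retrieved document chunks be the right way to avoid this?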
