Transformers
GGUF
mistral
text-generation-inference

Llama.cpp's Server crashes when input is long

#3
by Mihaiii - opened

I'm using the Q8 version and gave it a ~1k-token input, but the server crashes and the response is always an empty string.

Something is off. Could anyone please confirm the issue?
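For what it's worth, a common cause of crashes or empty responses on longer prompts with llama.cpp's server is the context window being smaller than the prompt (older builds defaulted to 512 tokens). A minimal sketch of relaunching with a larger context, assuming that's the cause (the model filename below is a placeholder):

```shell
# Assumption: the crash comes from the ~1k-token prompt exceeding the
# default context size. -c raises the context window; -m is the model path.
./server -m ./mistral-7b-instruct.Q8_0.gguf -c 4096 --port 8080
```

If the server still crashes with a larger `-c`, the log output at startup usually shows whether the model loaded and what context size was actually applied.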
