Number of tokens exceeded maximum context length

#22
by praxis-dev - opened

Hi, I'm getting this error from time to time with Llama-2-13B-Chat-GGML.

Can I change the maximum context length?

Every once in a while I see a recommendation like this: try passing n_ctx=4096 to Llama(). It seems to work, but where exactly should I pass it?
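
That recommendation presumably refers to llama-cpp-python's Llama class, which takes n_ctx as a constructor argument. A minimal sketch (assuming llama-cpp-python is installed and model_path points at the downloaded GGML file):

from llama_cpp import Llama

# Sketch: n_ctx sets the context window size; 4096 is the native
# context length of Llama 2 models.
llm = Llama(model_path=model_path, n_ctx=4096)
output = llm("Q: Name the planets in the solar system. A:", max_tokens=128)
print(output["choices"][0]["text"])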


# LlamaCpp lives in langchain.llms in older LangChain releases
# (langchain_community.llms in newer ones)
from langchain.llms import LlamaCpp

def load_llm():
    llm = LlamaCpp(
        model_path=model_path,  # path to the GGML model file, defined elsewhere
        n_gpu_layers=5,         # layers to offload to the GPU
        n_batch=128,            # batch size for prompt processing
        verbose=True,
        f16_kv=True,            # use fp16 for the key/value cache
        n_ctx=2048,             # maximum context length; raise to 4096 if you hit the error
    )
    return llm

I think using LlamaCpp (like the snippet above) instead of CTransformers might help.
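
If you stay with CTransformers instead, the context length is configured through its config dict rather than an n_ctx keyword. A rough sketch (assuming LangChain's CTransformers wrapper and the ctransformers context_length setting):

from langchain.llms import CTransformers

# Sketch: context_length in the config dict is the CTransformers
# counterpart of LlamaCpp's n_ctx.
llm = CTransformers(
    model=model_path,
    model_type="llama",
    config={"context_length": 4096, "max_new_tokens": 512},
)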
