Still problems with context length
From the description I would expect this model to support a 32k context, but when using it I still get "LLaMA ERROR: The prompt is 5034 tokens and the context window is 2048!".
Maybe I need to pass specific parameters?
This is how I use the model:
from langchain.llms import GPT4All
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = GPT4All(
    model="../llm/sciphi-mistral-7b-32k.Q4_K_M.gguf",
    backend='gptj',
    callbacks=None,
    verbose=True,
    # n_predict=768,
    temp=0.1,
)
prompt = PromptTemplate(template=template, input_variables=["question", "context"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
response = llm_chain.run({"question": question_p, "context": context_p})
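For what it's worth, the overflow is easy to confirm by tokenizing the rendered prompt directly with llama-cpp-python (a sketch, assuming llama-cpp-python is installed and reusing template, question_p, and context_p from above):

from llama_cpp import Llama

# Load only the vocabulary so tokenization is cheap (no weights, no context).
tokenizer = Llama(model_path="../llm/sciphi-mistral-7b-32k.Q4_K_M.gguf", vocab_only=True)
full_prompt = prompt.format(question=question_p, context=context_p)
n_tokens = len(tokenizer.tokenize(full_prompt.encode("utf-8")))
print(f"Prompt is {n_tokens} tokens")  # here: 5034, vs. a 2048-token window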
Thanks
Solved it by switching to the LlamaCpp wrapper and setting n_ctx explicitly:
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# callback_manager was defined elsewhere in my script; a streaming stdout
# handler is assumed here so the snippet is self-contained.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    model_path="../llm/sciphi-mistral-7b-32k.Q4_K_M.gguf",
    temperature=0.1,
    max_tokens=6000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=6144,  # raise the context window above the 5034-token prompt
)
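The rest of the chain is unchanged from the question:

prompt = PromptTemplate(template=template, input_variables=["question", "context"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
response = llm_chain.run({"question": question_p, "context": context_p})

One caveat: n_ctx covers the prompt and the generated tokens together, so with a ~5000-token prompt, max_tokens=6000 could still run past a 6144-token window; sizing n_ctx to at least the prompt length plus max_tokens is safer.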