Token limit exceeded
I am getting the warning "Number of tokens exceeded maximum context length (512)", as shown in the screenshot below.
How can I solve this issue?
Code:
from langchain_community.llms import CTransformers

def load_llm():
    # Load the locally downloaded model here
    llm = CTransformers(
        model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        model_type="llama",
        max_new_tokens=512,
        temperature=0.5,
    )
    return llm
Increase your max_new_tokens value.
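For example, a minimal sketch assuming you are using LangChain's CTransformers wrapper as in the question: the 512 in the warning is the model's context_length (the default), not just max_new_tokens, so raise both via the config dict. The 4096/2048 values below are illustrative, not required.

from langchain_community.llms import CTransformers

def load_llm():
    llm = CTransformers(
        model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        model_type="llama",
        config={
            "max_new_tokens": 2048,   # tokens the model is allowed to generate
            "context_length": 4096,   # total budget for prompt + response
            "temperature": 0.5,
        },
    )
    return llm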
I am still facing this issue.
Is your maximum context length still 512 tokens? You should set max_new_tokens to at least 2048 if you plan on using that many tokens in an interaction. Otherwise you need to start trimming the context you send to the LLM. There are several options for context trimming, such as removing the first N tokens, or producing a condensed summary of the existing context with a summarization LLM and feeding that into the conversation instead of the full interaction history (see the sketch at the end of this answer).
At the very least, unless you only plan to feed a one-shot prompt into the LLM, a 512-token context is too small, since it has to hold both your prompt and the LLM's response.
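Here is a rough sketch of the first trimming option: drop the oldest turns until the history fits a token budget. The count_tokens callable and the budget value are placeholders you would supply yourself, for example a wrapper around the underlying ctransformers model's tokenizer, or a crude len(text.split()) estimate.

def trim_history(turns, count_tokens, budget=3000):
    # Keep only the most recent turns whose combined token count fits the budget.
    kept, total = [], 0
    for turn in reversed(turns):        # walk from newest to oldest
        cost = count_tokens(turn)
        if total + cost > budget:
            break
        kept.append(turn)
        total += cost
    return list(reversed(kept))         # restore chronological order

# Hypothetical usage: history is a list of strings (alternating user/assistant turns).
# trimmed = trim_history(history, count_tokens=lambda t: len(t.split()), budget=3000)
# prompt = "\n".join(trimmed) + "\nUser: " + new_question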