Token limit exceeded
I am getting the warning "Number of tokens exceeded maximum context length (512)", as shown in the screenshot below.
How can I solve this issue?
Code:
from langchain_community.llms import CTransformers

def load_llm():
    # Load the locally downloaded model here
    llm = CTransformers(
        model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        model_type="llama",
        max_new_tokens=512,
        temperature=0.5,
    )
    return llm
Increase your max_new_tokens value.
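For example, a minimal sketch assuming you are using LangChain's CTransformers wrapper as in the question: the 512 in the warning is the model's context_length (the default), not just max_new_tokens, so raise both via the config dict. The 4096/2048 values below are illustrative, not required.

from langchain_community.llms import CTransformers

def load_llm():
    llm = CTransformers(
        model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        model_type="llama",
        config={
            "max_new_tokens": 2048,   # tokens the model is allowed to generate
            "context_length": 4096,   # total budget for prompt + response
            "temperature": 0.5,
        },
    )
    return llm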
I am still facing this issue.
Is your maximum context length still 512 tokens? You should set max_new_tokens to at least 2048 if you plan on using that many tokens in an interaction. Otherwise you need to start trimming the context you send to the LLM. There are several options for context trimming, such as removing the first N tokens, or producing a condensed summary of the existing context with a summarization LLM and feeding that into the conversation instead of the full interaction history (see the sketch at the end of this answer).
At the very least, unless you only plan to feed a one-shot prompt into the LLM, a 512-token context is too small, since it has to hold both your prompt and the LLM's response.
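Here is a rough sketch of the first trimming option: drop the oldest turns until the history fits a token budget. The count_tokens callable and the budget value are placeholders you would supply yourself, for example a wrapper around the underlying ctransformers model's tokenizer, or a crude len(text.split()) estimate.

def trim_history(turns, count_tokens, budget=3000):
    # Keep only the most recent turns whose combined token count fits the budget.
    kept, total = [], 0
    for turn in reversed(turns):        # walk from newest to oldest
        cost = count_tokens(turn)
        if total + cost > budget:
            break
        kept.append(turn)
        total += cost
    return list(reversed(kept))         # restore chronological order

# Hypothetical usage: history is a list of strings (alternating user/assistant turns).
# trimmed = trim_history(history, count_tokens=lambda t: len(t.split()), budget=3000)
# prompt = "\n".join(trimmed) + "\nUser: " + new_question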