HuggingChat: Input validation error: `inputs` tokens + `max_new_tokens` must be..

#430
by Kostyak - opened

I use the meta-llama/Meta-Llama-3-70B-Instruct model. After a certain number of turns, the AI refuses to respond and gives an error: "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 6391 `inputs` tokens and 2047 `max_new_tokens`". Is this a bug or some new limitation? To be honest, I still don't get it, and I hope I get an answer here. I'm new to this site.
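For what it's worth, the two numbers in the error already add up to more than the 8192-token context window, which is why the request is rejected. A quick check with the figures from the error above:

```python
# Figures taken straight from the error message above.
inputs_tokens = 6391
max_new_tokens = 2047
context_limit = 8192

total = inputs_tokens + max_new_tokens
print(total)                   # 8438
print(total <= context_limit)  # False -> the request is rejected
```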


Same issue all of a sudden today.

Hugging Chat org

Can you see if this still happens? Should be fixed now.


> Can you see if this still happens? Should be fixed now.

Still the same error, except the numbers have changed a little.

I keep getting this error as well, using the CohereForAI model.

Same error, "Meta-Llama-3-70B-Instruct" model.

I have also been running into this error. Is there a workaround or solution at all?

"Input validation error: inputs tokens + max_new_tokens must be <= 8192. Given: 6474 inputs tokens and 2047 max_new_tokens"

Using the meta-llama/Meta-Llama-3-70B-Instruct model.

Keep getting the same error on llama3-70b. If the prompt crosses the context length, shouldn't it automatically truncate or something like that?

It happens more often than not, even when the prompt is only about 7 words.

Using the meta-llama/Meta-Llama-3-70B-Instruct model.
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 6477 `inputs` tokens and 2047 `max_new_tokens`

Happening to me right now:
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 6398 `inputs` tokens and 2047 `max_new_tokens`

Hugging Chat org

Just to check, are you having long conversations and/or using the websearch? Sorry for the inconvenience, trying to find a fix.

Happens even without web search, just long conversation.

No web search, and not really long. The old conversation should be somewhere around 8000 tokens, like the error says:
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 8015 `inputs` tokens and 2047 `max_new_tokens`

In a new chat, the conversation was shorter before I got the same error. Again, as stated in the error, it should be somewhere around 6150 tokens:
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 6150 `inputs` tokens and 2047 `max_new_tokens`

> Just to check, are you having long conversations and/or using the websearch? Sorry for the inconvenience, trying to find a fix.

I was having a long conversation without web search

> Just to check, are you having long conversations and/or using the websearch? Sorry for the inconvenience, trying to find a fix.

Likewise, a long conversation without a web search.

> Just to check, are you having long conversations and/or using the websearch? Sorry for the inconvenience, trying to find a fix.

There is no inconvenience at all; we appreciate your time and effort trying to fix this. On my end, no web search, just the default: "Assistant will not use internet to do information retrieval and will respond faster. Recommended for most Assistants."

This is definitely a weird bug. It doesn't matter how many words you use in the context, it just throws the error and blocks you; you can reduce the prompt to one word and it will still throw the error.

Seems to happen with long conversations. Like I'm hitting a hard limit. I could do a token count if that helps.

Hugging Chat org

I'd really appreciate it if you could count the tokens. You can grab the raw prompt by clicking the bottom-right icon on the message that gave you the error. It will open a JSON with a field called `prompt` which contains the raw prompt.


Otherwise if someone feels comfortable sharing a conversation, I can have a look directly.
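For anyone who wants to do the count, a minimal sketch using the `transformers` tokenizer; this assumes you have access to the gated meta-llama repo, and `prompt.txt` is a hypothetical file holding the `prompt` field copied from the debug JSON:

```python
from transformers import AutoTokenizer

# Assumes access to the gated Meta-Llama-3 repo; any Llama-3 tokenizer
# produces the same counts if the 70B repo is an issue.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

# Hypothetical file holding the "prompt" field copied from the debug JSON.
with open("prompt.txt") as f:
    raw_prompt = f.read()

input_tokens = len(tokenizer(raw_prompt)["input_ids"])
max_new_tokens = 2047  # the value reported in the error message

total = input_tokens + max_new_tokens
print(f"{input_tokens} input tokens + {max_new_tokens} max_new_tokens = {total}")
print("over the 8192 limit" if total > 8192 else "within the 8192 limit")
```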

> Otherwise if someone feels comfortable sharing a conversation, I can have a look directly.

Here you go: https://hf.co/chat/r/v_U0GXB

> I'd really appreciate it if you could count the tokens. You can grab the raw prompt by clicking the bottom-right icon on the message that gave you the error. It will open a JSON with a field called `prompt` which contains the raw prompt.
>
> Otherwise if someone feels comfortable sharing a conversation, I can have a look directly.

Here you go, please: https://hf.co/chat/r/7MLJ8EX

> Otherwise if someone feels comfortable sharing a conversation, I can have a look directly.

Hi there! It happens here too. Here's my conversation https://hf.co/chat/r/1yeBRAV

Thanks in advance!

Anyone know if this is fixed?

> Anyone know if this is fixed?

Nope, still the same error.

Hugging Chat org

I asked internally, trying to get to the bottom of this, sorry for the inconvenience!

I'm also getting this problem. It's very annoying. I know the service is free, but I wouldn't mind paying for it if it got rid of this error.

When will they fix this error? It's really annoying, especially when I'm trying to get Llama 3 to fix code.

Any news on fixing this bug?

Hi. I also have the same problem when sending a link to a Facebook page, but I've already done it in other chats and there was no problem.

The issue for me is that I need to switch to a new conversation because I can't use the chat anymore, and that's a problem because I was using it to deliver a business service.

I would very much appreciate you trying; I can share the conversation as well.

Running into the same issue: I'm iterating over a defined set of strings, trying out the best prompting strategy, and it gives me this error with random strings at random times. I can't make sense of it. Using the meta-llama/Meta-Llama-3-8B-Instruct model.

Any updates?

I'm also getting the same problem. can i help in any way?

I am getting the same error as well, usually in long conversations involving code reviews, documentation, etc.

Yeah, still getting this issue; it's so annoying.

Bruh it is never going to be fixed I guess 😭

Same issue: Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 4076 `inputs` tokens and 100 `max_new_tokens`

I've got the same issue in a long conversation. If I branch a prompt, the AI answers me, but I can't add any request after the branch. I tried with several models and got the same result: Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 6269 `inputs` tokens and 2047 `max_new_tokens`. If I go to another conversation, it works.

Any updates?

Seems that the issue lies in how the context is handled. I think the best approach here would be to clear the context after a few messages, so there are always enough tokens left to keep the conversation going. Maybe set it to retrieve only the last 3-4 messages; that would create less context, but would probably avoid this error, which seems to happen when the context is full and you have to start a new chat all over again until it happens again.

> I think the best approach here would be to clear the context after a few messages

Can you give an example of how to do this?

> I think the best approach here would be to clear the context after a few messages
>
> Can you give an example of how to do this?

The error we're encountering is probably due to the limit on the total number of tokens the LLaMA model can process. To resolve this, the developers could implement a mechanism that truncates the conversation context after a certain number of messages.

Something like this could work for, say, the last 5 messages, but it has to be done in the backend:


```python
conversation_history = []

def process_message(user_input):
    global conversation_history

    # Add the user's input to the conversation history
    conversation_history.append(user_input)

    # Truncate the conversation history to keep only the last 5 messages
    if len(conversation_history) > 5:
        conversation_history = conversation_history[-5:]

    # Prepare the input for the LLaMA model
    input_text = "\n".join(conversation_history)

    # Call the LLaMA model with the truncated input
    # (llama_model is a placeholder for whatever client the backend uses)
    response = llama_model(input_text)

    # Append the response to the conversation history
    conversation_history.append(response)

    return response
```

Or this for the frontend


```javascript
let conversationHistory = [];

function processMessage(userInput) {
  conversationHistory.push(userInput);

  // Truncate the conversation history to keep only the last 5 messages
  if (conversationHistory.length > 5) {
    conversationHistory = conversationHistory.slice(-5);
  }

  // Prepare the input for the LLaMA model
  let inputText = conversationHistory.join("\n");

  // Call the LLaMA model with the truncated input
  $.ajax({
    type: "POST",
    url: "/llama-endpoint", // Replace with your LLaMA model endpoint
    data: { input: inputText },
    success: function(response) {
      // Append the response to the conversation history
      conversationHistory.push(response);

      // Update the conversation display
      $("#conversation-display").append(`<p>${response}</p>`);
    }
  });
}
```

That could potentially fix this bug.
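One caveat, since the server limit is expressed in tokens rather than messages: a single long message can still blow the budget even after keeping only the last 5 messages. A token-budget truncation might be more robust. A rough sketch in the same spirit (the tokenizer is assumed to be a Llama-3 tokenizer loaded as in the earlier counting sketch; the limits are the ones from the error messages above):

```python
def truncate_to_token_budget(history, tokenizer, max_context=8192, max_new_tokens=2047):
    # Reserve room for the generation so input + output fits the context window.
    budget = max_context - max_new_tokens
    kept = list(history)
    # Drop the oldest messages until the joined prompt fits the budget.
    while kept and len(tokenizer("\n".join(kept))["input_ids"]) > budget:
        kept.pop(0)
    return kept
```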

Thanks for the help, but I'm using the Hugging Face chat website. I've no clue how to input this code.

> Thanks for the help, but I'm using the Hugging Face chat website. I've no clue how to input this code.

I know; I mean the developers have to check whether that can fix the issue on their end. FYI @nsarrazin
