Meta-Llama-3.1-405B-Instruct-FP8 seems to be misconfigured
With a sufficiently long conversation using Meta-Llama-3.1-405B-Instruct-FP8, this error occurs 100% of the time, making the model unusable after a certain point:
```
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 16384. Given: 14337 `inputs` tokens and 2048 `max_new_tokens`
```
For Meta-Llama-3.1-405B-Instruct-FP8, `truncate` is set to 14337 and `max_new_tokens` is set to 2048. Added together, these are 16385 tokens, which is 2^14 + 1. This looks like an off-by-one: once the prompt is truncated to its full 14337 tokens, the request will always fail the `<= 16384` check.
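For illustration, here's a minimal sketch of the validation as I read it from the error message. The constant and function names are mine, not the server's actual internals:

```python
MAX_TOTAL_TOKENS = 16384  # the limit quoted in the error; presumably 2^14

def validate(input_tokens: int, max_new_tokens: int) -> None:
    # Mirrors the check implied by the error message:
    # `inputs` tokens + `max_new_tokens` must be <= MAX_TOTAL_TOKENS.
    if input_tokens + max_new_tokens > MAX_TOTAL_TOKENS:
        raise ValueError(
            f"Input validation error: `inputs` tokens + `max_new_tokens` "
            f"must be <= {MAX_TOTAL_TOKENS}. Given: {input_tokens} `inputs` "
            f"tokens and {max_new_tokens} `max_new_tokens`"
        )

# Once the prompt reaches the truncate limit, the check can never pass:
validate(14337, 2048)  # 14337 + 2048 = 16385 > 16384 -> ValueError
```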
In comparison, for Meta-Llama-3.1-70B-Instruct, `truncate` is set to 7167 and `max_new_tokens` is set to 1024. Added together, these are 8191 tokens, which is 2^13 - 1. That looks like another off-by-one, but in the other direction.
I'm not sure where `max_total_tokens` is being set, though.
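Assuming the intended budgets are 2^14 = 16384 and 2^13 = 8192 respectively (which I haven't confirmed), a consistent `truncate` would leave exactly `max_new_tokens` of headroom. A hypothetical helper to show the arithmetic:

```python
def consistent_truncate(max_total_tokens: int, max_new_tokens: int) -> int:
    # Largest prompt length that still leaves room for generation.
    return max_total_tokens - max_new_tokens

print(consistent_truncate(16384, 2048))  # 14336, but 405B is configured with 14337
print(consistent_truncate(8192, 1024))   # 7168, but 70B is configured with 7167
```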
Similar to #430
Thanks for bringing this up, will take a look
Should be fixed! Let me know if you're still having issues