Input token size issue: does it really support 32k tokens?

#197
by sunnykusawa - opened

I am running Mixtral 8x7B through a SageMaker endpoint. It should support a 32k input token size, but I am getting a validation error saying that input tokens + max_new_tokens must be <= 2048.

"error": "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 2048. Given: 13179 `inputs` tokens and 1024 `max_new_tokens`",
"error_type": "validation"

I was able to solve this issue for the SageMaker endpoint.

We need to set the environment variables MAX_INPUT_LENGTH and MAX_TOTAL_TOKENS.

When deploying the LLM with SageMaker, add these environment variables:

import json

hub = {
    'HF_MODEL_ID': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
    'SM_NUM_GPUS': json.dumps(8),
    # Maximum prompt length; use any value below MAX_TOTAL_TOKENS, as required.
    'MAX_INPUT_LENGTH': '30000',
    # Maximum prompt tokens plus max_new_tokens for a single request.
    'MAX_TOTAL_TOKENS': '32768',
    'MAX_BATCH_PREFILL_TOKENS': '32768',
    'MAX_BATCH_TOTAL_TOKENS': '32768',
}

This raises the limit on input tokens + max_new_tokens from the default of 2048 (the value shown in the error) to 32768, and allows prompts of up to 30000 tokens.
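To check whether a request will fit before sending it, here is a rough sketch of the two length checks as I understand them from the error message: the prompt must not exceed MAX_INPUT_LENGTH, and prompt tokens plus max_new_tokens must not exceed MAX_TOTAL_TOKENS. The function name `request_fits` is my own, and the default MAX_INPUT_LENGTH of 1024 is an assumption (only the 2048 total appears in the error).

```python
def request_fits(env, input_tokens, max_new_tokens):
    """Return True if a request passes both length checks (hypothetical helper)."""
    max_input = int(env["MAX_INPUT_LENGTH"])
    max_total = int(env["MAX_TOTAL_TOKENS"])
    # Check 1: the prompt itself must fit within the input limit.
    if input_tokens > max_input:
        return False
    # Check 2: prompt plus requested new tokens must fit within the total limit.
    return input_tokens + max_new_tokens <= max_total


# Limits before the change. The 2048 total comes from the error message;
# the 1024 input limit is an assumed default.
default_limits = {"MAX_INPUT_LENGTH": "1024", "MAX_TOTAL_TOKENS": "2048"}

# Limits after setting the environment variables shown above.
raised_limits = {"MAX_INPUT_LENGTH": "30000", "MAX_TOTAL_TOKENS": "32768"}

# The request from the error message: 13179 input tokens, 1024 new tokens.
before = request_fits(default_limits, 13179, 1024)
after = request_fits(raised_limits, 13179, 1024)
print("fits with default limits:", before)
print("fits with raised limits:", after)
```

With MAX_INPUT_LENGTH at 30000 and MAX_TOTAL_TOKENS at 32768, a prompt that uses the full 30000 tokens leaves room for at most 2768 new tokens.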
