Inference Endpoint usage results in Input Validation Error...less than 512...
#8 by michael-newsrx-com - opened
I'm getting:
HfHubHTTPError: 413 Client Error: Payload Too Large for url: https://d36zfqoe2q8z7da5.us-east-1.aws.endpoints.huggingface.cloud/ (Request ID: d-z2Qs)
Input validation error: `inputs` must have less than 512 tokens. Given: 785
I'm using the following to create the endpoint:
from huggingface_hub import InferenceEndpointType, create_inference_endpoint

# ep_name is defined elsewhere (the endpoint's name).
ep = create_inference_endpoint(
    ep_name,
    repository="BAAI/bge-large-en-v1.5",
    framework="pytorch",
    accelerator="gpu",
    instance_size="x1",
    instance_type="nvidia-l4",
    region="us-east-1",
    vendor="aws",
    min_replica=0,
    max_replica=1,
    task="sentence-embeddings",
    type=InferenceEndpointType.PROTECTED,
    namespace="newsrx",
    # Serve the model with the Text Embeddings Inference (TEI) container.
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-embeddings-inference:1.5.0",
        "env": {
            "MAX_BATCH_TOKENS": "16384",
            "MAX_CONCURRENT_REQUESTS": "512",
            "MODEL_ID": "/repository",
            "QUANTIZE": "eetq",
        },
    },
)
See also: https://github.com/huggingface/text-embeddings-inference/issues/356
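For anyone hitting the same error: the limit comes from the model's maximum sequence length (512 tokens for this model, per the error message), so a 785-token input is rejected by the server. A minimal sketch of one possible workaround follows, assuming the TEI container behind the endpoint accepts a `truncate` field in the JSON body next to `inputs`; I have not verified this against this exact image version. The helper names `build_embed_request` and `embed` are hypothetical, and the URL and token are placeholders you would supply yourself.

```python
import json
import urllib.request


def build_embed_request(endpoint_url, token, text, truncate=True):
    """Build a POST request for the endpoint.

    Assumption: the server honors a `truncate` flag and cuts the input
    down to the model's maximum length instead of rejecting it.
    """
    payload = {"inputs": text, "truncate": truncate}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(endpoint_url, token, text):
    """Send the request and return the parsed JSON response."""
    request = build_embed_request(endpoint_url, token, text)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Note that truncation silently drops everything past the limit, so the embedding only reflects the start of the text. If the tail matters, splitting the text into chunks on the client side and embedding each chunk separately may be the better trade-off.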
michael-newsrx-com changed discussion status to closed