Inference Endpoint usage results in Input Validation Error...less than 512...

#8
by michael-newsrx-com - opened

I'm getting:

HfHubHTTPError: 413 Client Error: Payload Too Large for url: https://d36zfqoe2q8z7da5.us-east-1.aws.endpoints.huggingface.cloud/ (Request ID: d-z2Qs)

Input validation error: `inputs` must have less than 512 tokens. Given: 785
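The message says the text tokenized to 785 tokens while the server accepts fewer than 512. One possible workaround is to ask the server to truncate over-long inputs per request. The sketch below assumes the Text Embeddings Inference `/embed` route accepts a boolean `truncate` field next to `inputs`. The helper name `build_embed_payload` is hypothetical, and the request is only built here, not sent.

```python
import json


def build_embed_payload(texts, truncate=True):
    """Build a JSON body for a TEI-style embed request (hypothetical helper).

    Assumption: the server honors a boolean "truncate" field and cuts
    inputs down to the model's maximum length instead of rejecting them.
    """
    if isinstance(texts, str):
        texts = [texts]
    return json.dumps({"inputs": list(texts), "truncate": bool(truncate)})


# Build the body for one long document; POST it to the endpoint URL
# with your usual HTTP client and authorization header.
body = build_embed_payload("some long article text")
print(body)
```

Truncation silently drops everything past the limit, so if the tail of the document matters, splitting the text into chunks and embedding each one may be the better choice.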

I'm creating the endpoint with:

    ep = create_inference_endpoint(  #
            ep_name,  #
            repository="BAAI/bge-large-en-v1.5",  #
            framework="pytorch",  #
            accelerator="gpu",  #
            instance_size="x1",  #
            instance_type="nvidia-l4",  #
            region="us-east-1",  #
            vendor="aws",  #
            min_replica=0,  #
            max_replica=1,  #
            task="sentence-embeddings",  #
            type=InferenceEndpointType.PROTECTED,  #
            namespace="newsrx",  #
            custom_image={  #
                "health_route": "/health",  #
                "url": "ghcr.io/huggingface/text-embeddings-inference:1.5.0",  #
                "env": {  #
                    "MAX_BATCH_TOKENS": "16384",  #
                    "MAX_CONCURRENT_REQUESTS": "512",  #
                    "MODEL_ID": "/repository",  #
                    "QUANTIZE": "eetq",  #
                },  #
            })
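A server-side alternative, offered only as a sketch: Text Embeddings Inference documents an `--auto-truncate` option that can also be set through the `AUTO_TRUNCATE` environment variable. Assuming the 1.5.0 image reads it, the `custom_image` argument could carry one extra entry. The `AUTO_TRUNCATE` line is the only addition and was not part of the call above.

```python
# Same custom_image as in the call above, plus one assumed extra setting.
custom_image = {
    "health_route": "/health",
    "url": "ghcr.io/huggingface/text-embeddings-inference:1.5.0",
    "env": {
        "MAX_BATCH_TOKENS": "16384",
        "MAX_CONCURRENT_REQUESTS": "512",
        "MODEL_ID": "/repository",
        "QUANTIZE": "eetq",
        # Assumption: makes the server truncate inputs longer than the
        # model's maximum length instead of returning a validation error.
        "AUTO_TRUNCATE": "true",
    },
}
```

This dictionary would be passed as `custom_image=custom_image` to `create_inference_endpoint`. Whether it resolves the 413 error on this endpoint is not confirmed in this thread.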

See also: https://github.com/huggingface/text-embeddings-inference/issues/356

michael-newsrx-com changed discussion title from Inference Endpoint usage results in to Inference Endpoint usage results in Input Validation Error...less than 512...
michael-newsrx-com changed discussion status to closed
