SageMaker deployment

by PrajwalM

I deployed the model with bitsandbytes-nf4 quantization on a g5.4xlarge instance.

When I invoke the endpoint with the following payload:
payload = {
    "inputs": prompt,
    "parameters": {
        "top_p": tp,
        "temperature": tmp,
        "top_k": 50,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.03,
        "stop": [""],
    },
}
it returns the following error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See in account 346347345 for more information.
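The error means the container did not answer in time. SageMaker real-time endpoints expect the container to respond to InvokeEndpoint within 60 seconds, so generating up to 1024 new tokens may well exceed that limit, especially with bitsandbytes quantization, which typically trades speed for memory. Below is a minimal sketch, assuming boto3 is available. It sends a payload with the same sampling parameters and derives a `max_new_tokens` cap from a throughput figure you would have to measure yourself. The helper names, the endpoint name, and the `tokens_per_second` and `overhead_s` values are hypothetical placeholders, not measurements.

```python
import json
import math
from typing import Any, List, Optional

# SageMaker real-time endpoints expect a response within 60 seconds.
INVOCATION_TIMEOUT_S = 60.0


def max_tokens_within_timeout(tokens_per_second: float, overhead_s: float = 5.0,
                              timeout_s: float = INVOCATION_TIMEOUT_S) -> int:
    """Largest max_new_tokens expected to finish before the timeout.

    tokens_per_second and overhead_s are assumptions you must measure for
    your own model and instance; nothing here is a measured value.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    budget_s = max(timeout_s - overhead_s, 0.0)
    return math.floor(budget_s * tokens_per_second)


def build_payload(prompt: str, top_p: float, temperature: float,
                  max_new_tokens: int, stop: Optional[List[str]] = None) -> dict:
    # Same sampling parameters as in the question, with max_new_tokens configurable.
    parameters = {
        "top_p": top_p,
        "temperature": temperature,
        "top_k": 50,
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": 1.03,
    }
    if stop:  # only send stop sequences when there are real ones
        parameters["stop"] = stop
    return {"inputs": prompt, "parameters": parameters}


def invoke(endpoint_name: str, payload: dict) -> Any:
    # boto3 is imported lazily so the helpers above work without AWS access.
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,  # hypothetical, e.g. "my-llm-endpoint"
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```

For example, with a hypothetical measured rate of 15 tokens/s, `max_tokens_within_timeout(15.0)` returns 825, which is below the 1024 requested in the payload above.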

Could you point me to any documentation or a code repository with the deployment code?
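A minimal deployment sketch follows, assuming the Hugging Face LLM (TGI) container is deployed through the SageMaker Python SDK. `HF_MODEL_QUANTIZE`, `SM_NUM_GPUS`, `MAX_INPUT_LENGTH` and `MAX_TOTAL_TOKENS` are environment variables read by that container. The model ID, token limits, IAM role and startup timeout below are hypothetical placeholders, not the configuration used in the question. Note that an ml.g5.4xlarge instance has a single GPU.

```python
def build_tgi_env(model_id: str, quantize: str = "bitsandbytes-nf4",
                  num_gpus: int = 1, max_input_length: int = 2048,
                  max_total_tokens: int = 3072) -> dict:
    """Environment variables for the Hugging Face LLM (TGI) container.

    Values are strings because SageMaker passes them as env vars.
    The token limits are hypothetical defaults; max_total_tokens must
    cover the prompt plus max_new_tokens (here 2048 + 1024).
    """
    if max_total_tokens <= max_input_length:
        raise ValueError("max_total_tokens must exceed max_input_length")
    return {
        "HF_MODEL_ID": model_id,
        "HF_MODEL_QUANTIZE": quantize,
        "SM_NUM_GPUS": str(num_gpus),
        "MAX_INPUT_LENGTH": str(max_input_length),
        "MAX_TOTAL_TOKENS": str(max_total_tokens),
    }


def deploy(model_id: str, role_arn: str):
    # Imported lazily so build_tgi_env can be used without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    model = HuggingFaceModel(
        image_uri=get_huggingface_llm_image_uri("huggingface"),
        env=build_tgi_env(model_id),
        role=role_arn,
    )
    # Downloading and quantizing the weights can be slow, so allow a long startup.
    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.4xlarge",
        container_startup_health_check_timeout=600,
    )
```

`deploy` needs a real Hugging Face Hub model ID and an IAM role ARN. It returns a predictor for the new endpoint.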

Thank you

