Error using model when deployed on Inference Endpoints

#7
by gill13 - opened

I fine-tuned the Falcon 7B model on Google Colab. Since I wanted to use it in an application and create an endpoint, I tried deploying it on SageMaker. That gave me an error where it couldn't recognise the word "falcon". I tried multiple ways but couldn't succeed.
I then decided to move on to Inference Endpoints for deployment. I started by deploying the base model directly, just as a check, but I am getting an error there:

module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'

I don't know how to solve this. Can someone please guide me?

Hello, I don't have experience with SageMaker. But about 'scaled_dot_product_attention': it is a PyTorch 2.0 feature, so that error means the endpoint is running an older PyTorch.
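
A quick way to confirm this is to check, in the environment where the model actually runs, which PyTorch version is installed and whether the function exists (a minimal sketch):

```python
import torch
import torch.nn.functional as F

# scaled_dot_product_attention was added in PyTorch 2.0;
# on older versions the attribute simply does not exist.
print("torch version:", torch.__version__)
print("has scaled_dot_product_attention:",
      hasattr(F, "scaled_dot_product_attention"))
```

If this prints a version below 2.0, upgrading torch to 2.0 or later in the deployment environment should make the attribute available.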

I know that AWS has support for Text Generation Inference (TGI); you can try it on RunPod too:

https://vilsonrodrigues.medium.com/serving-falcon-models-with-text-generation-inference-tgi-5f32005c663b
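
Once a TGI server is running, querying it is just an HTTP call. Here is a minimal sketch, assuming the server is reachable at localhost on port 8080 (adjust the URL to your RunPod or AWS deployment):

```python
import requests

# Assumed endpoint URL; replace with wherever your TGI server is exposed.
TGI_URL = "http://localhost:8080/generate"

payload = {
    "inputs": "What is the Falcon model?",
    "parameters": {"max_new_tokens": 64},
}

# TGI's /generate route returns JSON containing "generated_text".
response = requests.post(TGI_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])
```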

But how can I resolve the PyTorch 2.0 feature issue? I am uploading the model directly without any changes, just to check the deployment. I have nowhere to interact with PyTorch.

I would try talking to the people at Hugging Face if it is their service.

Which platform would you suggest for deployment?
My PC is not powerful enough for this model.

For tests, try it on Google Colab or Kaggle kernels.

For production, use a cloud provider.

Sorry for the delay.
