Error using model when deployed on Inference Endpoints

#7
by gill13 - opened

I fine-tuned the Falcon 7B model on Google Colab. Since I wanted to use it in an application and create an endpoint, I tried deploying it on SageMaker. That gave me an error where it couldn't recognise the word "falcon". I tried multiple ways but couldn't succeed.
I then decided to move on to Inference Endpoints for deployment. I started by deploying the base model directly, just as a check, but I am getting an error there:

module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'

I don't know how to solve this. Can someone please guide me?

Hello, I don't have experience with SageMaker. But about 'scaled_dot_product_attention': it is a PyTorch 2.0 feature, so that error means the endpoint is running an older PyTorch.
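
A quick way to confirm this is to check, in the environment where the model actually runs, which PyTorch version is installed and whether the function exists (a minimal sketch):

```python
import torch
import torch.nn.functional as F

# scaled_dot_product_attention was added in PyTorch 2.0;
# on older versions the attribute simply does not exist.
print("torch version:", torch.__version__)
print("has scaled_dot_product_attention:",
      hasattr(F, "scaled_dot_product_attention"))
```

If this prints a version below 2.0, upgrading torch to 2.0 or later in the deployment environment should make the attribute available.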

I know that AWS has support for Text Generation Inference (TGI); you can try it on RunPod too:

https://vilsonrodrigues.medium.com/serving-falcon-models-with-text-generation-inference-tgi-5f32005c663b
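
Once a TGI server is running, querying it is just an HTTP call. Here is a minimal sketch, assuming the server is reachable at localhost on port 8080 (adjust the URL to your RunPod or AWS deployment):

```python
import requests

# Assumed endpoint URL; replace with wherever your TGI server is exposed.
TGI_URL = "http://localhost:8080/generate"

payload = {
    "inputs": "What is the Falcon model?",
    "parameters": {"max_new_tokens": 64},
}

# TGI's /generate route returns JSON containing "generated_text".
response = requests.post(TGI_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])
```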

But how can I resolve the PyTorch 2.0 feature issue? I am uploading the model directly without any changes, just to check the deployment. I have nowhere to interact with PyTorch.

I would try talking to the people at Hugging Face if it is their service.

Which platform would you suggest for deployment?
My PC is not powerful enough for this model.

For tests, try it on Google Colab or Kaggle kernels.

For production, use a cloud provider.

Sorry for the delay.
