HuggingFaceModel deployment

#13
by arminnorouzi - opened

Did you test deploying on SageMaker using the HuggingFaceModel API, similar to this notebook:

https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb

I cloned the repo and uploaded model.tar.gz to S3. When deploying it, I got an error saying that the task needs to be set:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Task couldn't be inferenced from ReplitLM. Inference Toolkit can only inference tasks from architectures ending with ['TapasForQuestionAnswering', 'ForQuestionAnswering', 'ForTokenClassification', 'ForSequenceClassification', 'ForMultipleChoice', 'ForMaskedLM', 'ForCausalLM', 'ForConditionalGeneration', 'MTModel', 'EncoderDecoderModel', 'GPT2LMHeadModel', 'T5WithLMHeadModel']. Use env HF_TASK to define your task."
}"

After setting the task by passing HF_TASK as an environment variable, I got this error:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Loading /.sagemaker/mms/models/model requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error."
}"

Here is my implementation:

hub = {
    'HF_TASK': 'text-generation'
}

huggingface_model = HuggingFaceModel(
    model_data=s3_location,          # path to your model and script
    role=role,                       # IAM role with permissions to create an Endpoint
    transformers_version="4.17.0",   # transformers version used
    pytorch_version="1.10.2",        # pytorch version used
    py_version='py38',               # python version used
    env=hub
)

deploy the endpoint:

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.8xlarge",
)
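
For reference, once the endpoint is up it can be invoked like this (a minimal sketch; the prompt and generation parameters are placeholders):

# send a text-generation request to the deployed endpoint
response = predictor.predict({
    "inputs": "def fibonacci(n):",
    "parameters": {"max_new_tokens": 64}
})
print(response)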

Would you please help me?

@arminnorouzi Can you please tell me how you downloaded the model and uploaded it to S3? I am using a SageMaker notebook to do that, but I'm running out of space even when a larger instance is used.

@JoselinSushma I used relatively large instances, as I was also trying a larger model (StarCoder). I tried ml.g4dn.2xlarge, and it worked.
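
For what it's worth, here is a minimal sketch of the download-and-pack flow (the repo id and bucket name are placeholders; the notebook's EBS volume still needs room for the model plus the archive):

from huggingface_hub import snapshot_download
import subprocess

# download the model repo to local disk
local_dir = snapshot_download(repo_id="replit/replit-code-v1-3b")

# model.tar.gz must contain the model files at the archive root
subprocess.run(f"tar -czf model.tar.gz -C {local_dir} .", shell=True, check=True)
subprocess.run("aws s3 cp model.tar.gz s3://YOUR_BUCKET/model.tar.gz", shell=True, check=True)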

@arminnorouzi, thank you.
I tried adding custom functions for model_fn and predict_fn (reference: https://huggingface.co/docs/sagemaker/inference).
model_fn loads the model with trust_remote_code=True, as given in the model card. It worked.

However, invocation in SageMaker times out after 60 seconds, which is too short when longer code has to be generated.
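
The 60-second limit is the hard timeout of the real-time InvokeEndpoint API, not something the model container controls. If it helps, one workaround is SageMaker Asynchronous Inference, where responses land in S3 instead of being returned on the request; a minimal sketch, with the output bucket as a placeholder:

from sagemaker.async_inference import AsyncInferenceConfig

# responses are written to S3 instead of being returned synchronously
async_config = AsyncInferenceConfig(output_path="s3://YOUR_BUCKET/async-output/")

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",
    async_inference_config=async_config,
)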

@JoselinSushma is it possible to share the custom function you wrote here? Also, for deployment, did you use these versions?

transformers_version="4.17.0", # transformers version used
pytorch_version="1.10.2", # pytorch version used
py_version='py38', # python version used

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

def model_fn(model_dir):
    # load the tokenizer and model from the unpacked model.tar.gz,
    # allowing the custom modeling code that ReplitLM ships with
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)
    code_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
    return code_generator
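
A matching predict_fn could look like this (a sketch only, assuming the pipeline returned by model_fn above and the usual inputs/parameters request format):

def predict_fn(data, code_generator):
    # data is the deserialized JSON request body;
    # code_generator is whatever model_fn returned
    prompt = data["inputs"]
    parameters = data.get("parameters", {})
    return code_generator(prompt, **parameters)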

Replit org

Thanks @JoselinSushma for providing support on this issue!
Closing for now, as you seem to have made it work correctly πŸ™Œ

pirroh changed discussion status to closed
