Failed to create inference endpoint

#1
by brekk - opened

Issue:
I cannot start the inference endpoint; the log says:
2023/12/07 10:53:21 ~ Error: ShardCannotStart
2023/12/07 10:53:21 ~ {"timestamp":"2023-12-07T01:53:21.369939Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
2023/12/07 10:53:21 ~ {"timestamp":"2023-12-07T01:53:21.369962Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}

Steps to reproduce:
Deploy > Inference Endpoint > Select A10G AWS instance

Is there a way to use an inference endpoint with this LoRA model?

Thanks in advance!

H4 Alignment Handbook org
edited Dec 7, 2023

Hi @brekk
I am not sure Inference Endpoints support LoRA adapters; you should consider using the merged model instead (which I believe is https://huggingface.co/alignment-handbook/zephyr-7b-sft-full, right @lewtun ?). If not, you can merge the model yourself; please have a look at https://huggingface.co/docs/peft/v0.7.0/en/package_reference/lora#peft.LoraModel.merge_and_unload. To merge the LoRA model you can just:

from peft import AutoPeftModelForCausalLM

peft_model_id = YOUR_LORA_MODEL_ID    # repo that contains the LoRA adapter
merged_model_id = YOUR_NEW_MODEL_ID   # where the merged model will be pushed

# Load the base model with the LoRA adapter applied on top
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id)
# Fold the adapter weights into the base model and drop the adapter layers
merged_model = model.merge_and_unload()
# Upload the standalone merged model to the Hub
merged_model.push_to_hub(merged_model_id)
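
One more thing that might help (an assumption on my part, not something confirmed here): the endpoint also needs the tokenizer files in the merged repo, so you may want to push those alongside the weights. A minimal sketch, assuming the adapter repo ships the base model's tokenizer:

from transformers import AutoTokenizer

# Assumption: the LoRA repo contains the tokenizer files of the base model
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
tokenizer.push_to_hub(merged_model_id)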

Thank you for the reply, @ybelkada.
I will give it a try!
