Error when starting endpoint in both Huggingface and Sagemaker: RuntimeError: weight model.embed_tokens.weight does not exist

#3
by josh103 - opened

I'm consistently getting the following error when setting up an endpoint according to the instructions, on both Hugging Face and SageMaker.

Error:

```
Server message: Endpoint failed to start.
  File ".../asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 233, in get_model
    return FlashLlama(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 69, in __init__
    model = FlashLlamaForCausalLM(config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 412, in __init__
    self.model = FlashLlamaModel(config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 346, in __init__
    self.embed_tokens = TensorParallelEmbedding(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/layers.py", line 502, in __init__
    weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 88, in get_partial_sharded
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 64, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight model.embed_tokens.weight does not exist

{"timestamp":"2024-01-30T21:18:25.376761Z","level":"INFO","fields":{"message":"Shard terminated"},"target":"text_generation_launcher","span":{"rank":1,"name":"shard-manager"},"spans":[{"rank":1,"name":"shard-manager"}]}
Error: ShardCannotStart
```


Python code used to initialize the endpoint in SageMaker:

```python
import json

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

hub = {
    'HF_MODEL_ID': 'defog/sqlcoder-70b-alpha',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role='',  # IAM execution role ARN
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.8xlarge",
    container_startup_health_check_timeout=300,
)
```


Does anyone know how I can get this hosted on either SageMaker or Hugging Face?
Defog.ai org

Thanks for reporting – looking into this.

Defog.ai org

Hi there, we discovered a bizarre bug where the model's lm_head.weight was not uploaded to HF during the upload process. This is breaking many integrations, and the model currently uploaded here produces gibberish results.

Fix coming soon – hopefully in the next hour

Defog.ai org

Fixed with a reupload of the model weights! Apologies for the issue. Please let me know if you still run into problems.

rishdotblog changed discussion status to closed
