Error when starting endpoint in both Huggingface and Sagemaker: RuntimeError: weight model.embed_tokens.weight does not exist

#3
by josh103 - opened

I'm consistently getting the following error when setting up an endpoint according to the instructions, on both Hugging Face and SageMaker.

Error:

```
Server message: Endpoint failed to start.
  File ".../asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 196, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 233, in get_model
    return FlashLlama(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py", line 69, in __init__
    model = FlashLlamaForCausalLM(config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 412, in __init__
    self.model = FlashLlamaModel(config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 346, in __init__
    self.embed_tokens = TensorParallelEmbedding(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/layers.py", line 502, in __init__
    weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 88, in get_partial_sharded
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/weights.py", line 64, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight model.embed_tokens.weight does not exist

{"timestamp":"2024-01-30T21:18:25.376761Z","level":"INFO","fields":{"message":"Shard terminated"},"target":"text_generation_launcher","span":{"rank":1,"name":"shard-manager"},"spans":[{"rank":1,"name":"shard-manager"}]}
Error: ShardCannotStart
```


Python code used to initialize the endpoint in SageMaker:

```python
import json

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

hub = {
    'HF_MODEL_ID': 'defog/sqlcoder-70b-alpha',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role='',  # IAM execution role ARN
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.8xlarge",
    container_startup_health_check_timeout=300,
)
```


Does anyone know how I can get this hosted on either SageMaker or Hugging Face?
Defog.ai org

Thanks for reporting – looking into this.

Defog.ai org

Hi there, we discovered a bizarre bug where the model's lm_head.weight was not uploaded to HF during the upload process. This is breaking many integrations, and the model currently uploaded here produces gibberish results.

Fix coming soon – hopefully in the next hour

Defog.ai org

Fixed with a reupload of the model weights! Apologies for the issue. Please let me know if you still run into problems.

rishdotblog changed discussion status to closed
