RuntimeError: weight model.embed_tokens.weight does not exist
Hi,
I'm trying to deploy with the SageMaker SDK and I get this error:
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 159, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 252, in get_model
return FlashMistral(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_mistral.py", line 321, in __init__
model = FlashMistralForCausalLM(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 486, in __init__
self.model = MistralModel(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 418, in __init__
self.embed_tokens = TensorParallelEmbedding(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 482, in __init__
weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 78, in get_partial_sharded
filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 53, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight model.embed_tokens.weight does not exist
I'm using this configuration:

```json
"e5-mistral-7b-instruct": {
    "model_type": "huggingface",
    "model_id": "intfloat/e5-mistral-7b-instruct",
    "instance_type": "ml.g5.2xlarge",
    "num_gpus": 1,
    "image_uri": "763104351884.dkr.ecr.eu-central-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04",
    "model_hub": {
        "MODEL_SERVER_TIMEOUT": "120",
        "MAX_INPUT_LENGTH": "2048",
        "MAX_TOTAL_TOKENS": "4096"
    }
}
```
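For context, this is roughly how that configuration gets used with the SageMaker SDK (a sketch, not my exact script; the execution role is a placeholder, and I'm assuming the `model_hub` entries are passed as container environment variables):

```python
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

image_uri = get_huggingface_llm_image_uri("huggingface", version="1.1.0")

model = HuggingFaceModel(
    role="<sagemaker-execution-role>",  # placeholder
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "intfloat/e5-mistral-7b-instruct",
        "MODEL_SERVER_TIMEOUT": "120",
        "MAX_INPUT_LENGTH": "2048",
        "MAX_TOTAL_TOKENS": "4096",
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```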
The `image_uri` corresponds to the output of `get_huggingface_llm_image_uri("huggingface", version="1.1.0")`.
Any idea?
@ivankeller
It's an embedding model without an LM head; please load it with `AutoModel.from_pretrained` instead of `AutoModelForCausalLM.from_pretrained`.
Also, this model does not have text generation capability.
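For example, a minimal sketch of getting an embedding this way (the last-token pooling step is an illustration, not necessarily the exact recipe; see the model card):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
model = AutoModel.from_pretrained("intfloat/e5-mistral-7b-instruct")

inputs = tokenizer("how much protein should a female eat", return_tensors="pt")
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state  # (batch, seq, hidden)

# Pool the hidden state of the last non-padding token into a single vector
idx = inputs["attention_mask"].sum(dim=1) - 1
embedding = last_hidden[torch.arange(last_hidden.size(0)), idx]
```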
Thank you @intfloat, I'm now trying to package the model as described here:
https://www.philschmid.de/custom-inference-huggingface-sagemaker
I know it does not have text generation capability; I want the embedding vector corresponding to a text.
By the way, why does the embedding depend on a specific task? Is it possible to just get the embedding of a text itself, regardless of the task?
I still get the same error: `ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.`
Following the reference above (https://www.philschmid.de/custom-inference-huggingface-sagemaker), I use `.from_pretrained`:
```python
from transformers import AutoModel, AutoTokenizer

def model_fn(model_dir):
    # Load model and tokenizer from the unpacked model archive
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    return model, tokenizer
```
The model and tokenizer are packaged into `model.tar.gz` located on S3.
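For completeness, a sketch of the `predict_fn` that would go with it, following the handler structure from the guide (the pooling step is an assumption; adapt it to the model card's recipe):

```python
import torch

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt", truncation=True)
    with torch.no_grad():
        last_hidden = model(**inputs).last_hidden_state
    # Last-token pooling, as an illustration
    idx = inputs["attention_mask"].sum(dim=1) - 1
    embedding = last_hidden[torch.arange(last_hidden.size(0)), idx]
    return {"embedding": embedding[0].tolist()}
```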
Hi @ivankeller,
I'm facing the same issue. Did you fix it, or do you have any updates?
What I tried:
- Updating `transformers` to the latest version with `pip install git+https://github.com/huggingface/transformers`
- Using `LlamaTokenizer` instead of `AutoTokenizer`
- Checking `tokenizer_config.json` and making sure that `"tokenizer_class"` is `"LlamaTokenizer"`
Hi @momotake,
I could not fix it, so I gave up and used another embeddings model. I tried the same fixes as you.
Thanks @ivankeller.
I added `from transformers import LlamaTokenizer` and it is working now. `AutoTokenizer` automatically retrieves the best tokenizer, but maybe we have to import the tokenizer class it chooses in advance :(
If you still want to use this embedding model, I hope it will solve your problem too.