RuntimeError: weight model.embed_tokens.weight does not exist
Hi,
I'm trying to deploy with the SageMaker SDK and I get this error:
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 159, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 252, in get_model
return FlashMistral(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_mistral.py", line 321, in __init__
model = FlashMistralForCausalLM(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 486, in __init__
self.model = MistralModel(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 418, in __init__
self.embed_tokens = TensorParallelEmbedding(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 482, in __init__
weight = weights.get_partial_sharded(f"{prefix}.weight", dim=0)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 78, in get_partial_sharded
filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 53, in get_filename
raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight model.embed_tokens.weight does not exist
I'm using this configuration:

```json
"e5-mistral-7b-instruct": {
    "model_type": "huggingface",
    "model_id": "intfloat/e5-mistral-7b-instruct",
    "instance_type": "ml.g5.2xlarge",
    "num_gpus": 1,
    "image_uri": "763104351884.dkr.ecr.eu-central-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04",
    "model_hub": {
        "MODEL_SERVER_TIMEOUT": "120",
        "MAX_INPUT_LENGTH": "2048",
        "MAX_TOTAL_TOKENS": "4096"
    }
}
```
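For context, this is roughly how that configuration gets used with the SageMaker SDK (a sketch, not my exact script; the execution role is a placeholder, and I'm assuming the `model_hub` entries are passed as container environment variables):

```python
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

image_uri = get_huggingface_llm_image_uri("huggingface", version="1.1.0")

model = HuggingFaceModel(
    role="<sagemaker-execution-role>",  # placeholder
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "intfloat/e5-mistral-7b-instruct",
        "MODEL_SERVER_TIMEOUT": "120",
        "MAX_INPUT_LENGTH": "2048",
        "MAX_TOTAL_TOKENS": "4096",
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```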
The `image_uri` corresponds to the output of `get_huggingface_llm_image_uri("huggingface", version="1.1.0")`.
Any idea?
@ivankeller
It's an embedding model without an LM head; please load it with `AutoModel.from_pretrained` instead of `AutoModelForCausalLM.from_pretrained`.
Also, this model does not have text generation capability.
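For example, a minimal sketch of getting an embedding this way (the last-token pooling step is an illustration, not necessarily the exact recipe; see the model card):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
model = AutoModel.from_pretrained("intfloat/e5-mistral-7b-instruct")

inputs = tokenizer("how much protein should a female eat", return_tensors="pt")
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state  # (batch, seq, hidden)

# Pool the hidden state of the last non-padding token into a single vector
idx = inputs["attention_mask"].sum(dim=1) - 1
embedding = last_hidden[torch.arange(last_hidden.size(0)), idx]
```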
Thank you @intfloat, I'm now trying to package the model as described here:
https://www.philschmid.de/custom-inference-huggingface-sagemaker
I know it does not have text generation capability; I want the embedding vector corresponding to a text.
By the way, why does the embedding depend on a specific task? Is it possible to just get the embedding of a text itself, regardless of the task?
I still get the same error: `ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.`
Following the reference above (https://www.philschmid.de/custom-inference-huggingface-sagemaker), I use `.from_pretrained`:
```python
from transformers import AutoModel, AutoTokenizer

def model_fn(model_dir):
    # Load model and tokenizer from the unpacked model archive
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    return model, tokenizer
```
The model and tokenizer are packaged into `model.tar.gz` located on S3.
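For completeness, a sketch of the `predict_fn` that would go with it, following the handler structure from the guide (the pooling step is an assumption; adapt it to the model card's recipe):

```python
import torch

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt", truncation=True)
    with torch.no_grad():
        last_hidden = model(**inputs).last_hidden_state
    # Last-token pooling, as an illustration
    idx = inputs["attention_mask"].sum(dim=1) - 1
    embedding = last_hidden[torch.arange(last_hidden.size(0)), idx]
    return {"embedding": embedding[0].tolist()}
```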
Hi @ivankeller,
I'm facing the same issue. Did you fix it, or do you have any updates?
What I tried:
- Updating `transformers` to the latest version with `pip install git+https://github.com/huggingface/transformers`
- Using `LlamaTokenizer` instead of `AutoTokenizer`
- Checking `tokenizer_config.json` and making sure that `"tokenizer_class"` is `"LlamaTokenizer"`
Hi @momotake,
I could not fix it, so I gave up and used another embeddings model. I tried the same fixes as you.
Thanks @ivankeller.
I added `from transformers import LlamaTokenizer` and it is working now. `AutoTokenizer` automatically retrieves the best tokenizer, but maybe we have to import the tokenizer class it chooses in advance :(
If you still want to use this embedding model, I hope it will solve your problem too.