protoc error while deploying the model to a SageMaker instance

by LorenzoCevolaniAXA

Hello,

I wanted to try this model on one of our SageMaker instances.
I tried the deployment code you present under the Deploy button, but the proposed instance ("ml.g5.2xlarge") is too small: it hits a memory error while converting the PyTorch weights to safetensors. The problem is solved by using the following instance instead: "ml.g5.8xlarge".
I am using the Hugging Face LLM image, version 0.8.2 (latest available); my setup is roughly the sketch below.
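For context, this is approximately the deployment code (a minimal sketch; the model id and the SM_NUM_GPUS value are my illustrative assumptions, not copied from the Deploy snippet):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Hugging Face LLM (TGI) container, version 0.8.2 as mentioned above
image_uri = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "DAMO-NLP-MT/polylm-13b",  # hypothetical model id, for illustration only
        "SM_NUM_GPUS": "1",  # assumed single-GPU setting for a g5 instance
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.8xlarge",  # "ml.g5.2xlarge" ran out of memory during weight conversion
)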
After this I encountered another problem which I do not know how to solve.
The problem is the following:

 Shard 0 failed to start:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 289, in get_model
    return CausalLM(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py", line 469, in __init__
    tokenizer = AutoTokenizer.from_pretrained(
  File "/usr/src/transformers/src/transformers/models/auto/tokenization_auto.py", line 692, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
    return cls._from_pretrained(
  File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1975, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/src/transformers/src/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
    super().__init__(
  File "/usr/src/transformers/src/transformers/tokenization_utils_fast.py", line 114, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/usr/src/transformers/src/transformers/convert_slow_tokenizer.py", line 1303, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/usr/src/transformers/src/transformers/convert_slow_tokenizer.py", line 445, in __init__
    from .utils import sentencepiece_model_pb2 as model_pb2
  File "/usr/src/transformers/src/transformers/utils/sentencepiece_model_pb2.py", line 91, in <module>
    _descriptor.EnumValueDescriptor(
  File "/opt/conda/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 796, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
Error: ShardCannotStart

I tried setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python as an environment variable (see the sketch after the error message), and the error I get is the following:

RuntimeError: Llama is supposed to be a BPE model!
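
For reference, here is roughly how I passed the variable to the container (same hypothetical model id as in the sketch above):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="0.8.2"),
    role=sagemaker.get_execution_role(),
    env={
        "HF_MODEL_ID": "DAMO-NLP-MT/polylm-13b",  # hypothetical model id, for illustration only
        # workaround 2 from the protobuf message: force the pure-Python parser
        "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION": "python",
    },
)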

Could you give me some hints about what is going wrong here?
Is it an instance problem?
Is it a package problem?

Thanks in advance

Machine Translation Team at Alibaba DAMO Academy

Actually, I am not very familiar with SageMaker, so I am not entirely sure of the specific cause. However, you can try cloning the latest model files and installing the latest version of Transformers (v4.31.0), for example as sketched below. If any problems persist, we can investigate further together.
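For example, something along these lines (a rough sketch; substitute the exact model id you deployed):

# pip install "transformers==4.31.0" sentencepiece
from transformers import AutoTokenizer

# Re-download the latest model files and check that the tokenizer loads cleanly;
# if this works locally, the deployed container should use the same versions.
tokenizer = AutoTokenizer.from_pretrained(
    "DAMO-NLP-MT/polylm-13b",  # hypothetical model id, for illustration only
    force_download=True,       # fetch the newest tokenizer files instead of any cache
)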
