Error in deploy

#1
by giuliogalvan - opened

I am trying to deploy this model (either through SageMaker or managed endpoints) to run extensive tests, but I ran into the following problem.
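For context, the deployment follows the standard SageMaker Hugging Face LLM pattern, roughly as in the sketch below; the role, model ID, TGI image version, and instance type are placeholders, not my exact values:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# TGI (text-generation-inference) serving container; the version is a placeholder
image_uri = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

huggingface_model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "Salesforce/xgen-7b-8k-base",  # placeholder model ID for this repo
        "SM_NUM_GPUS": "1",  # number of GPUs on the chosen instance
    },
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
)
```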

This is a log extract from AWS SageMaker after invoking `huggingface_model.deploy()`:

```
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'XgenTokenizer'.
The class this function is called from is 'LlamaTokenizer'.

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 67, in serve
    server.serve(model_id, revision, sharded, quantize, trust_remote_code, uds_path)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 246, in get_model
    return llama_cls(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 44, in __init__
    tokenizer = LlamaTokenizer.from_pretrained(
  File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
    return cls._from_pretrained(
  File "/usr/src/transformers/src/transformers/tokenization_utils_base.py", line 1975, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/src/transformers/src/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/opt/conda/lib/python3.9/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/opt/conda/lib/python3.9/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)

Error: ShardCannotStart
```
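In case it helps to narrow this down, the class mismatch can be reproduced locally; a minimal sketch, assuming the custom tokenizer code shipped in this repo (the model ID is a placeholder, and the custom tokenizer appears to need the `tiktoken` package installed):

```python
from transformers import AutoTokenizer

# The checkpoint ships its own XgenTokenizer as custom code, so it only
# resolves through AutoTokenizer with trust_remote_code=True:
tokenizer = AutoTokenizer.from_pretrained(
    "Salesforce/xgen-7b-8k-base",  # placeholder model ID for this repo
    trust_remote_code=True,
)

# text-generation-inference's flash_llama path instead calls
# LlamaTokenizer.from_pretrained(...) directly, which expects a
# sentencepiece vocab file and so fails inside sp_model.Load,
# matching the traceback above.
print(tokenizer("hello world").input_ids)
```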

Any help would be very much appreciated :)
