Issues while deploying on AWS SageMaker with TGI

#7
by rajaswa-postman - opened

I've been trying to deploy codellama/CodeLlama-13b-Instruct-hf on AWS SageMaker with the TGI container for a while now. I am facing two issues in particular:

  1. The tokenizer class mismatch (see the AutoTokenizer note below the traceback) -
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'CodeLlamaTokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
  2. Model loading error with TGI -
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 142, in serve_inner
    model = get_model(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 185, in get_model
    return FlashLlama(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 65, in __init__
    model = FlashLlamaForCausalLM(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 452, in __init__
    self.model = FlashLlamaModel(config, weights)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 390, in __init__
    [
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 391, in <listcomp>
    FlashLlamaLayer(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 326, in __init__
    self.self_attn = FlashLlamaAttention(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 183, in __init__
    self.rotary_emb = PositionRotaryEmbedding.load(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 395, in load
    inv_freq = weights.get_tensor(f"{prefix}.inv_freq")
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 62, in get_tensor
    filename, tensor_name = self.get_filename(tensor_name)
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight model.layers.0.self_attn.rotary_emb.inv_freq does not exist
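For the first issue, for what it's worth: the warning seems to come from loading the checkpoint through LlamaTokenizer directly, and going through AutoTokenizer (which resolves the tokenizer class registered in the checkpoint's tokenizer_config.json) should avoid it. A minimal sketch:

```python
from transformers import AutoTokenizer

# AutoTokenizer resolves to the CodeLlama tokenizer class declared in the
# checkpoint's tokenizer_config.json instead of forcing LlamaTokenizer.
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-13b-Instruct-hf")
print(type(tokenizer).__name__)  # CodeLlamaTokenizer (or its Fast variant)
```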

Any ideas on how these can be resolved?

I have also tried the latest transformers version, 4.33.1.
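For context, this is roughly how I'm deploying (a sketch; the container version, role, and instance settings are just what I happened to use):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Hugging Face TGI (text-generation-inference) container for SageMaker.
image_uri = get_huggingface_llm_image_uri("huggingface", version="0.9.3")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "codellama/CodeLlama-13b-Instruct-hf",
        "SM_NUM_GPUS": "4",  # shard the model across the instance's GPUs
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=600,
)
```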

Code Llama org

Same issue here... Any help would be greatly appreciated.

I already tried pip installing different transformers versions, but none of them fixed the problem:

!pip install git+https://github.com/huggingface/transformers.git@main
!pip install git+https://github.com/ArthurZucker/transformers.git@main
!pip install git+https://github.com/ArthurZucker/transformers.git@add-llama-code
Code Llama org

You should only need pip install git+https://github.com/huggingface/transformers.git@main; my branch was just for development.

Code Llama org

This warning is safe to ignore.

Both tokenizers are the same (for TGI purposes): TGI doesn't use codellama's code-specific capabilities, so you would need to send the preprompt yourself.
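For example, sending the preprompt yourself could look roughly like this (a sketch: the prompt template follows the Llama-2 chat convention the Instruct checkpoints were trained with, and the endpoint name is a placeholder):

```python
import json

import boto3

# Build the instruct preprompt client-side; TGI just completes raw text.
def build_prompt(system: str, user: str) -> str:
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

client = boto3.client("sagemaker-runtime")
payload = {
    "inputs": build_prompt("Answer with Python code only.", "Reverse a linked list."),
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},
}
response = client.invoke_endpoint(
    EndpointName="codellama-13b-instruct-tgi",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```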
As for the missing inv_freq: codellama's weights don't include those tensors (architecturally it's essentially llamav2), while old TGI versions expected inv_freq to be present in the checkpoint.
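Concretely, inv_freq is a pure function of the head dimension and the RoPE theta, so there is nothing to store; newer TGI versions recompute it from the config instead of reading it from the checkpoint. A sketch of the computation (note that CodeLlama's config sets rope_theta to 1e6, versus 1e4 for Llama 2):

```python
import torch

# Standard rotary-embedding inverse frequencies: deterministic given the
# head dimension and theta, so checkpoints can legitimately omit them.
def rotary_inv_freq(head_dim: int, theta: float = 1_000_000.0) -> torch.Tensor:
    return 1.0 / (theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
```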

This should all be solved with the upcoming SageMaker release of the latest TGI.

Thanks for your reply, @Narsil! Any information on when the upcoming SageMaker release of the latest TGI will be available?

Code Llama org

Soon, I hope, but I can't make any promises (it's not in our hands at this point).

Code Llama org

TGI 1.0.3 is now available on SageMaker.
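So pinning the fixed container explicitly should work, e.g.:

```python
from sagemaker.huggingface import get_huggingface_llm_image_uri

# Request the fixed TGI container version explicitly.
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.0.3")
```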
