Llama 2 fails with context length >> 2000

#2
by Joemgu - opened

I have used both the 7b and 13b chat-hf models with different compute types (int8, int8_float16, int8_bfloat16), but I always run into the same issue.
Inference is run with CTranslate2 3.17.1 and transformers 4.31.0.

When I use the correct prompt format (I triple-checked), e.g. for summarization, and the whole input prompt is longer than 2048 tokens, generation fails and produces weird hallucinations, returning a long chain of special characters like #, ", etc. With input prompts shorter than 2000 tokens (even when the output is up to 1024 tokens), the model performs the task correctly. However, if I input more than 3000 tokens, the output is always gibberish, no matter the configured generation length.

It seems that as long as the input sequence length and the generated tokens combined stay within 2048 tokens, everything works fine, but as soon as we exceed 2048 tokens, the model starts to break down. I suspect (or at least it feels like) the model has an effective context length of only 2048. I can also notice the model starting to forget the initial instruction while generating tokens once it passes this invisible 2048-token wall.
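
For reference, here is a minimal sketch of how I am running generation; the model directory, prompt, and sampling parameters are placeholders, and I am assuming the standard CTranslate2 Generator API together with the Hugging Face tokenizer:

```python
import ctranslate2
import transformers

# Placeholder paths and parameters for illustration only.
model_dir = "llama-2-7b-chat-ct2"
tokenizer = transformers.AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
generator = ctranslate2.Generator(model_dir, device="cuda", compute_type="int8_float16")

# Build the prompt in the Llama 2 chat format and convert it to token strings,
# which is what generate_batch expects.
prompt = "[INST] Summarize the following text: ... [/INST]"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch(
    [tokens],
    max_length=1024,               # number of new tokens to generate
    sampling_temperature=0.7,
    sampling_topk=20,
    include_prompt_in_result=False,
)
print(tokenizer.decode(results[0].sequences_ids[0]))
```

The breakdown happens whenever the prompt length plus max_length goes past 2048.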

Is there a way to fix this behavior and make use of the full 4096 context length?

Hi @Joemgu ,
Thanks for your issue post. I think I may also have seen similar issues with token inputs > 2048 tokens for CTranslate2 models like Falcon, where I suspected the cause was the model's maximum length.
I just converted the model with https://github.com/michaelfeil/hf-hub-ctranslate2/blob/main/conversion_utils/convert.py and uploaded it here.
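
For comparison, the stock CTranslate2 converter should produce an equivalent model; the source model name and output directory in this sketch are placeholders:

```python
import ctranslate2.converters

# Convert the Hugging Face checkpoint to CTranslate2 format with int8_float16
# weights; the source model and output directory are placeholders.
converter = ctranslate2.converters.TransformersConverter("meta-llama/Llama-2-7b-chat-hf")
converter.convert("llama-2-7b-chat-ct2", quantization="int8_float16")
```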

Would you be so kind as to post your issue here? https://github.com/OpenNMT/CTranslate2/issues

Thanks for the quick answer, I just posted the issue in case you are interested in following it.

Edit: Just noticed you commented on it, thank you!

For anyone wondering, the issue was due to having the wrong CUDA version installed; the currently supported version is CUDA 11. Thanks Michael for your help!
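
As a quick sanity check for anyone hitting the same thing, this is roughly what I would run to confirm a CUDA runtime is actually visible to CTranslate2 (assuming the CTranslate2 Python wheel and an NVIDIA driver are installed):

```python
import subprocess

import ctranslate2

# If this prints 0 even though a GPU is present, the CUDA runtime that the
# CTranslate2 wheel expects (currently CUDA 11) is likely missing or mismatched.
print("CUDA devices visible to CTranslate2:", ctranslate2.get_cuda_device_count())

# nvidia-smi reports the installed driver and the CUDA version it supports.
subprocess.run(["nvidia-smi"], check=False)
```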

Joemgu changed discussion status to closed
