Llama-2-70b Model: Challenges with Long Token Sequences

#5
by zuhashaik - opened

As the open-source Llama-2-70b model gains popularity within the community, questions arise about its performance on longer token sequences, potentially exceeding 2,500 tokens. In my case, the model seems to struggle after about 500 tokens. To be clear, I'm referring specifically to Llama-2-70b.

Maybe this could be addressed by fine-tuning on data simulated with GPT-3.5 or GPT-4. What does the community think about working with 2,500 tokens and above? Any suggestions for other models?
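
For context, here is a minimal sketch (not a definitive fix) of how one might load the model with Hugging Face transformers and experiment with RoPE scaling to push past the default context window. The checkpoint id, dtype, and scaling factor are illustrative assumptions, not recommendations.

```python
# Sketch: load Llama-2-70b and inspect / extend its context via RoPE scaling.
# Assumes access to the gated checkpoint and a recent transformers version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    # Linear RoPE scaling trades some quality for a longer usable context;
    # the factor of 2.0 here is an assumption for illustration only.
    rope_scaling={"type": "linear", "factor": 2.0},
)

# Check the context length the config reports before generating long outputs.
print("max_position_embeddings:", model.config.max_position_embeddings)

prompt = "Summarize the following document: ..."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=2500)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```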
