Llama-2-70b Model: Challenges with Long Token Sequences

#5
by zuhashaik - opened

As the open-source Llama-2-70b model gains popularity within the community, questions arise about its performance on longer token sequences, potentially exceeding 2,500 tokens. In my case, the model seems to struggle after about 500 tokens. To be clear, I'm referring specifically to Llama-2-70b.

Maybe this could be addressed by fine-tuning on data simulated with GPT-3.5 or GPT-4. What does the community think about working with 2,500 tokens and above? Any suggestions for other models?
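
For context, here is a minimal sketch (not a definitive fix) of how one might load the model with Hugging Face transformers and experiment with RoPE scaling to push past the default context window. The checkpoint id, dtype, and scaling factor are illustrative assumptions, not recommendations.

```python
# Sketch: load Llama-2-70b and inspect / extend its context via RoPE scaling.
# Assumes access to the gated checkpoint and a recent transformers version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    # Linear RoPE scaling trades some quality for a longer usable context;
    # the factor of 2.0 here is an assumption for illustration only.
    rope_scaling={"type": "linear", "factor": 2.0},
)

# Check the context length the config reports before generating long outputs.
print("max_position_embeddings:", model.config.max_position_embeddings)

prompt = "Summarize the following document: ..."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=2500)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```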
