Any plans on expanding the context length with landmark attention?

#7
by itachiluan - opened

Hi ehartford,

First of all, thanks so much for bringing this model to us! I think it is by far the model best suited to my tasks.
One problem I've run into, and that I think a lot of other people are also running into, is the 2048-token context length limit.
I thought it wasn't possible to extend LLaMA models' context length until I found this:
https://github.com/epfml/landmark-attention

Looking through the Wizard-Vicuna dataset, I've found prompts that are well over 2048 tokens (the longest I found was 8225). My method of counting tokens could be wrong (I used the LLaMA tokenizer on the concatenation of the history and the prompt; a rough sketch is below), but I think we can still conclude that some of the Wizard-Vicuna data was truncated due to the token length limit.
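
This is roughly what I mean by counting tokens; a minimal sketch, assuming the dataset sits in a local JSON file (the file name and the "history"/"prompt" field names are placeholders for whatever format you use):

```python
# Rough token-count check; assumes a local "wizard_vicuna.json" file where
# each record has "history" and "prompt" fields (adjust to your layout).
import json
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")

with open("wizard_vicuna.json") as f:
    records = json.load(f)

longest = 0
over_limit = 0
for record in records:
    text = record["history"] + record["prompt"]
    n_tokens = len(tokenizer(text)["input_ids"])
    longest = max(longest, n_tokens)
    if n_tokens > 2048:
        over_limit += 1

print(f"longest example: {longest} tokens, {over_limit} examples over 2048")
```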

Do you think it's worth a shot to try landmark attention on the Wizard-Vicuna 13B model to see if we can expand its context length?
Thanks!

Cognitive Computations org

I'll look into it, sounds interesting!

I didn't even know that was possible, but that would be amazing!
Whenever I make quirky characters with example context / past chats, it always eats up quite a decent amount of the 2048 tokens. Extending that would be a dream come true!

Hi,

If you check TheBloke's page, he has published many models that have been merged with the SuperHOT 8k LoRA, which extends the context length to 8k+; worth giving it a go!
The slight catch for me was that the SuperHOT LoRA wasn't trained on the Wizard-Vicuna dataset, so the merged Wizard-Vicuna model is slightly less accurate for me (for Chinese generation), but it's definitely worth trying!
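
For anyone wanting to try one of those merges, this is roughly how the longer context can be loaded; a minimal sketch, assuming a transformers version that supports the rope_scaling config option (the model id is a placeholder, pick an actual SuperHOT merge from TheBloke's page):

```python
# Load a SuperHOT-merged model with an extended context window via linear
# RoPE scaling (SuperHOT interpolates positions by 4x: 2048 -> 8192).
# Placeholder model id; substitute a real SuperHOT 8k merge from TheBloke.
# device_map="auto" needs the accelerate package installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/your-chosen-superhot-8k-merge"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    rope_scaling={"type": "linear", "factor": 4.0},
    max_position_embeddings=8192,
)
```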
