Issues at high context lengths

#1
by BloodOfTheRock - opened

I was excited to see these new large-context models released, but I can't seem to get coherent results out of them when using a large amount of input text. If you chat with it "normally", with very short queries and short responses from it, it seems to work fine, but if you try to utilize the large context window it fails to function properly, which seems to defeat the purpose of the model, unless I'm making some mistake.

For example, if I just ask it how it is doing, it responds normally like so:

[Screenshot: normal, coherent response to a short query]

However, if I paste the raw text of an entire news article from online (still FAR under the 128k context length) and ask it for a short summary of the article, it responds with gibberish, like this:

[Screenshot: gibberish output when asked to summarize a pasted article]

Or sometimes it fails in other ways, like responding with just a single character.

At any rate, I have been unable to utilize the large context window in any meaningful way, so I was wondering whether I am perhaps doing something wrong? I'm just using it in ooba. The GGUF versions also behave in exactly the same way.
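One thing worth ruling out on the GGUF side, as far as I understand it: llama.cpp only uses whatever n_ctx is passed at load time, not the model's trained length, so a too-small n_ctx will silently truncate long prompts. A minimal llama-cpp-python sketch of what I mean (file names here are just placeholders):

```python
from llama_cpp import Llama

# The usable context is whatever n_ctx is set to at load time; if it is left
# at the small default, long prompts get silently truncated.
llm = Llama(
    model_path="model.Q5_K_M.gguf",  # placeholder file name
    n_ctx=32768,                     # must cover the prompt actually being sent
    verbose=True,                    # logs the effective context size at load
)

article_text = open("article.txt", encoding="utf-8").read()
out = llm("Give a short summary of this article:\n\n" + article_text, max_tokens=256)
print(out["choices"][0]["text"])
```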

I thought maybe something was wrong with my input text or maybe it wasn't properly sanitized, so I have tried with many different sources of input text and the behavior is the same. Here is an example with about 30k tokens as input:

[Screenshot: gibberish output with roughly 30k tokens of input]
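For reference, a quick way to double-check how many tokens the pasted text really is, by running it through the model's tokenizer (the repo id below is only an example):

```python
from transformers import AutoTokenizer

# Example repo id; substitute whichever long-context model is being tested.
tok = AutoTokenizer.from_pretrained("NousResearch/Yarn-Mistral-7b-128k")

article_text = open("article.txt", encoding="utf-8").read()
n_tokens = len(tok(article_text)["input_ids"])
print(f"{n_tokens} tokens")  # should be well under the advertised 128k window
```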

I have not had much luck with these longer Yarn context models yet either. From some descriptions from people on TheBloke's Discord, the new Amazon MistralLite seems to have a great usable context length (Turboderp mentions he got it to go past 38K). I suspect the Alpha or other parameters will need to be set properly with a long-context prompt to get coherent output.
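For what it's worth, my understanding is that the Alpha setting in ooba/exllama is NTK-aware RoPE scaling, which effectively raises the rotary base rather than compressing positions. A rough sketch of the relationship (head dimension of 128 assumed, as in Mistral/Llama; the numbers are purely illustrative):

```python
# Rough sketch of NTK-aware RoPE "alpha" scaling as I understand it: the
# rotary base is scaled by alpha ** (dim / (dim - 2)), which stretches the
# usable context without retraining.
base = 10000.0   # default RoPE base for Llama/Mistral-style models
dim = 128        # per-head dimension

for alpha in (1.0, 2.0, 4.0):
    scaled_base = base * alpha ** (dim / (dim - 2))
    print(f"alpha={alpha}: effective RoPE base ~ {scaled_base:,.0f}")
```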

Tested your MistralLite-5.0bpw-h6-exl2 at around 25k-30k tokens and it worked on the first test (same input that produced the last image above).

I have the same issues. It's producing nonsense.

I tested both the unquantized base model and the 8.0bpw version; they both behaved the same and were able to return non-gibberish inference. In ooba, I set the max token length to 32K. The only setting I changed was this one:

[Screenshot: the setting that was changed in ooba]

I have not tried to go to very high tokens though. Basic tests seem to work as expected.
