Input context length issue

#3
by HR1777 - opened

@mlabonne, the issue I am mentioning now is global with every single model I have used so far (even 128K models), including NeuralBeagle14-7B. As you know, the context length of NeuralBeagle14-7B is supposed to be 8K, but when I want to do some operations on my desired texts, for example 4 pages of text that come to less than 4096 tokens, I have to split the text into several parts to get the desired output.
For instance, if I ask the model to extend the input text and give it all 4 pages at once, it gives me only 2 pages as output; but if I split those 4 pages into 8 parts, the combined output comes to 6 pages.
I am using Oobabooga as the interface, but I have the same issue with Koboldcpp too, even though I change settings like alpha_value, n_batch, and other parameters to increase the model's ability to process longer inputs.
Do you happen to know any solution for this issue that would let me get almost the same results without having to split the input into multiple parts manually?

Sorry @HR1777 I haven't experimented with this long context retrieval task. I'm surprised that even 128k models can't do it. Have you tried Mixtral or 200k models?

I didn't try Mixtral, but I have tested some 200K models, as well as 64K and 128K models, in addition to more than 40 32K models, and all of them failed. I am sure Mixtral will fail too. I don't know what the solution is; I hoped you could help me.

I have been using this model with a context length of 25'600 by setting the rope_base to 32'000. I obtained these settings through trial and error after talking with others about how they configure RoPE, most of whom only attempt to double the context size. I found that to get 16K I could set the rope base to 50'000 and get excellent results. The original 8K is with the rope base at 100'000. Making only an educated guess, I reasoned that the configurations that might work would all satisfy ctx-size * rope-base = 819'200'000. I tried for a ctx-size of 32K using a rope base of 25'600, but the perplexity was awful. The settings for a ctx-size of 25'600 gave perplexity values of about 14.0, which is quite a bit higher than the ~6.5 with the default 8K context, but in actual testing the model seems to perform well.
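
To make the arithmetic explicit, here is a small shell sketch of that heuristic (just my trial-and-error rule, nothing official): it solves rope-base = 819'200'000 / ctx-size for a few context sizes, where the constant is simply the default 8K (8192) multiplied by the default rope base of 100'000 mentioned above.

# Sketch only: compute the rope base suggested by the ctx-size * rope-base = 819'200'000 rule.
for ctx in 8192 16384 25600 32768; do
  echo "ctx_size=$ctx  rope_freq_base=$(( 819200000 / ctx ))"
done
# prints 100000, 50000, 32000 and 25000; note that in my testing the 32K attempt still gave awful perplexity.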

I am doing this testing using MemGPT, with long conversations in which the context is allowed to grow to near the limit before old conversation history is purged. I have had conversations long enough to fill and then purge the full 25'600-token context window twice, without any evidence of poor performance when the context is nearly full.

This is the best 7B model I have found for use with MemGPT. Thank you @mlabonne for your excellent work!

@jimlloyd, thank you for sharing your experience. Do you happen to know the equivalent of rope-base in the Oobabooga webui? Is it the same as rope_freq_base?

Sorry @HR1777, I was sloppy not to use the exact parameter names. In llama.cpp the exact parameter name is --rope-freq-base. I don't use the oobabooga webui, but I would expect that rope_freq_base is the same thing.

@jimlloyd, thank you so much for your information. I tested your method and it seems much better. If it's possible, I would like to ask you a favor.
Could you take a full article of more than 1500 words as input, ask the model to expand and extend it, and then adjust the settings to get an output of more than 1500 words without losing the details of the input?
I have tried it so many times with different models, but every time the output is much less than 1000 words. I would like the output length to be at least equal to (or greater than) the input, but I have failed every time. Thank you so much for your help.

@HR1777 Sorry for the 3-day delay in responding (I don't check HuggingFace every day). It might take me a day or two of experimenting before I reply again, but I wanted you to know that I am investigating.

I think this should help.

First, I downloaded a text from Project Gutenberg: https://www.gutenberg.org/ebooks/22430
I selected a portion of the text of approximately 1000 words -- shorter than you requested, but I think you'll agree that it was sufficient to demonstrate the techniques.

I chose Project Gutenberg because it was an easy way to get essay-like material but it seems likely that NeuralBeagle was trained (or even fine-tuned??) on a dataset that included the text I chose, which might have made the task somewhat easier.

I prefaced the text with a variety of prompts. This one seemed to produce the longest output:

"Below is a portion of an essay. Write a 3000 word chapter expanding on the essay. You are granted full creative license -- your response need not be factual or completely faithful to the original essay, but it should be coherent, minimally repetitive, and a least partly faithful. You are encouraged to explore the different viewpoints that pertain to the material.."

I have not made any effort to study prompt engineering, so it wouldn't surprise me if the prompt can be improved.
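
In case it helps to reproduce this, here is roughly how the prompt file read by the llama.cpp command below can be assembled. The file names instruction.txt and essay-excerpt.txt are only placeholders; the real file in the command is ~/expand-essay-prompt.txt.

# Sketch only: concatenate the instruction paragraph and the ~1000-word essay excerpt
# into the single prompt file passed to llama.cpp with --file.
{
  cat instruction.txt       # the prompt paragraph quoted above
  echo                      # blank line between instruction and essay
  cat essay-excerpt.txt     # the portion selected from the Gutenberg text
} > ~/expand-essay-prompt.txt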

This is the output of the wc utility for the longest output:
525 3758 23394 output6.txt

I expect you are familiar with wc, but for clarity, that output means 525 lines, 3758 words, and 23394 characters.

I used llama.cpp's main program as follows:

./main -ngl 35 -m models/neuralbeagle14-7b.Q8_0.gguf  -c 25600 --rope-freq-base 32000 --temp 1.5 --repeat_penalty 1.1 -n -2 --no-penalize-nl   --repeat-last-n -1 --file ~/expand-essay-prompt.txt

Notes on this command:

  1. The -n -2 sets the number of tokens to predict; -2 means generate until the context is filled (-1 would mean infinity).
  2. The --repeat-last-n -1 means the last n tokens considered for the repeat penalty is the full ctx_size.
  3. The --no-penalize-nl means newlines are not penalized for repetition.
  4. The --temp 1.5 is a high temperature setting. I'm not sure how useful this was.
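
To connect the command to the wc numbers above, this is roughly how the run can be captured and measured. Redirecting stdout to output6.txt is just one way to do it, and note that llama.cpp echoes the prompt to stdout, so the prompt text is included in the count unless it is trimmed first.

# Sketch only: same flags as above, with the generation saved so wc can measure it.
./main -ngl 35 -m models/neuralbeagle14-7b.Q8_0.gguf -c 25600 --rope-freq-base 32000 \
    --temp 1.5 --repeat_penalty 1.1 -n -2 --no-penalize-nl --repeat-last-n -1 \
    --file ~/expand-essay-prompt.txt > output6.txt
wc output6.txt    # lines, words, characters: e.g. 525 3758 23394 output6.txt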

May I ask what your goal is for this? Are you planning to generate data to fine tune for longer context windows?

@jimlloyd, thank you so much. I really appreciate it. I'm really glad to see the results of your experiment; I am going to try it today and will update you about my results too. I just hope I get the same results with the text generation webui, since I don't have llama.cpp installed on my computer. Thank you so much again.

@HR1777 happy to help. My small contribution to your efforts that I hope to continue benefiting from!
