Please help me with this issue

#2
by Hoioi - opened

I have tested the q4_k_m version and it works perfectly fine. Thank you so much for this great model.
I have an issue, not only with your model but with every model (more than 150) I have tested so far. Since you are a helpful and knowledgeable person in AI, I'm asking you about it and hope you can guide me through it.

The issue is that when I give a full article as input, for example an article of 1,500 words, and ask the model to edit or rewrite it, the output comes out much smaller than the input, say 900 words, and it seems the model forgets parts of the input.
I have tested many 32k and even 200k models, and I have tried different tools such as oobabooga, LM Studio, GPT4All, and koboldcpp, changing their settings to overcome this issue, but it remains the same.

I would like to ask for your help with this issue: how can I give a full article as input and have a model rewrite it the way I want, without losing any of its details and without making it much smaller?

The only way I know of so far is to split the article manually into, say, 10 parts, feed each part as input separately, collect the outputs, and finally combine them into a full article, which is a tedious job.
I would be so grateful if you could help me regarding this problem.
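
In code terms, what I end up doing by hand is roughly the sketch below, assuming llama-cpp-python and a local GGUF file (the model path, chunk size, and prompt wording are only placeholders, not a tested recipe):

```python
# Sketch: split an article into chunks, rewrite each chunk, recombine.
# Assumes llama-cpp-python is installed and a local GGUF file exists;
# the model path, chunk size, and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="laser-dolphin-mixtral-2x7b.Q4_K_M.gguf", n_ctx=32768)

def rewrite_article(article: str, words_per_chunk: int = 300) -> str:
    words = article.split()
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)]
    rewritten = []
    for chunk in chunks:
        out = llm.create_chat_completion(messages=[{
            "role": "user",
            "content": "Rewrite the following text without dropping any "
                       "details or shortening it:\n\n" + chunk,
        }], max_tokens=1024)
        rewritten.append(out["choices"][0]["message"]["content"])
    return "\n\n".join(rewritten)
```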

The new quants are available here https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-GGUF.
There is also now a GGUF chat in spaces for the model: https://huggingface.co/spaces/macadeliccc/laser-dolphin-mixtral-chat-GGUF.

To fix your problem, the answer is context. Run the llama.cpp server and make sure n_ctx is set to something like 32768, rather than the much smaller default.
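
For example, a minimal sketch of querying a server launched with a large context (the launch flags and the /completion payload follow the llama.cpp example server's documented API, but verify them against your build):

```python
# Sketch: query a locally running llama.cpp server whose context was
# raised at launch, e.g. started with:  ./server -m model.gguf -c 32768
# Endpoint, port, and field names follow the llama.cpp example server;
# verify against your version.
import json
import urllib.request

payload = {
    "prompt": "Rewrite the following article without shortening it:\n\n...",
    "n_predict": 2048,  # upper bound on generated tokens
}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```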

Thanks for the new model. What's different in this model compared to the previous one?

I have the same issue. I set n_ctx to the maximum, which is 32768, but the issue persists. I don't know how to put a full article in the input and ask the model to expand it without losing its details. Any other solution?

This version is better at math, has better prompt alignment, and it fixes a chat template error from the previous merge.

As for the context issue, language modeling is fairly experimental in general, even though it's become so commonplace. Aside from implementing a custom logits processor and controlling how abstractive or extractive the generation is, I don't see another way to control whether the output retains the same tokens as the input.
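
To illustrate the logits-processor idea, here is a minimal sketch in Hugging Face transformers terms (the class name, bias value, and whether this actually helps for your articles are all assumptions; GGUF runtimes would need their own hook):

```python
# Sketch: a custom logits processor that biases generation toward
# tokens found in the source article, pushing the output to be more
# extractive (input-preserving). Names and values are illustrative,
# not an established recipe.
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class ExtractiveBiasProcessor(LogitsProcessor):
    def __init__(self, source_token_ids, bonus: float = 2.0):
        # Unique token ids taken from the input article.
        self.source_ids = torch.tensor(sorted(set(source_token_ids)))
        self.bonus = bonus

    def __call__(self, input_ids, scores):
        # Raise the logit of every token that appeared in the source,
        # making the model more likely to reuse the input's wording.
        scores[:, self.source_ids] += self.bonus
        return scores

# Usage with model.generate (model/tokenizer setup omitted):
# proc = ExtractiveBiasProcessor(tokenizer(article)["input_ids"])
# model.generate(inputs, logits_processor=LogitsProcessorList([proc]))
```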

Short answer: you very well may have difficulty retaining the information you want in your article. I would experiment with prompts and try shorter articles first. @HR1777

Also, I would try the new quants in the link above. You will likely have a better experience.

@macadeliccc, thanks for improving the model and for your explanations. When I give the model, say, 2 paragraphs, it works pretty well, but when I give it a full article of, say, 1,700 words, it can't handle it properly. I even put a 1,700-word article into the chat here and asked it to expand it: https://huggingface.co/spaces/macadeliccc/laser-dolphin-mixtral-chat-GGUF. Although it took more than 20 minutes, the output was less than 600 words!
I believe it's a big issue with all the models I have tested so far. I hope you can find a solution or develop a model that handles long inputs better.

@macadeliccc The new quants link is dead. It's also referenced in this repo's README.

@glibg10b thanks for the heads up. I have updated the links.

@macadeliccc Now the README links to itself. I guess this repo isn't deprecated anymore :)

@glibg10b fair point. no need for that distinction. I have just removed that link entirely.

That must have been left over from when I was working on version 2.

macadeliccc changed discussion status to closed
