Is this the best size for 2x 4090?

#1
by dnhkng - opened

I see a bunch of variants at various compression ratios, and I'd like to know which size is the best to use. That basically means using up all the available GPU RAM on 2x 4090s while still having enough left for a decent-sized context.

This size is a good option and one I run often myself on 2x 4090s. Some folks have had good luck with the 5.0bpw models as well. It really depends on what else you're running on your GPUs (desktop vs. headless Linux server). Anything from 4.0bpw and up should not lose too much in perplexity compared to the base fp16 model.
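For a rough sanity check on whether a given bpw will fit, you can do the back-of-the-envelope math below. This is only an illustrative sketch: it assumes a 70B Llama-2-style model (80 layers, 8 KV heads with GQA, head dim 128) and an fp16 KV cache, and it ignores activation buffers and whatever your desktop environment is already holding in VRAM.

```python
# Rough VRAM estimate for a quantized model: weights + KV cache.
# Assumed (illustrative) model shape: 70B params, 80 layers, 8 KV heads, head dim 128.
PARAMS     = 70e9   # total parameters
BPW        = 4.65   # bits per weight of the quant
N_LAYERS   = 80
N_KV_HEADS = 8      # grouped-query attention
HEAD_DIM   = 128
CTX_LEN    = 8192   # desired context length
KV_BYTES   = 2      # fp16 cache

weights_gb = PARAMS * BPW / 8 / 1e9

# K and V tensors per layer, per token:
kv_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES
kv_cache_gb  = kv_per_token * CTX_LEN / 1e9

total_gb = weights_gb + kv_cache_gb
print(f"weights ~{weights_gb:.1f} GB, KV cache ~{kv_cache_gb:.1f} GB, total ~{total_gb:.1f} GB")
# Roughly 40.7 GB + 2.7 GB ≈ 43.4 GB, which leaves some headroom on 2x 24 GB cards.
```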

Hey, I have issues with this model not remembering previous context while in instruct mode. For instance, I ask it a question, I get a response, and when I follow up with a question related to that response, it comes up with a completely random answer that has nothing to do with the previous context. E.g., I ask it to generate code, then ask it to modify the code, and it spits out code with mistakes. So I ask it to correct the mistakes, and it pulls up some other random code that is completely unrelated to what I originally asked.
I have a 4.85bpw GGUF version of this same model and it doesn't do that. Any ideas what could be going on? FWIW I'm using n_ctx 8192.

To close this out:
As discussed on Discord, the issue here is somewhere inside ooba itself. The textgen-webui is simply not passing the prior context back to the model for processing, so the model cannot act on that information.
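For anyone hitting the same symptom: with a stateless backend, every follow-up turn has to re-send the full conversation as a single prompt. The sketch below is just an illustration of that idea, assuming the SYSTEM/USER/ASSISTANT style prompt format Synthia's model card describes; it is not the webui's actual code.

```python
# Illustrative only: how multi-turn context is typically flattened into one prompt.
# Assumes a SYSTEM/USER/ASSISTANT format; the UI/backend must rebuild this full
# string on every turn, otherwise the model only ever sees the latest question.
def build_prompt(system: str, history: list[tuple[str, str]], user_msg: str) -> str:
    parts = [f"SYSTEM: {system}"]
    for user_turn, assistant_turn in history:      # earlier (user, assistant) pairs
        parts.append(f"USER: {user_turn}")
        parts.append(f"ASSISTANT: {assistant_turn}")
    parts.append(f"USER: {user_msg}")
    parts.append("ASSISTANT:")                     # model continues from here
    return "\n".join(parts)

history = [("Write a Python function that reverses a string.",
            "def reverse(s):\n    return s[::-1]")]
prompt = build_prompt("You are a helpful coding assistant.",
                      history,
                      "Can you re-write the previous program to ignore whitespace?")
# If `history` is dropped, the model has no idea what "the previous program" refers to.
```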

Yeah, you can close this one. The problem was that my prompt needed to be more specific. It's not strictly an ooba issue, though that was part of it; it was also a Synthia issue. When I was more specific in my prompt, e.g. "Can you re-write the previous program?", it picked up the context correctly and modified the original program.
I haven't had to be that specific with any other models, but that's no biggie. It's something I'll make note of in the future.
You can see the chat history on Discord where we tested it. Others had the same problem I did until they were very deliberate with their prompting.

LoneStriker changed discussion status to closed
