Help choosing the best quantization for the PC specification below.

#15
by bikkikumarsha - opened

Since I don't want to spend hours downloading something that doesn't run, I'm wondering which quantization would fit nicely on my system.
CPU: Intel Core i7-13700K
RAM: 64GB
GPU: RTX 3090 with 24GB dedicated memory

Also, is there a general rule of thumb we should be following?

Hi,
I am about to upload the IQ-1 models, which are the smallest models. Those should fit without any issue.
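As a rough rule of thumb (my own heuristic, not something stated in this thread): a GGUF quant needs about its file size in memory, plus some overhead for the KV cache and buffers; whatever doesn't fit in VRAM has to be offloaded to system RAM, which is much slower. A minimal sketch of that fit check, where the bits-per-weight figures are approximate averages for common llama.cpp quant types and the 70B parameter count is just an illustrative example:

```python
# Rough VRAM fit check for a quantized GGUF model -- a sketch, not an exact formula.
# Bits-per-weight values are approximate averages for llama.cpp quant types.
BITS_PER_WEIGHT = {
    "IQ1_M": 1.75,
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
}

def model_size_gb(n_params_b: float, quant: str) -> float:
    """Approximate GGUF file size in GB for n_params_b billion parameters."""
    return n_params_b * BITS_PER_WEIGHT[quant] / 8

def fits_in_vram(n_params_b: float, quant: str,
                 vram_gb: float = 24, overhead_gb: float = 2) -> bool:
    """True if the weights plus a small overhead budget fit in VRAM."""
    return model_size_gb(n_params_b, quant) + overhead_gb <= vram_gb

# Example: a hypothetical 70B-parameter model on a 24GB card.
print(round(model_size_gb(70, "IQ1_M"), 1))   # ~15.3 GB of weights
print(fits_in_vram(70, "IQ1_M"))              # True
print(fits_in_vram(70, "Q4_K_M"))             # False -- needs partial CPU offload
```

If the check fails, the model can still run with some layers offloaded to CPU, but expect a large speed penalty, as the replies below describe.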

Thanks for all the good work!

Hello,

I have roughly the same setup as you (RTX 3090, 64GB RAM, Intel Core i9). For my testing, I used TextGenerationWebui and the IQ1_M model with 36 layers offloaded to the GPU.

Also, I had to limit the context size to 4096 and it seems that the max_new_tokens value has an impact on the quality of the results. I get better results with a max_new_tokens of 512 than with 1024.
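Limiting the context helps because the KV cache grows linearly with context length, on top of the weights already filling most of VRAM. A minimal sketch of the f16 KV-cache footprint; the layer and head dimensions below are placeholder assumptions for a large dense model, not this model's actual config:

```python
def kv_cache_bytes(ctx_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """f16 KV cache size: keys + values, per layer, per KV head, per position."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Placeholder dimensions (80 layers, 8 KV heads, head dim 128) at 4096 context.
gb = kv_cache_bytes(ctx_len=4096, n_layers=80, n_kv_heads=8, head_dim=128) / 2**30
print(f"{gb:.2f} GiB")  # 1.25 GiB -- and it doubles at 8192 context
```

Doubling the context to 8192 doubles this cache, which is easily the difference between fitting and spilling out of VRAM on a card that is already nearly full.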

However, it is very slow: I only get 1.6 tokens/s, so it's not really usable due to the delays.

I did a quick test last night with an RTX 6000, which allows this model to be fully loaded into VRAM, and I was getting around 25 tokens/s. In my opinion, this model requires more power than a standard gaming PC has.

Thanks @tsalvoch for sharing your setup, that's a big help to others.
