Slow prompt processing

by OrangeApples - opened Feb 18

Feb 18

Hi @dranger003! Thanks for the quants. Is the prompt processing of the IQ2_XS supposed to be this slow (9.83 T/s) even when the model is fully offloaded? I'm using the latest Nexesenex fork of KCPP.

dranger003

Owner Feb 18

@OrangeApples I guess it really depends on your GPU, but I think 10 t/s is around what I would expect on a 3090. Also note that IQ2_XXS is quite a bit faster than IQ2_XS, and quality isn't degraded compared to IQ2_XS (at least not that I could notice).

OrangeApples

Feb 19 (edited Feb 19)

Thanks! Yes, I'm using a 3090. Will give IQ2_XXS a shot as well.

Edit: Turns out my prompt processing was spilling over into system RAM. After reducing the context from 12k to 10k, I got 227 T/s for prompt processing (PP).
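
For anyone hitting the same thing, here is a rough sketch of why trimming the context can free enough VRAM to stop the spill. It estimates the KV cache size with the usual formula (keys plus values, per layer, per KV head, per token). The model shape below (80 layers, 8 KV heads, head size 128, f16 cache) is a hypothetical 70B-class configuration chosen for illustration, not something stated in this thread, and it assumes "12k" and "10k" mean 12288 and 10240 tokens. It also ignores compute buffers, which grow with context as well.

```python
def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a standard transformer.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes an f16 cache (an assumption, not from the thread).
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical model shape, for illustration only.
MODEL = dict(n_layers=80, n_kv_heads=8, head_dim=128)

for n_ctx in (12288, 10240):
    size = kv_cache_bytes(n_ctx, **MODEL)
    print(f"{n_ctx:>6} tokens: {size / GIB:.3f} GiB")
```

Under those assumptions, going from 12k to 10k saves about 0.6 GiB of cache, which can be the difference between fitting in 24 GB and overflowing into system RAM when the model weights already fill most of the card.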

OrangeApples changed discussion status to closed Feb 19
