'use_cache: false' reduces tokens/sec significantly

#10
by Astris - opened

I saw a 3x reduction in tokens/sec with the cache disabled compared to enabled. I don't know why it was disabled, but considering the difference, it might be beneficial to have it enabled by default. I used the huggingface loader in text-generation-webui and ran the model on a 3090.

The config.json file already has use_cache set to true. When I loaded the model in textgen, it stayed set to true. Is there anything special about your setup?

To clarify, I only fixed this yesterday (I kept forgetting).

Oh! I should have looked at my local copy when I commented. I see that my cache was set to false. Got a nice little speed increase: not 3x, but from 7it/s to 11it/s on a 4090. Thanks metaprotium, I wouldn't have known unless you posted. And thanks for the model, Gryphe, it's seriously awesome.
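
For anyone else with an older local copy, here is a minimal sketch of checking and flipping the flag in a downloaded config.json. The helper name `enable_use_cache` and the path passed on the command line are my own placeholders, not anything from the model repo or text-generation-webui. It only assumes that config.json is plain JSON with a top-level `use_cache` key.

```python
import json
import sys
from pathlib import Path


def enable_use_cache(config_path):
    """Set use_cache to true in a config.json file.

    Returns the previous value (None if the key was absent).
    Hypothetical helper for illustration only.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("use_cache")
    if previous is not True:
        # Only rewrite the file when the value actually changes.
        config["use_cache"] = True
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python fix_cache.py path/to/your/model/config.json
    old = enable_use_cache(sys.argv[1])
    print(f"use_cache was {old!r}, now True")
```

You will probably need to reload the model in the webui afterwards for the change to take effect.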

Astris changed discussion status to closed
