Much slower than StarCoder?

#4
by jiang719 - opened

In my experience, WizardCoder takes at least twice as long as StarCoder to decode the same sequence.
I thought there were no architecture changes.
Are there any? If not, what could explain the much slower inference?

WizardLM Team org

I think the possible reason is that WizardCoder tends to generate a much longer response than StarCoder.
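
One way to check this is to compare throughput (new tokens per second) rather than total wall-clock time, so that a longer response is not mistaken for slower decoding. The helper below is only a sketch: `generate` is a hypothetical stand-in for whatever generation call is being used, and it is assumed to return the prompt ids followed by the newly generated ids.

```python
import time
from typing import Callable, Sequence, Tuple


def tokens_per_second(
    generate: Callable[[Sequence[int]], Sequence[int]],
    prompt_ids: Sequence[int],
) -> Tuple[int, float]:
    """Return (new_token_count, new_tokens_per_second) for one generation call.

    `generate` is a hypothetical callable (an assumption, not a real API):
    it takes prompt token ids and returns prompt ids + generated ids.
    """
    start = time.perf_counter()
    output_ids = generate(prompt_ids)
    elapsed = time.perf_counter() - start

    # Count only the tokens produced by the model, not the prompt.
    new_tokens = len(output_ids) - len(prompt_ids)
    rate = new_tokens / elapsed if elapsed > 0 else float("inf")
    return new_tokens, rate
```

If both models show a similar tokens-per-second rate but WizardCoder returns a much larger token count, response length is the likely cause. If the rate itself is much lower, something else is going on.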

Same here, and WizardCoder uses more VRAM.

It could be due to use_cache being disabled. Is there any specific reason to disable it?
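
To illustrate why a disabled cache could matter, here is a toy cost model that counts how many token positions are fed through the model during autoregressive decoding, with and without a key/value cache. It is a simplified sketch, not a measurement of either model: the function name and the cost model are assumptions, and real wall-clock time also depends on attention cost, batching, and hardware.

```python
def positions_processed(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Count token positions fed to the model while generating `new_tokens`.

    Toy cost model (an assumption for illustration only):
    - with a KV cache, the prompt is processed once and every later step
      feeds only the single most recent token;
    - without a cache, every step re-feeds the prompt plus all tokens
      generated so far.
    """
    if new_tokens <= 0:
        return 0
    if use_cache:
        # Step 0 consumes the whole prompt; each remaining step consumes 1 token.
        return prompt_len + (new_tokens - 1)
    # Step k (0-based) re-processes the prompt and the k tokens generated so far.
    return sum(prompt_len + k for k in range(new_tokens))


cached = positions_processed(prompt_len=100, new_tokens=200, use_cache=True)
uncached = positions_processed(prompt_len=100, new_tokens=200, use_cache=False)
```

Under this model the uncached count grows quadratically with the number of generated tokens, so the gap widens on long responses. In Hugging Face transformers the setting can be inspected via `model.config.use_cache`, and `use_cache=True` can be passed to `generate()` to check whether it changes the timing.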
