Why is the inference speed so slow compared with Qwen at the same 7B parameter count?

#26
by lucasjin - opened

From my experience it feels about 30% slower when chatting on the same A100 GPU.
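To put a number on the impression, here is a minimal benchmarking sketch using `transformers` that compares tokens/second for two checkpoints on the same GPU. The model IDs, prompt, and generation length are placeholders, not the exact setup from this thread:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoints -- substitute the two models actually being compared.
MODEL_IDS = ["Qwen/Qwen-7B-Chat", "your-org/your-7b-chat-model"]

PROMPT = "Explain the difference between a list and a tuple in Python."

for model_id in MODEL_IDS:
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
    )
    inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)

    # Warm-up pass so one-time CUDA setup does not skew the timing.
    model.generate(**inputs, max_new_tokens=32)

    torch.cuda.synchronize()
    start = time.time()
    output = model.generate(**inputs, max_new_tokens=256)
    torch.cuda.synchronize()
    elapsed = time.time() - start

    # Count only newly generated tokens, excluding the prompt.
    new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{model_id}: {new_tokens / elapsed:.1f} tokens/s")
```

Running both models with the same prompt, dtype, and generation settings would make the ~30% figure reproducible (or rule it out), since differences in tokenizer efficiency, attention implementation, or default generation config can easily account for gaps of that size.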
