Slow generation

#18
by tomer - opened

Hi, I'm trying to generate with Qwen-7B and I think I may be missing something. Generation is a lot slower than with Llama-2-7b, even though I'm using the packages recommended in the Qwen modeling code: I installed the latest stable flash_attn version, and also built the flash_attn RMS norm implementation from source. Do you know what could be wrong?
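To put a number on "slower", here is a minimal sketch of how the two models could be timed in the same way. The helper name `tokens_per_second` and the stand-in `fake_generate` are hypothetical, not part of Qwen or flash_attn; in a real run the callable would wrap the actual generation call and return only the newly generated token ids.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[], Sequence[int]],
    warmup: int = 1,
    runs: int = 3,
) -> float:
    """Average throughput (new tokens per second) over `runs` timed calls.

    `generate` is a hypothetical zero-argument callable that returns the
    newly generated token ids. On a GPU it should block until the work is
    finished (for example by calling torch.cuda.synchronize() before
    returning), otherwise the measured time is too short.
    """
    for _ in range(warmup):
        generate()  # warm-up calls are not timed

    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += len(generate())
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


def fake_generate() -> list:
    """Stand-in for a real model call: pretends to emit 32 tokens."""
    time.sleep(0.01)
    return list(range(32))


if __name__ == "__main__":
    print(f"{tokens_per_second(fake_generate):.1f} tokens/s")
```

Running the same helper with the same prompt, dtype, and maximum number of new tokens for both models would show whether the gap comes from generation itself rather than from loading or tokenization.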
