Slow generation

#18

by tomer - opened Dec 10, 2023

Hi, I'm trying to generate with Qwen-7B and I think I may be missing something. Generation is a lot slower than with Llama-2-7b, even though I'm using the packages recommended in the Qwen modeling code: I installed the latest stable flash_attn version and also built the flash_attn RMS norm implementation from source. Do you know what could be wrong?
