multiquery attention

#46
by ZhongYingMatrix - opened

Hi, thank you for your excellent work. I read about multiquery attention in the blog post at https://huggingface.co/blog/falcon, but I cannot find where it is implemented in the source code. Could you point me to it?

Technology Innovation Institute org

All model-related code is in the modelling_RW.py file.
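For orientation while reading that file, here is a minimal sketch of the general idea behind multiquery attention: every query head attends over a single shared set of keys and values, rather than each head having its own. This is not the code from modelling_RW.py. The function name `multi_query_attention`, its argument layout, and the use of plain Python lists are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(queries, keys, values, causal=True):
    """Illustrative multiquery attention (hypothetical helper, not Falcon's code).

    queries: [n_heads][seq_len][head_dim], one set of queries per head
    keys:    [seq_len][head_dim], shared by all heads
    values:  [seq_len][head_dim], shared by all heads
    Returns: [n_heads][seq_len][head_dim]
    """
    head_dim = len(keys[0])
    scale = 1.0 / math.sqrt(head_dim)
    outputs = []
    for head_queries in queries:
        head_out = []
        for pos, q in enumerate(head_queries):
            # With causal masking, position `pos` only sees keys 0..pos.
            visible = pos + 1 if causal else len(keys)
            scores = [
                scale * sum(a * b for a, b in zip(q, k))
                for k in keys[:visible]
            ]
            weights = softmax(scores)
            # Weighted sum of the shared values.
            head_out.append([
                sum(w * v[d] for w, v in zip(weights, values[:visible]))
                for d in range(len(values[0]))
            ])
        outputs.append(head_out)
    return outputs


# Two query heads, one shared key/value head, sequence length 2.
shared_keys = [[1.0, 0.0], [0.0, 1.0]]
shared_values = [[1.0, 2.0], [3.0, 4.0]]
per_head_queries = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
result = multi_query_attention(per_head_queries, shared_keys, shared_values)
```

When looking through the model file, a practical thing to search for is where the key and value projections are sized for fewer heads than the query projection, since that is the part that differs from standard multi-head attention.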

FalconLLM changed discussion status to closed
