70B AWQ model?
#2 opened by Teja-Gollapudi
Thanks for AWQ-quantizing these models!
Do you plan on releasing a Llama-2-70b-chat-hf AWQ model soon?
Llama 2 70B (unlike 7B and 13B) uses grouped-query attention rather than multi-head attention. The AWQ paper authors are working on adding support for GQA (link).
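In case it helps, here is a minimal sketch (assuming the Hugging Face `transformers` library and the gated `meta-llama/Llama-2-70b-chat-hf` repo ID) showing how GQA shows up in the model config, which is why the 70B model needs separate handling in AWQ:

```python
# Minimal sketch, assuming `transformers` is installed and you have accepted
# the license for the gated meta-llama repo (an access token may be required).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("meta-llama/Llama-2-70b-chat-hf")

# With multi-head attention these two values are equal; with grouped-query
# attention num_key_value_heads is smaller (70B: 64 query heads share 8 KV heads).
print(cfg.num_attention_heads, cfg.num_key_value_heads)
print("GQA" if cfg.num_key_value_heads < cfg.num_attention_heads else "MHA")
```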
abhinavkulkarni changed discussion status to closed