Inference not working in any environment

#5
by LoreVitCon - opened

Tried both instruct and instruct-cuda. In both cases I get errors (e.g., missing Triton, which I can't install) or a CUDA version that requires a GPU.

Microsoft org

Hi! Thank you for your interest in Phi-3!
For the small model, because we use a custom Triton kernel for block-sparse attention, there is a dependency on having a GPU as well as on Triton.
There is active work going on to enable llama.cpp support for this (see this issue).
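As a workaround until llama.cpp support lands, a pre-flight check like the sketch below can confirm whether the current environment satisfies both requirements before attempting to load the model. This is a hypothetical helper, not part of the Phi-3 repository; `can_run_block_sparse` is an illustrative name.

```python
# Hypothetical pre-flight check (not part of the phi-3 repo): verify that a
# CUDA GPU and the Triton package are both available, since the small model's
# block-sparse attention kernel depends on both.
import importlib.util


def can_run_block_sparse() -> bool:
    try:
        import torch
    except ImportError:
        # No PyTorch at all, so the CUDA path certainly cannot run.
        return False
    has_gpu = torch.cuda.is_available()
    has_triton = importlib.util.find_spec("triton") is not None
    return has_gpu and has_triton


if __name__ == "__main__":
    if not can_run_block_sparse():
        print("No CUDA GPU and/or Triton found; the custom block-sparse "
              "attention kernel cannot run in this environment.")
```

Running this before loading the model gives a clearer failure message than the kernel's own import errors.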

bapatra changed discussion status to closed
