Triton inference
#6, opened by SinanAkkoyun
Hi! What is the fastest inference code available right now? Also, can this model be used with NVIDIA's FasterTransformer inference code?
There is an upcoming integration in text-generation-inference that should be lightning fast: https://github.com/huggingface/text-generation-inference/pull/379 :)
FalconLLM changed discussion status to closed