Triton inference

#6
by SinanAkkoyun - opened

Hi! What is the fastest inference code available right now? Also, can this model be used with NVIDIA's FasterTransformer inference code?

Technology Innovation Institute org

There is an upcoming integration in text-generation-inference that should be lightning fast: https://github.com/huggingface/text-generation-inference/pull/379 :)
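
For anyone who wants to try it once that PR lands, here is a minimal sketch of querying a running text-generation-inference server over its HTTP `/generate` endpoint. The server address (`http://127.0.0.1:8080`), the prompt, and the sampling parameters are assumptions for illustration, not something specified in the linked PR; the server itself would need to be launched separately (e.g. via the text-generation-inference Docker image).

```python
# Minimal sketch: query a text-generation-inference server via its HTTP API.
# Assumes the server is already running locally on port 8080 and serving a
# Falcon checkpoint (address and parameters below are illustrative).
import requests

response = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "Write a short poem about falcons.",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
)
response.raise_for_status()

# The server returns a JSON object containing the generated continuation.
print(response.json()["generated_text"])
```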

FalconLLM changed discussion status to closed
