faster inference

#10
by LukeJacob2023 - opened

XTTS v2 is good in terms of voice quality, but the inference speed is a little slow. I am developing a speech translation application and need TTS inference to finish in under 500 ms on a T4 GPU. Could you release a half-precision version, or support a faster inference engine like ONNX or CTranslate2?

Coqui.ai org

Latency to the first audio chunk is ~0.2 seconds if you run it with DeepSpeed (it is faster than ONNX).
https://huggingface.co/spaces/coqui/xtts

Latency to first audio chunk: 212 milliseconds
Real-time factor (RTF): 0.25
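
A minimal sketch of DeepSpeed-enabled streaming inference, following the XTTS docs; the checkpoint paths and reference WAV are placeholders you would point at your local XTTS v2 download, and the timing loop is just there to measure first-chunk latency:

```python
import time

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Placeholder paths: point these at your local XTTS v2 files.
config = XttsConfig()
config.load_json("/path/to/xtts_v2/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts_v2/", use_deepspeed=True)
model.cuda()

# Conditioning latents are computed once per speaker and can be reused
# across requests, keeping them out of the per-utterance latency budget.
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["reference.wav"]  # placeholder reference clip
)

# Stream audio chunks and time how long the first one takes.
t0 = time.time()
chunks = model.inference_stream(
    "Hello world, this is a streaming latency test.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)
for i, chunk in enumerate(chunks):
    if i == 0:
        print(f"first chunk after {time.time() - t0:.3f} s")
```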

You can squeeze out maybe 2-4% more speed by wrapping your inference code in torch.float16 autocasting (but that will slightly affect output quality; you may or may not notice it depending on your needs).
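
A minimal sketch of that autocast wrapping, reusing the `model` and conditioning latents from the snippet above; the text is a placeholder, and how much of the model actually runs in float16 depends on your TTS version:

```python
import torch

# Run the regular XTTS inference call under float16 autocast.
# "model", "gpt_cond_latent", and "speaker_embedding" are assumed to be
# the objects created in the DeepSpeed sketch above.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model.inference(
        "Hello world, this is a latency test.",  # placeholder text
        "en",
        gpt_cond_latent,
        speaker_embedding,
    )

# out["wav"] should hold the generated waveform as a float array.
```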

gorkemgoknar changed discussion status to closed

It would be greatly appreciated if you could provide the source code; I need both DeepSpeed and half precision.

Could you please provide ONNX or TensorRT exports to speed up model inference? It would be greatly appreciated.
