Speed up inference time locally (Asteroid)

#1
by huks - opened

Hi, when I use the hosted inference widget on Hugging Face it takes only about 0.5s for a short audio clip (3-5s), and it states that the inference was performed on CPU.

Running it locally on my CPU, or even on a GPU, inference takes considerably longer (1.8s+ even with ONNX Runtime).

Do you have any idea how Hugging Face achieves this result on CPU? Is there any way to tune it locally?

Thanks in advance

Hi,

I guess it just depends on your CPU speed. I just tested locally on my laptop with a WAV file sampled at 16 kHz and 4 seconds long, and I get an average of 0.75s over 100 runs.
If you want to speed up inference locally, here are some steps you can take:

  • Disable gradient computation during inference using "with torch.no_grad():"
  • Use JIT to trace your model and run inference on the traced model (see the sketch after this list)
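Here is a minimal sketch combining both steps, assuming an Asteroid ConvTasNet loaded with from_pretrained(); the model ID, input length, and sample rate are placeholders to adapt to your own checkpoint and audio:

```python
import time

import torch
from asteroid.models import ConvTasNet

# Placeholder model ID -- replace with the checkpoint you are actually using.
model = ConvTasNet.from_pretrained("your-username/your-convtasnet-checkpoint")
model.eval()

# Dummy input: batch of 1, 4 seconds of 16 kHz mono audio.
dummy_wav = torch.randn(1, 4 * 16000)

# 1) Run inference without gradient tracking to avoid building the autograd graph.
with torch.no_grad():
    est_sources = model(dummy_wav)

# 2) Trace the model once with TorchScript, then reuse the traced module.
traced = torch.jit.trace(model, dummy_wav)

with torch.no_grad():
    start = time.perf_counter()
    for _ in range(100):
        traced(dummy_wav)
    print(f"avg: {(time.perf_counter() - start) / 100:.3f}s per forward pass")
```

Note that tracing fixes the input shapes seen at trace time, so trace with an example input that matches the lengths you expect at inference.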
