Add the ONNX CPUExecutionProvider, since our benchmark showed it can speed up inference by 40%–60% (depending on the CPU type).
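
For reference, a minimal sketch of what running inference through onnxruntime's CPUExecutionProvider looks like (not the exact code in this PR; the model path `model.onnx` and input name `input` are placeholders):

```python
# Sketch: run an exported ONNX model on CPU via onnxruntime.
import numpy as np
import onnxruntime as ort

# Explicitly request the CPU execution provider.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Dummy input shaped like the model expects; replace with real preprocessing.
dummy_input = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": dummy_input})
print(outputs[0].shape)
```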

CUDAExecutionProvider is also becoming more promising as a backend. We can support it once a couple of issues (1, 2) are fully fixed upstream.

Results from my local test (CPUExecutionProvider takes 40% less time compared to native PyTorch inference):
[Screenshot: local benchmark results, 2022-10-01]
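
A rough sketch of the kind of timing comparison behind that number (this assumes a stock torchvision ResNet-50 as a stand-in, not this repo's actual model, and single-example batches):

```python
# Sketch: compare native PyTorch CPU inference against ONNX Runtime on CPU.
import time

import onnxruntime as ort
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()
x = torch.randn(1, 3, 224, 224)

# Export once, then time both back-ends on the same input.
torch.onnx.export(model, x, "resnet50.onnx", input_names=["input"])
session = ort.InferenceSession("resnet50.onnx", providers=["CPUExecutionProvider"])

def avg_ms(fn, n=20):
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n * 1000

with torch.no_grad():
    pt_ms = avg_ms(lambda: model(x))
ort_ms = avg_ms(lambda: session.run(None, {"input": x.numpy()}))
print(f"PyTorch: {pt_ms:.1f} ms  ONNX Runtime (CPU): {ort_ms:.1f} ms")
```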
