Successfully run on GPU with DirectML

#4
by Pengan - opened

First install the DirectML build of ONNX Runtime:
`pip install onnxruntime-directml`
Then, in the `model.py` file, change `providers = ["CPUExecutionProvider"]` to `providers = ["DmlExecutionProvider"]`.
It will then run on the GPU; verified on an RTX 4060 (8 GB), with 7.7 GB of VRAM consumed.
Also tested on the HD 620 iGPU of an Intel i5-7200U: it runs slowly at 5-6 s per token, but generates correct content with no problems.
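For reference, the change can also be written defensively so the same `model.py` still works on machines without DirectML. This is just a sketch of the selection logic, not the repo's actual code; the `pick_providers` helper is my own naming:

```python
def pick_providers(available):
    # Prefer the DirectML provider when onnxruntime-directml is installed;
    # otherwise fall back to the CPU provider.
    if "DmlExecutionProvider" in available:
        return ["DmlExecutionProvider"]
    return ["CPUExecutionProvider"]


# Usage sketch (requires onnxruntime / onnxruntime-directml installed):
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

ONNX Runtime tries providers in list order, so putting `DmlExecutionProvider` first is enough when both are listed.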
