Successfully run on GPU with DirectML
#4 · opened by Pengan
First, install the DirectML build of ONNX Runtime: `pip install onnxruntime-directml`
Then, in the `model.py` file, change `providers = ["CPUExecutionProvider"]` to `providers = ["DmlExecutionProvider"]`.
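If you want the same `model.py` to work whether or not the DirectML package is installed, you can select the provider at runtime instead of hard-coding it. A minimal sketch (the `pick_providers` helper is hypothetical, not part of the repo; the provider names are the ones ONNX Runtime uses):

```python
def pick_providers(available):
    """Return the provider list to pass to onnxruntime.InferenceSession.

    Prefers DirectML when the installed ONNX Runtime exposes it,
    otherwise falls back to the CPU provider.
    """
    if "DmlExecutionProvider" in available:
        return ["DmlExecutionProvider"]
    return ["CPUExecutionProvider"]

# With onnxruntime-directml installed, usage would look like:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

`onnxruntime.get_available_providers()` reports which execution providers the installed build supports, so the fallback kicks in automatically on machines without DirectML.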
With this change it runs on the GPU; verified on an RTX 4060 (8 GB), with about 7.7 GB of VRAM consumed.
Also tested on the HD 620 iGPU of an Intel i5-7200U: it runs slowly, at roughly 5–6 s per token, but generates correct output with no problems.