This model is an ONNX-optimized version of microsoft/Phi-3-mini-4k-instruct (June 2024), designed to provide accelerated inference on a variety of hardware using ONNX Runtime (CPU and DirectML). DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning that provides GPU acceleration across a wide range of supported hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
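Because the model targets both CPU and DirectML, a caller typically has to decide which backend to use at load time. The following is a minimal sketch of that choice, not part of this model's tooling: `pick_provider` and `PREFERRED` are hypothetical names introduced here, while `DmlExecutionProvider` and `CPUExecutionProvider` are the execution provider names ONNX Runtime uses for DirectML and CPU.

```python
from typing import Sequence

# Preference order: DirectML (GPU) first, CPU as the fallback.
# These strings are ONNX Runtime's execution provider names.
PREFERRED = ("DmlExecutionProvider", "CPUExecutionProvider")


def pick_provider(available: Sequence[str]) -> str:
    """Return the first preferred provider present in `available`.

    Hypothetical helper for illustration; raises if neither backend
    mentioned in this model card is available.
    """
    for name in PREFERRED:
        if name in available:
            return name
    raise RuntimeError(f"No supported provider found in {list(available)}")
```

With ONNX Runtime installed, the list returned by `onnxruntime.get_available_providers()` can be passed in directly as `available`.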
Here are some of the optimized configurations we have added:
Minimum Configuration:
We measured DirectML performance on an AMD Ryzen 9 7940HS w/ Radeon 78
| Prompt Length | Generation Length | Average Throughput (tokens/s) |
|---|---|---|
| 128 | 128 | - |
| 128 | 256 | - |
| 128 | 512 | - |
| 128 | 1024 | - |
| 256 | 128 | - |
| 256 | 256 | - |
| 256 | 512 | - |
| 256 | 1024 | - |
| 512 | 128 | - |
| 512 | 256 | - |
| 512 | 512 | - |
| 512 | 1024 | - |
| 1024 | 128 | - |
| 1024 | 256 | - |
| 1024 | 512 | - |
| 1024 | 1024 | - |
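The table sweeps a 4×4 grid of prompt and generation lengths. One way such a grid could be filled in is sketched below. This is an illustration under stated assumptions, not the procedure used for this model card: `generate` is a hypothetical callable standing in for a real inference call, throughput is assumed to be generated tokens divided by wall-clock time for the whole call (prompt processing included), and `fake_generate` exists only so the sketch runs without a model.

```python
import time
from statistics import mean
from typing import Callable, Dict, Tuple

# The prompt and generation lengths listed in the table above.
PROMPT_LENGTHS = (128, 256, 512, 1024)
GENERATION_LENGTHS = (128, 256, 512, 1024)

# Hypothetical signature: takes (prompt_len, gen_len), returns tokens produced.
GenerateFn = Callable[[int, int], int]


def average_throughput(generate: GenerateFn, prompt_len: int,
                       gen_len: int, runs: int = 3) -> float:
    """Mean tokens per second over `runs` timed calls to `generate`."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        produced = generate(prompt_len, gen_len)
        # Guard against a zero reading on very fast stand-in functions.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(produced / elapsed)
    return mean(rates)


def run_grid(generate: GenerateFn, runs: int = 3) -> Dict[Tuple[int, int], float]:
    """Measure every (prompt length, generation length) pair in the table."""
    return {
        (p, g): average_throughput(generate, p, g, runs)
        for p in PROMPT_LENGTHS
        for g in GENERATION_LENGTHS
    }


def fake_generate(prompt_len: int, gen_len: int) -> int:
    """Stand-in for a real model call; sleeps briefly and reports gen_len tokens."""
    time.sleep(0.001)
    return gen_len
```

Calling `run_grid(fake_generate, runs=1)` returns a dictionary keyed by `(prompt_len, gen_len)`. Replacing `fake_generate` with a function that runs the actual model would yield real figures for the table.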