Update README.md
README.md
CHANGED
@@ -21,8 +21,51 @@ Here are some of the optimized configurations we have added:
1. ONNX model for int4 CPU and Mobile: ONNX model for CPU and mobile using int4 quantization via RTN.
2. ONNX model for int4 CUDA and DML GPU devices using int4 quantization via RTN.

## Model Run

You can see how to run examples with ORT GenAI [here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi-3-tutorial.md).

For CPU:

```bash
# Download the model directly using the Hugging Face CLI
huggingface-cli download microsoft/Phi-4-onnx --include Phi-4-onnx/cpu_and_mobile/* --local-dir .

# Install the CPU package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai

# Please adjust the model directory (-m) accordingly
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py -o phi3-qa.py
python phi3-qa.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 -e cpu
```
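To see what `phi3-qa.py` boils down to, here is a minimal generation loop written against the ONNX Runtime GenAI Python API. This is a sketch, not the full script: the chat template shown is the Phi-3-style one the script name suggests, and calls such as `append_tokens` assume a recent onnxruntime-genai release, so check the tutorial linked above for the exact API of your installed version.

```python
import onnxruntime_genai as og

# Load the downloaded model folder (adjust the path, as with -m above)
model = og.Model("cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
tokenizer = og.Tokenizer(model)

# Phi-3-style chat template; check the model card for the exact format
prompt = "<|user|>\nWhat is ONNX Runtime?<|end|>\n<|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Decode one token at a time until EOS or max_length is reached
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```

The same loop should work unchanged for the GPU folders below, since the execution provider is part of the model's configuration rather than the script.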

For CUDA:

```bash
# Download the model directly using the Hugging Face CLI
huggingface-cli download onnxruntime/Phi-4-onnx --include Phi-4-onnx/gpu/* --local-dir .

# Install the CUDA package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai-cuda

# Please adjust the model directory (-m) accordingly
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py -o phi3-qa.py
python phi3-qa.py -m gpu/gpu-int4-rtn-block-32 -e cuda
```
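On GPU it is often nicer to stream tokens as they are generated instead of waiting for the whole sequence. Below is a sketch of the same loop with incremental detokenization, assuming `create_stream()` and `get_next_tokens()` are available in your installed onnxruntime-genai version:

```python
import onnxruntime_genai as og

model = og.Model("gpu/gpu-int4-rtn-block-32")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()  # incremental detokenizer

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("<|user|>\nTell me a joke.<|end|>\n<|assistant|>"))

while not generator.is_done():
    generator.generate_next_token()
    # decode() buffers until it can emit printable text, so chunks may be empty
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```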

For DirectML:

```bash
# Download the model directly using the Hugging Face CLI
huggingface-cli download onnxruntime/Phi-4-onnx --include Phi-4-onnx/gpu/* --local-dir .

# Install the DirectML package of ONNX Runtime GenAI
pip install --pre onnxruntime-genai-directml

# Please adjust the model directory (-m) accordingly
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py -o phi3-qa.py
python phi3-qa.py -m gpu/gpu-int4-rtn-block-32 -e dml
```
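Note that CUDA and DirectML share the same `gpu/gpu-int4-rtn-block-32` folder; which execution provider actually runs is decided at session setup. If you want to check what a downloaded folder was configured for, the provider options are recorded in its `genai_config.json`. The snippet below is a sketch assuming the usual ORT GenAI model layout; the exact JSON structure can vary between releases.

```python
import json
from pathlib import Path

# Peek at the provider options recorded in the model folder.
# The nested keys below are an assumption about the config layout;
# .get() keeps the lookup safe if a release structures it differently.
config = json.loads(Path("gpu/gpu-int4-rtn-block-32/genai_config.json").read_text())
session = config.get("model", {}).get("decoder", {}).get("session_options", {})
print(session.get("provider_options", "no provider options recorded"))
```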
## Model Description
- Developed by: Microsoft
- Model type: ONNX
|