---
license: gemma
pipeline_tag: text-generation
tags:
- ONNX
- DML
- DirectML
- ONNXRuntime
- gemma
- google
- conversational
- custom_code
inference: false
language:
- en
---
# Gemma-7B-Instruct-ONNX

## Model Summary
This repository contains optimized versions of the [gemma-7b-it](https://huggingface.co/google/gemma-7b-it) model, designed to accelerate inference using ONNX Runtime. These optimizations are specifically tailored for CPU and DirectML. DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, offering GPU acceleration across a wide range of supported hardware and drivers, including those from AMD, Intel, NVIDIA, and Qualcomm.

## ONNX Models

Here are some of the optimized configurations we have added:
- **ONNX model for int4 DirectML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
- **ONNX model for int4 CPU and mobile:** ONNX model for CPU and mobile using int4 quantization via RTN. Two versions are uploaded to balance latency against accuracy: acc-level-1 favors accuracy, while acc-level-4 favors performance (see the sketch after this list). For mobile devices, we recommend the acc-level-4 model.

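For reference, below is a rough sketch of how int4 RTN variants like the CPU builds above can be produced with the onnxruntime-genai model builder. The module path and flag names follow the builder's documented interface at the time of writing and may differ in your installed release; the AWQ pass used for the DirectML variant is a separate step not shown here.

```python
# Hypothetical sketch: producing int4 RTN variants with the onnxruntime-genai
# model builder. Verify the module path and flags against your installed release.
import subprocess
import sys

def build_int4_cpu(out_dir, accuracy_level):
    """Quantize google/gemma-7b-it to int4 (RTN) for CPU at a given accuracy level."""
    subprocess.run(
        [
            sys.executable, "-m", "onnxruntime_genai.models.builder",
            "-m", "google/gemma-7b-it",   # source model on Hugging Face
            "-o", out_dir,                # output directory for the ONNX files
            "-p", "int4",                 # target precision
            "-e", "cpu",                  # execution provider ("dml" targets DirectML)
            "--extra_options", f"int4_accuracy_level={accuracy_level}",
        ],
        check=True,
    )

build_int4_cpu(r".\gemma-7b-it-int4-cpu-acc1", 1)  # accuracy-leaning build
build_int4_cpu(r".\gemma-7b-it-int4-cpu-acc4", 4)  # latency-leaning build (mobile)
```
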
## Usage

### Installation and Setup

To use the Gemma-7B-Instruct-ONNX model on Windows with DirectML, follow these steps:

1. **Create and activate a Conda environment:**
   ```sh
   conda create -n onnx python=3.10
   conda activate onnx
   ```

2. **Install Git LFS:**
   ```sh
   winget install -e --id GitHub.GitLFS
   ```

3. **Install the Hugging Face CLI:**
   ```sh
   pip install "huggingface-hub[cli]"
   ```

4. **Download the model:**
   ```sh
   huggingface-cli download EmbeddedLLM/gemma-7b-it-onnx --include="onnx/directml/*" --local-dir .\gemma-7b-it-onnx
   ```

5. **Install the necessary Python packages:**
   ```sh
   pip install numpy==1.26.4
   pip install onnxruntime-directml
   pip install --pre onnxruntime-genai-directml
   ```

6. **Install the Visual Studio 2015 runtime:**
   ```sh
   conda install conda-forge::vs2015_runtime
   ```

7. **Download the example script:**
   ```sh
   Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
   ```

8. **Run the example script** (or call the API directly, as sketched below):
   ```sh
   python phi3-qa.py -m .\gemma-7b-it-onnx
   ```

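The downloaded phi3-qa.py script wraps the onnxruntime-genai generation loop. If you prefer to call the API directly, the sketch below shows the core of that loop. It follows the early onnxruntime-genai Python API; method names have changed between releases, so treat it as a starting point rather than a definitive recipe.

```python
# Minimal generation loop with onnxruntime-genai (early-release API surface;
# newer releases may rename these methods). The model directory is the one
# downloaded in step 4.
import onnxruntime_genai as og

model = og.Model(r".\gemma-7b-it-onnx")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()  # incremental detokenizer for streaming output

# Gemma instruct models expect these turn markers around the user prompt.
prompt = "<start_of_turn>user\nWhy is the sky blue?<end_of_turn>\n<start_of_turn>model\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Decode and print each new token as it is produced.
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```
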
### Hardware Requirements

**Minimum Configuration:**
- **Windows:** DirectX 12-capable GPU (AMD/NVIDIA)
- **CPU:** x86_64 / ARM64

**Tested Configurations:**
- **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
- **CPU:** AMD Ryzen CPU

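To confirm that DirectML is actually reachable on your machine, you can ask ONNX Runtime which execution providers it sees. This small check assumes the onnxruntime-directml package installed in step 5.

```python
# Sanity check: the DirectML execution provider should be visible to ONNX Runtime.
import onnxruntime as ort

providers = ort.get_available_providers()
print(providers)  # should include 'DmlExecutionProvider' on a working DirectML setup
assert "DmlExecutionProvider" in providers, "DirectML is not available on this machine"
```
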
**Model Page**: [Gemma](https://ai.google.dev/gemma/docs)

This model card corresponds to the 7B instruct version of the Gemma model. You can also visit the model cards of the [2B base model](https://huggingface.co/google/gemma-2b), [7B base model](https://huggingface.co/google/gemma-7b), and [2B instruct model](https://huggingface.co/google/gemma-2b-it).

**Resources and Technical Documentation**:

* [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
* [Gemma on Kaggle](https://www.kaggle.com/models/google/gemma)
* [Gemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?version=gemma-7b-it-gg-hf)

**Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent)