K024 committed on
Commit
2397160
1 Parent(s): a108e1a

Update README.md

Files changed (1)
  1. README.md +11 -2
README.md CHANGED
@@ -13,12 +13,14 @@ tags:

  This model is exported from [ChatGLM-6b](https://huggingface.co/THUDM/chatglm-6b) with int8 quantization and optimized for [ONNXRuntime](https://onnxruntime.ai/) inference. Export code in [this repo](https://github.com/K024/chatglm-q).

- Inference code with ONNXRuntime is uploaded with the model. Install requirements and run `streamlit run web-ui.py` to start chatting. Currently the `MatMulInteger` (for u8s8 data type) and `DynamicQuantizeLinear` operators are only supported on CPU.
+ Inference code with ONNXRuntime is uploaded with the model. Install requirements and run `streamlit run web-ui.py` to start chatting. Currently the `MatMulInteger` (for u8s8 data type) and `DynamicQuantizeLinear` operators are only supported on CPU. Arm64 with Neon support (Apple M1/M2) should be reasonably fast.

- Install the dependencies and run `streamlit run web-ui.py` to preview the model. Due to limited operator support in ONNXRuntime, inference currently runs on CPU only.
+ Install the dependencies and run `streamlit run web-ui.py` to preview the model. Due to limited operator support in ONNXRuntime, inference currently runs on CPU only, with reasonable speed on Arm64 (Apple M1/M2). The ONNX export code is in [this repo](https://github.com/K024/chatglm-q).

  ## Usage

+ Clone with [git-lfs](https://git-lfs.com/):
+
  ```sh
  git lfs clone https://huggingface.co/K024/ChatGLM-6b-onnx-u8s8
  cd ChatGLM-6b-onnx-u8s8
@@ -26,6 +28,13 @@ pip install -r requirements.txt
  streamlit run web-ui.py
  ```

+ Or use `huggingface_hub` [python client lib](https://huggingface.co/docs/huggingface_hub/guides/download#download-files-to-local-folder) to download the repo snapshot:
+
+ ```python
+ from huggingface_hub import snapshot_download
+ snapshot_download(repo_id="K024/ChatGLM-6b-onnx-u8s8", local_dir="./ChatGLM-6b-onnx-u8s8")
+ ```
+
  Codes are released under MIT license.

  Model weights are released under the same license as ChatGLM-6b, see [MODEL LICENSE](https://huggingface.co/THUDM/chatglm-6b/blob/main/MODEL_LICENSE).
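
The README names two ONNX operators, `DynamicQuantizeLinear` and `MatMulInteger`, as the reason inference is CPU-only. As a rough illustration of what a u8s8 matrix product involves, here is a minimal pure-Python sketch that follows the formulas in the ONNX operator specification. It is not the code used by this repo or by ONNXRuntime: the function names `dynamic_quantize_linear` and `matmul_integer`, the all-zero guard, and the example weights with their scale of 1/64 are all assumptions made for this example.

```python
from typing import List, Tuple


def dynamic_quantize_linear(x: List[float]) -> Tuple[List[int], float, int]:
    """Quantize floats to uint8 using the ONNX DynamicQuantizeLinear formula."""
    qmin, qmax = 0, 255
    # The range is widened to include 0.0 so that zero maps to an exact integer.
    lo = min(0.0, min(x))
    hi = max(0.0, max(x))
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        # Guard chosen for this sketch only: an all-zero input quantizes to zeros.
        return [0] * len(x), 1.0, 0
    # Python's round() rounds halves to even, matching the ONNX rounding rule.
    zero_point = min(qmax, max(qmin, round(qmin - lo / scale)))
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point


def matmul_integer(
    a: List[List[int]],
    b: List[List[int]],
    a_zero_point: int = 0,
    b_zero_point: int = 0,
) -> List[List[int]]:
    """Integer matrix product of (a - a_zero_point) and (b - b_zero_point)."""
    inner, cols = len(b), len(b[0])
    return [
        [
            sum((row[k] - a_zero_point) * (b[k][j] - b_zero_point) for k in range(inner))
            for j in range(cols)
        ]
        for row in a
    ]


# Activations are quantized at run time (u8); weights are pre-quantized (s8).
activations = [-1.0, 0.0, 4.0]
q, scale, zero_point = dynamic_quantize_linear(activations)

# Hypothetical s8 weights representing [1.0, -0.5, 0.5] with a scale of 1/64.
weights_q = [[64], [-32], [32]]
weight_scale = 1 / 64

acc = matmul_integer([q], weights_q, a_zero_point=zero_point)
# Rescale the integer accumulator back to a float result.
result = acc[0][0] * scale * weight_scale
print(q, zero_point, acc, result)
```

The accumulation itself uses only integers; the two scales are applied once at the end, which is what makes this operator pair attractive for CPU inference.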