MODEL_LICENSE CHANGED
@@ -9,17 +9,14 @@ The ChatGLM3-6B License
9
  2. 许可授予
10
 
11
  根据本许可的条款和条件,许可方特此授予您非排他性、全球性、不可转让、不可再许可、可撤销、免版税的版权许可。
12
- 本许可允许您免费使用本仓库中的所有开源模型进行学术研究,对于希望将模型用于商业目的的用户,需在[这里](https://open.bigmodel.cn/mla/form)完成登记。
13
- 经过登记的用户可以免费使用本模型进行商业活动,但必须遵守本许可的所有条款和条件。
14
  上述版权声明和本许可声明应包含在本软件的所有副本或重要部分中。
15
 
16
  3.限制
17
 
18
  您不得出于任何军事或非法目的使用、复制、修改、合并、发布、分发、复制或创建本软件的全部或部分衍生作品。
19
 
20
- 您不得利用本软件从事任何危害国家安全和国家统一,危害社会公共利益及公序良俗,侵犯他人商业秘密、知识产权、名誉权、肖像权、财产权等权益的行为。
21
-
22
- 您在使用中应遵循使用地所适用的法律法规政策、道德规范等要求。
23
 
24
  4.免责声明
25
 
@@ -43,15 +40,15 @@ The ChatGLM3-6B License
43
 
44
  2. License Grant
45
 
46
- Under the terms and conditions of this license, the Licensor hereby grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license.
47
- This license permits you to use all open-source models in this repository for academic research free. Users who wish to use the models for commercial purposes must register [here](https://open.bigmodel.cn/mla/form).
48
- Registered users may use the models for commercial activities free of charge, but must comply with all terms and conditions of this license.
49
- The license notice shall be included in all copies or substantial portions of the Software.
50
 
51
- 3. Restrictions
52
- You are not allowed to use, copy, modify, merge, publish, distribute, copy or create all or part of the derivative works of this software for any military or illegal purposes.
53
- You are not allowed to use this software to engage in any behavior that endangers national security and unity, endangers social public interests and public order, infringes on the rights and interests of others such as trade secrets, intellectual property rights, reputation rights, portrait rights, and property rights.
54
- You should comply with the applicable laws, regulations, policies, ethical standards, and other requirements in the place of use during use.
55
 
56
  4. Disclaimer
57
 
 
9
  2. 许可授予
10
 
11
  根据本许可的条款和条件,许可方特此授予您非排他性、全球性、不可转让、不可再许可、可撤销、免版税的版权许可。
12
+
 
13
  上述版权声明和本许可声明应包含在本软件的所有副本或重要部分中。
14
 
15
  3.限制
16
 
17
  您不得出于任何军事或非法目的使用、复制、修改、合并、发布、分发、复制或创建本软件的全部或部分衍生作品。
18
 
19
+ 您不得利用本软件从事任何危害国家安全和国家统一、危害社会公共利益、侵犯人身权益的行为。
 
 
20
 
21
  4.免责声明
22
 
 
40
 
41
  2. License Grant
42
 
43
+ Subject to the terms and conditions of this License, the Licensor hereby grants to you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license to use the Software.
44
+
45
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
46
+
47
+ 3. Restriction
48
+
49
+ You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any military, or illegal purposes.
50
 
51
+ You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.
 
 
 
52
 
53
  4. Disclaimer
54
 
README.md CHANGED
@@ -19,11 +19,6 @@ tags:
19
  📍Experience the larger-scale ChatGLM model at <a href="https://www.chatglm.cn">chatglm.cn</a>
20
  </p>
21
 
22
- ## GLM-4 开源模型
23
-
24
- 我们已经发布最新的 **GLM-4** 模型,该模型在多个指标上有了新的突破,您可以在以下两个渠道体验我们的最新模型。
25
- + [GLM-4 开源模型](https://huggingface.co/THUDM/glm-4-9b-chat) 我们已经开源了 GLM-4-9B 系列模型,在各项指标的测试上有明显提升,欢迎尝试。
26
-
27
  ## 介绍 (Introduction)
28
  ChatGLM3-6B 是 ChatGLM 系列最新一代的开源模型,在保留了前两代模型对话流畅、部署门槛低等众多优秀特性的基础上,ChatGLM3-6B 引入了如下特性:
29
 
@@ -86,15 +81,22 @@ The code in this repository is open-sourced under the [Apache-2.0 license](LICEN
86
 
87
  如果你觉得我们的工作有帮助的话,请考虑引用下列论文。
88
 
89
- If you find our work helpful, please consider citing the following paper.
90
 
91
  ```
92
- @misc{glm2024chatglm,
93
- title={ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools},
94
- author={Team GLM and Aohan Zeng and Bin Xu and Bowen Wang and Chenhui Zhang and Da Yin and Diego Rojas and Guanyu Feng and Hanlin Zhao and Hanyu Lai and Hao Yu and Hongning Wang and Jiadai Sun and Jiajie Zhang and Jiale Cheng and Jiayi Gui and Jie Tang and Jing Zhang and Juanzi Li and Lei Zhao and Lindong Wu and Lucen Zhong and Mingdao Liu and Minlie Huang and Peng Zhang and Qinkai Zheng and Rui Lu and Shuaiqi Duan and Shudan Zhang and Shulin Cao and Shuxun Yang and Weng Lam Tam and Wenyi Zhao and Xiao Liu and Xiao Xia and Xiaohan Zhang and Xiaotao Gu and Xin Lv and Xinghan Liu and Xinyi Liu and Xinyue Yang and Xixuan Song and Xunkai Zhang and Yifan An and Yifan Xu and Yilin Niu and Yuantao Yang and Yueyan Li and Yushi Bai and Yuxiao Dong and Zehan Qi and Zhaoyu Wang and Zhen Yang and Zhengxiao Du and Zhenyu Hou and Zihan Wang},
95
- year={2024},
96
- eprint={2406.12793},
97
- archivePrefix={arXiv},
98
- primaryClass={id='cs.CL' full_name='Computation and Language' is_active=True alt_name='cmp-lg' in_archive='cs' is_general=False description='Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.'}
99
  }
100
  ```
19
  📍Experience the larger-scale ChatGLM model at <a href="https://www.chatglm.cn">chatglm.cn</a>
20
  </p>
21
 
 
 
 
 
 
22
  ## 介绍 (Introduction)
23
  ChatGLM3-6B 是 ChatGLM 系列最新一代的开源模型,在保留了前两代模型对话流畅、部署门槛低等众多优秀特性的基础上,ChatGLM3-6B 引入了如下特性:
24
 
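For context, here is a minimal usage sketch of the model this introduction describes. The loading recipe itself is not shown in this diff, so treat the checkpoint id, `trust_remote_code` flag, and `chat` helper as the usual, assumed ChatGLM3-6B interface; a CUDA GPU is required for the `.cuda()` call.

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# chat() returns the reply plus the updated multi-turn history.
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```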
 
81
 
82
  如果你觉得我们的工作有帮助的话,请考虑引用下列论文。
83
 
84
+ If you find our work helpful, please consider citing the following papers.
85
 
86
  ```
87
+ @article{zeng2022glm,
88
+ title={Glm-130b: An open bilingual pre-trained model},
89
+ author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
90
+ journal={arXiv preprint arXiv:2210.02414},
91
+ year={2022}
 
 
92
  }
93
  ```
94
+ ```
95
+ @inproceedings{du2022glm,
96
+ title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
97
+ author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
98
+ booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
99
+ pages={320--335},
100
+ year={2022}
101
+ }
102
+ ```
model-00001-of-00007.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:b1052386eac358a18add3d0f92521c85ab338979da8eeb08a6499555b857f80d
3
- size 1827774160
 
 
 
 
model-00002-of-00007.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:f88d7f69a3940711b87308eb20dd0544df22aff81134fa0d5160d3c859f5d321
3
- size 1968291888
 
 
 
 
model-00003-of-00007.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:447d41b7c5e7b2558905c98733469aa9e132540c91e13c4cdd7bfc58b60cc650
3
- size 1927406936
 
 
 
 
model-00004-of-00007.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:386305288e2bcc4a26fcffef433c94592d155bf36e365f054f63d3e6b55a1429
3
- size 1815217640
 
 
 
 
model-00005-of-00007.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:6c304e2821774d649997d1f5ae5fa12afd4d1dc5fea2bf6e8b829405b81fa7f2
3
- size 1968291920
 
 
 
 
model-00006-of-00007.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:ef45bbd33d6d540d3d3523bb169b52b91b9e265176ab7efc6838233bc3b7e95d
3
- size 1927406960
 
 
 
 
model-00007-of-00007.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:6eb545c6aa8a839c56a78b80331fcb4fa403341925f79fb6a761e2e847bf1e40
3
- size 1052805400
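Each of the seven `model-0000X-of-00007.safetensors` entries deleted above is a Git LFS pointer rather than the weights themselves: three lines giving the spec version, the SHA-256 oid, and the byte size of the real shard. A small sketch for checking a locally downloaded shard against such a pointer; the oid and size in the example are copied from the first entry above.

```python
import hashlib
import os

def verify_shard(path: str, expected_oid: str, expected_size: int) -> bool:
    """Check a downloaded shard against its Git LFS pointer (size first, then SHA-256)."""
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest() == expected_oid

# Example with the first pointer listed above:
# verify_shard("model-00001-of-00007.safetensors",
#              "b1052386eac358a18add3d0f92521c85ab338979da8eeb08a6499555b857f80d",
#              1827774160)
```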
 
 
 
 
model.safetensors.index.json DELETED
@@ -1,207 +0,0 @@
1
- {
2
- "metadata": {
3
- "total_size": 12487168064
4
- },
5
- "weight_map": {
6
- "transformer.embedding.word_embeddings.weight": "model-00001-of-00007.safetensors",
7
- "transformer.encoder.final_layernorm.weight": "model-00007-of-00007.safetensors",
8
- "transformer.encoder.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
9
- "transformer.encoder.layers.0.mlp.dense_4h_to_h.weight": "model-00001-of-00007.safetensors",
10
- "transformer.encoder.layers.0.mlp.dense_h_to_4h.weight": "model-00001-of-00007.safetensors",
11
- "transformer.encoder.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
12
- "transformer.encoder.layers.0.self_attention.dense.weight": "model-00001-of-00007.safetensors",
13
- "transformer.encoder.layers.0.self_attention.query_key_value.bias": "model-00001-of-00007.safetensors",
14
- "transformer.encoder.layers.0.self_attention.query_key_value.weight": "model-00001-of-00007.safetensors",
15
- "transformer.encoder.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
16
- "transformer.encoder.layers.1.mlp.dense_4h_to_h.weight": "model-00001-of-00007.safetensors",
17
- "transformer.encoder.layers.1.mlp.dense_h_to_4h.weight": "model-00001-of-00007.safetensors",
18
- "transformer.encoder.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
19
- "transformer.encoder.layers.1.self_attention.dense.weight": "model-00001-of-00007.safetensors",
20
- "transformer.encoder.layers.1.self_attention.query_key_value.bias": "model-00001-of-00007.safetensors",
21
- "transformer.encoder.layers.1.self_attention.query_key_value.weight": "model-00001-of-00007.safetensors",
22
- "transformer.encoder.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
23
- "transformer.encoder.layers.10.mlp.dense_4h_to_h.weight": "model-00003-of-00007.safetensors",
24
- "transformer.encoder.layers.10.mlp.dense_h_to_4h.weight": "model-00003-of-00007.safetensors",
25
- "transformer.encoder.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
26
- "transformer.encoder.layers.10.self_attention.dense.weight": "model-00003-of-00007.safetensors",
27
- "transformer.encoder.layers.10.self_attention.query_key_value.bias": "model-00003-of-00007.safetensors",
28
- "transformer.encoder.layers.10.self_attention.query_key_value.weight": "model-00003-of-00007.safetensors",
29
- "transformer.encoder.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
30
- "transformer.encoder.layers.11.mlp.dense_4h_to_h.weight": "model-00003-of-00007.safetensors",
31
- "transformer.encoder.layers.11.mlp.dense_h_to_4h.weight": "model-00003-of-00007.safetensors",
32
- "transformer.encoder.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
33
- "transformer.encoder.layers.11.self_attention.dense.weight": "model-00003-of-00007.safetensors",
34
- "transformer.encoder.layers.11.self_attention.query_key_value.bias": "model-00003-of-00007.safetensors",
35
- "transformer.encoder.layers.11.self_attention.query_key_value.weight": "model-00003-of-00007.safetensors",
36
- "transformer.encoder.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
37
- "transformer.encoder.layers.12.mlp.dense_4h_to_h.weight": "model-00004-of-00007.safetensors",
38
- "transformer.encoder.layers.12.mlp.dense_h_to_4h.weight": "model-00003-of-00007.safetensors",
39
- "transformer.encoder.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
40
- "transformer.encoder.layers.12.self_attention.dense.weight": "model-00003-of-00007.safetensors",
41
- "transformer.encoder.layers.12.self_attention.query_key_value.bias": "model-00003-of-00007.safetensors",
42
- "transformer.encoder.layers.12.self_attention.query_key_value.weight": "model-00003-of-00007.safetensors",
43
- "transformer.encoder.layers.13.input_layernorm.weight": "model-00004-of-00007.safetensors",
44
- "transformer.encoder.layers.13.mlp.dense_4h_to_h.weight": "model-00004-of-00007.safetensors",
45
- "transformer.encoder.layers.13.mlp.dense_h_to_4h.weight": "model-00004-of-00007.safetensors",
46
- "transformer.encoder.layers.13.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
47
- "transformer.encoder.layers.13.self_attention.dense.weight": "model-00004-of-00007.safetensors",
48
- "transformer.encoder.layers.13.self_attention.query_key_value.bias": "model-00004-of-00007.safetensors",
49
- "transformer.encoder.layers.13.self_attention.query_key_value.weight": "model-00004-of-00007.safetensors",
50
- "transformer.encoder.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
51
- "transformer.encoder.layers.14.mlp.dense_4h_to_h.weight": "model-00004-of-00007.safetensors",
52
- "transformer.encoder.layers.14.mlp.dense_h_to_4h.weight": "model-00004-of-00007.safetensors",
53
- "transformer.encoder.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
54
- "transformer.encoder.layers.14.self_attention.dense.weight": "model-00004-of-00007.safetensors",
55
- "transformer.encoder.layers.14.self_attention.query_key_value.bias": "model-00004-of-00007.safetensors",
56
- "transformer.encoder.layers.14.self_attention.query_key_value.weight": "model-00004-of-00007.safetensors",
57
- "transformer.encoder.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
58
- "transformer.encoder.layers.15.mlp.dense_4h_to_h.weight": "model-00004-of-00007.safetensors",
59
- "transformer.encoder.layers.15.mlp.dense_h_to_4h.weight": "model-00004-of-00007.safetensors",
60
- "transformer.encoder.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
61
- "transformer.encoder.layers.15.self_attention.dense.weight": "model-00004-of-00007.safetensors",
62
- "transformer.encoder.layers.15.self_attention.query_key_value.bias": "model-00004-of-00007.safetensors",
63
- "transformer.encoder.layers.15.self_attention.query_key_value.weight": "model-00004-of-00007.safetensors",
64
- "transformer.encoder.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
65
- "transformer.encoder.layers.16.mlp.dense_4h_to_h.weight": "model-00004-of-00007.safetensors",
66
- "transformer.encoder.layers.16.mlp.dense_h_to_4h.weight": "model-00004-of-00007.safetensors",
67
- "transformer.encoder.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
68
- "transformer.encoder.layers.16.self_attention.dense.weight": "model-00004-of-00007.safetensors",
69
- "transformer.encoder.layers.16.self_attention.query_key_value.bias": "model-00004-of-00007.safetensors",
70
- "transformer.encoder.layers.16.self_attention.query_key_value.weight": "model-00004-of-00007.safetensors",
71
- "transformer.encoder.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
72
- "transformer.encoder.layers.17.mlp.dense_4h_to_h.weight": "model-00005-of-00007.safetensors",
73
- "transformer.encoder.layers.17.mlp.dense_h_to_4h.weight": "model-00005-of-00007.safetensors",
74
- "transformer.encoder.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
75
- "transformer.encoder.layers.17.self_attention.dense.weight": "model-00004-of-00007.safetensors",
76
- "transformer.encoder.layers.17.self_attention.query_key_value.bias": "model-00004-of-00007.safetensors",
77
- "transformer.encoder.layers.17.self_attention.query_key_value.weight": "model-00004-of-00007.safetensors",
78
- "transformer.encoder.layers.18.input_layernorm.weight": "model-00005-of-00007.safetensors",
79
- "transformer.encoder.layers.18.mlp.dense_4h_to_h.weight": "model-00005-of-00007.safetensors",
80
- "transformer.encoder.layers.18.mlp.dense_h_to_4h.weight": "model-00005-of-00007.safetensors",
81
- "transformer.encoder.layers.18.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
82
- "transformer.encoder.layers.18.self_attention.dense.weight": "model-00005-of-00007.safetensors",
83
- "transformer.encoder.layers.18.self_attention.query_key_value.bias": "model-00005-of-00007.safetensors",
84
- "transformer.encoder.layers.18.self_attention.query_key_value.weight": "model-00005-of-00007.safetensors",
85
- "transformer.encoder.layers.19.input_layernorm.weight": "model-00005-of-00007.safetensors",
86
- "transformer.encoder.layers.19.mlp.dense_4h_to_h.weight": "model-00005-of-00007.safetensors",
87
- "transformer.encoder.layers.19.mlp.dense_h_to_4h.weight": "model-00005-of-00007.safetensors",
88
- "transformer.encoder.layers.19.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
89
- "transformer.encoder.layers.19.self_attention.dense.weight": "model-00005-of-00007.safetensors",
90
- "transformer.encoder.layers.19.self_attention.query_key_value.bias": "model-00005-of-00007.safetensors",
91
- "transformer.encoder.layers.19.self_attention.query_key_value.weight": "model-00005-of-00007.safetensors",
92
- "transformer.encoder.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
93
- "transformer.encoder.layers.2.mlp.dense_4h_to_h.weight": "model-00001-of-00007.safetensors",
94
- "transformer.encoder.layers.2.mlp.dense_h_to_4h.weight": "model-00001-of-00007.safetensors",
95
- "transformer.encoder.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
96
- "transformer.encoder.layers.2.self_attention.dense.weight": "model-00001-of-00007.safetensors",
97
- "transformer.encoder.layers.2.self_attention.query_key_value.bias": "model-00001-of-00007.safetensors",
98
- "transformer.encoder.layers.2.self_attention.query_key_value.weight": "model-00001-of-00007.safetensors",
99
- "transformer.encoder.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
100
- "transformer.encoder.layers.20.mlp.dense_4h_to_h.weight": "model-00005-of-00007.safetensors",
101
- "transformer.encoder.layers.20.mlp.dense_h_to_4h.weight": "model-00005-of-00007.safetensors",
102
- "transformer.encoder.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
103
- "transformer.encoder.layers.20.self_attention.dense.weight": "model-00005-of-00007.safetensors",
104
- "transformer.encoder.layers.20.self_attention.query_key_value.bias": "model-00005-of-00007.safetensors",
105
- "transformer.encoder.layers.20.self_attention.query_key_value.weight": "model-00005-of-00007.safetensors",
106
- "transformer.encoder.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
107
- "transformer.encoder.layers.21.mlp.dense_4h_to_h.weight": "model-00005-of-00007.safetensors",
108
- "transformer.encoder.layers.21.mlp.dense_h_to_4h.weight": "model-00005-of-00007.safetensors",
109
- "transformer.encoder.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
110
- "transformer.encoder.layers.21.self_attention.dense.weight": "model-00005-of-00007.safetensors",
111
- "transformer.encoder.layers.21.self_attention.query_key_value.bias": "model-00005-of-00007.safetensors",
112
- "transformer.encoder.layers.21.self_attention.query_key_value.weight": "model-00005-of-00007.safetensors",
113
- "transformer.encoder.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
114
- "transformer.encoder.layers.22.mlp.dense_4h_to_h.weight": "model-00006-of-00007.safetensors",
115
- "transformer.encoder.layers.22.mlp.dense_h_to_4h.weight": "model-00006-of-00007.safetensors",
116
- "transformer.encoder.layers.22.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
117
- "transformer.encoder.layers.22.self_attention.dense.weight": "model-00006-of-00007.safetensors",
118
- "transformer.encoder.layers.22.self_attention.query_key_value.bias": "model-00006-of-00007.safetensors",
119
- "transformer.encoder.layers.22.self_attention.query_key_value.weight": "model-00006-of-00007.safetensors",
120
- "transformer.encoder.layers.23.input_layernorm.weight": "model-00006-of-00007.safetensors",
121
- "transformer.encoder.layers.23.mlp.dense_4h_to_h.weight": "model-00006-of-00007.safetensors",
122
- "transformer.encoder.layers.23.mlp.dense_h_to_4h.weight": "model-00006-of-00007.safetensors",
123
- "transformer.encoder.layers.23.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
124
- "transformer.encoder.layers.23.self_attention.dense.weight": "model-00006-of-00007.safetensors",
125
- "transformer.encoder.layers.23.self_attention.query_key_value.bias": "model-00006-of-00007.safetensors",
126
- "transformer.encoder.layers.23.self_attention.query_key_value.weight": "model-00006-of-00007.safetensors",
127
- "transformer.encoder.layers.24.input_layernorm.weight": "model-00006-of-00007.safetensors",
128
- "transformer.encoder.layers.24.mlp.dense_4h_to_h.weight": "model-00006-of-00007.safetensors",
129
- "transformer.encoder.layers.24.mlp.dense_h_to_4h.weight": "model-00006-of-00007.safetensors",
130
- "transformer.encoder.layers.24.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
131
- "transformer.encoder.layers.24.self_attention.dense.weight": "model-00006-of-00007.safetensors",
132
- "transformer.encoder.layers.24.self_attention.query_key_value.bias": "model-00006-of-00007.safetensors",
133
- "transformer.encoder.layers.24.self_attention.query_key_value.weight": "model-00006-of-00007.safetensors",
134
- "transformer.encoder.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
135
- "transformer.encoder.layers.25.mlp.dense_4h_to_h.weight": "model-00006-of-00007.safetensors",
136
- "transformer.encoder.layers.25.mlp.dense_h_to_4h.weight": "model-00006-of-00007.safetensors",
137
- "transformer.encoder.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
138
- "transformer.encoder.layers.25.self_attention.dense.weight": "model-00006-of-00007.safetensors",
139
- "transformer.encoder.layers.25.self_attention.query_key_value.bias": "model-00006-of-00007.safetensors",
140
- "transformer.encoder.layers.25.self_attention.query_key_value.weight": "model-00006-of-00007.safetensors",
141
- "transformer.encoder.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
142
- "transformer.encoder.layers.26.mlp.dense_4h_to_h.weight": "model-00007-of-00007.safetensors",
143
- "transformer.encoder.layers.26.mlp.dense_h_to_4h.weight": "model-00006-of-00007.safetensors",
144
- "transformer.encoder.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
145
- "transformer.encoder.layers.26.self_attention.dense.weight": "model-00006-of-00007.safetensors",
146
- "transformer.encoder.layers.26.self_attention.query_key_value.bias": "model-00006-of-00007.safetensors",
147
- "transformer.encoder.layers.26.self_attention.query_key_value.weight": "model-00006-of-00007.safetensors",
148
- "transformer.encoder.layers.27.input_layernorm.weight": "model-00007-of-00007.safetensors",
149
- "transformer.encoder.layers.27.mlp.dense_4h_to_h.weight": "model-00007-of-00007.safetensors",
150
- "transformer.encoder.layers.27.mlp.dense_h_to_4h.weight": "model-00007-of-00007.safetensors",
151
- "transformer.encoder.layers.27.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
152
- "transformer.encoder.layers.27.self_attention.dense.weight": "model-00007-of-00007.safetensors",
153
- "transformer.encoder.layers.27.self_attention.query_key_value.bias": "model-00007-of-00007.safetensors",
154
- "transformer.encoder.layers.27.self_attention.query_key_value.weight": "model-00007-of-00007.safetensors",
155
- "transformer.encoder.layers.3.input_layernorm.weight": "model-00001-of-00007.safetensors",
156
- "transformer.encoder.layers.3.mlp.dense_4h_to_h.weight": "model-00002-of-00007.safetensors",
157
- "transformer.encoder.layers.3.mlp.dense_h_to_4h.weight": "model-00002-of-00007.safetensors",
158
- "transformer.encoder.layers.3.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
159
- "transformer.encoder.layers.3.self_attention.dense.weight": "model-00001-of-00007.safetensors",
160
- "transformer.encoder.layers.3.self_attention.query_key_value.bias": "model-00001-of-00007.safetensors",
161
- "transformer.encoder.layers.3.self_attention.query_key_value.weight": "model-00001-of-00007.safetensors",
162
- "transformer.encoder.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
163
- "transformer.encoder.layers.4.mlp.dense_4h_to_h.weight": "model-00002-of-00007.safetensors",
164
- "transformer.encoder.layers.4.mlp.dense_h_to_4h.weight": "model-00002-of-00007.safetensors",
165
- "transformer.encoder.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
166
- "transformer.encoder.layers.4.self_attention.dense.weight": "model-00002-of-00007.safetensors",
167
- "transformer.encoder.layers.4.self_attention.query_key_value.bias": "model-00002-of-00007.safetensors",
168
- "transformer.encoder.layers.4.self_attention.query_key_value.weight": "model-00002-of-00007.safetensors",
169
- "transformer.encoder.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
170
- "transformer.encoder.layers.5.mlp.dense_4h_to_h.weight": "model-00002-of-00007.safetensors",
171
- "transformer.encoder.layers.5.mlp.dense_h_to_4h.weight": "model-00002-of-00007.safetensors",
172
- "transformer.encoder.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
173
- "transformer.encoder.layers.5.self_attention.dense.weight": "model-00002-of-00007.safetensors",
174
- "transformer.encoder.layers.5.self_attention.query_key_value.bias": "model-00002-of-00007.safetensors",
175
- "transformer.encoder.layers.5.self_attention.query_key_value.weight": "model-00002-of-00007.safetensors",
176
- "transformer.encoder.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
177
- "transformer.encoder.layers.6.mlp.dense_4h_to_h.weight": "model-00002-of-00007.safetensors",
178
- "transformer.encoder.layers.6.mlp.dense_h_to_4h.weight": "model-00002-of-00007.safetensors",
179
- "transformer.encoder.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
180
- "transformer.encoder.layers.6.self_attention.dense.weight": "model-00002-of-00007.safetensors",
181
- "transformer.encoder.layers.6.self_attention.query_key_value.bias": "model-00002-of-00007.safetensors",
182
- "transformer.encoder.layers.6.self_attention.query_key_value.weight": "model-00002-of-00007.safetensors",
183
- "transformer.encoder.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
184
- "transformer.encoder.layers.7.mlp.dense_4h_to_h.weight": "model-00002-of-00007.safetensors",
185
- "transformer.encoder.layers.7.mlp.dense_h_to_4h.weight": "model-00002-of-00007.safetensors",
186
- "transformer.encoder.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
187
- "transformer.encoder.layers.7.self_attention.dense.weight": "model-00002-of-00007.safetensors",
188
- "transformer.encoder.layers.7.self_attention.query_key_value.bias": "model-00002-of-00007.safetensors",
189
- "transformer.encoder.layers.7.self_attention.query_key_value.weight": "model-00002-of-00007.safetensors",
190
- "transformer.encoder.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
191
- "transformer.encoder.layers.8.mlp.dense_4h_to_h.weight": "model-00003-of-00007.safetensors",
192
- "transformer.encoder.layers.8.mlp.dense_h_to_4h.weight": "model-00003-of-00007.safetensors",
193
- "transformer.encoder.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
194
- "transformer.encoder.layers.8.self_attention.dense.weight": "model-00003-of-00007.safetensors",
195
- "transformer.encoder.layers.8.self_attention.query_key_value.bias": "model-00003-of-00007.safetensors",
196
- "transformer.encoder.layers.8.self_attention.query_key_value.weight": "model-00003-of-00007.safetensors",
197
- "transformer.encoder.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
198
- "transformer.encoder.layers.9.mlp.dense_4h_to_h.weight": "model-00003-of-00007.safetensors",
199
- "transformer.encoder.layers.9.mlp.dense_h_to_4h.weight": "model-00003-of-00007.safetensors",
200
- "transformer.encoder.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
201
- "transformer.encoder.layers.9.self_attention.dense.weight": "model-00003-of-00007.safetensors",
202
- "transformer.encoder.layers.9.self_attention.query_key_value.bias": "model-00003-of-00007.safetensors",
203
- "transformer.encoder.layers.9.self_attention.query_key_value.weight": "model-00003-of-00007.safetensors",
204
- "transformer.output_layer.weight": "model-00007-of-00007.safetensors",
205
- "transformer.rotary_pos_emb.inv_freq": "model-00001-of-00007.safetensors"
206
- }
207
- }
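The deleted `model.safetensors.index.json` is the shard index: `metadata.total_size` records the total tensor byte count, and `weight_map` maps every parameter name to the shard file that stores it, which is how `from_pretrained` knows which of the seven files to open for each tensor. A sketch of resolving a single tensor through the index by hand; it assumes a checkout that still contains the index and the shard files removed in this commit.

```python
import json

from safetensors import safe_open

with open("model.safetensors.index.json") as f:
    index = json.load(f)

name = "transformer.embedding.word_embeddings.weight"
shard = index["weight_map"][name]          # -> "model-00001-of-00007.safetensors"

with safe_open(shard, framework="pt", device="cpu") as f:
    tensor = f.get_tensor(name)
print(shard, tuple(tensor.shape))
```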
modeling_chatglm.py CHANGED
@@ -14,7 +14,6 @@ from torch.nn import CrossEntropyLoss, LayerNorm, MSELoss, BCEWithLogitsLoss
14
  from torch.nn.utils import skip_init
15
  from typing import Optional, Tuple, Union, List, Callable, Dict, Any
16
  from copy import deepcopy
17
- import transformers
18
 
19
  from transformers.modeling_outputs import (
20
  BaseModelOutputWithPast,
@@ -46,9 +45,6 @@ CHATGLM_6B_PRETRAINED_MODEL_ARCHIVE_LIST = [
46
  # See all ChatGLM models at https://huggingface.co/models?filter=chatglm
47
  ]
48
 
49
- is_transformers_4_42_or_higher = int(transformers.__version__.split(".")[1]) >= 42
50
- is_transformers_4_44_or_higher = int(transformers.__version__.split(".")[1]) >= 44
51
-
52
 
53
  def default_init(cls, *args, **kwargs):
54
  return cls(*args, **kwargs)
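The two probes deleted in this hunk compare only the minor component of `transformers.__version__`, so they would misread any future 5.x release or a dev build with extra suffixes. If version gating is ever reintroduced, a more robust comparison is possible with `packaging`, which transformers already depends on (a sketch, not part of this file):

```python
import transformers
from packaging import version

TRANSFORMERS_VERSION = version.parse(transformers.__version__)
IS_TRANSFORMERS_4_42_OR_HIGHER = TRANSFORMERS_VERSION >= version.parse("4.42.0")
IS_TRANSFORMERS_4_44_OR_HIGHER = TRANSFORMERS_VERSION >= version.parse("4.44.0")
```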
@@ -185,7 +181,7 @@ def apply_rotary_pos_emb(x: torch.Tensor, rope_cache: torch.Tensor) -> torch.Ten
185
  class RMSNorm(torch.nn.Module):
186
  def __init__(self, normalized_shape, eps=1e-5, device=None, dtype=None, **kwargs):
187
  super().__init__()
188
- self.weight = torch.nn.Parameter(torch.ones(normalized_shape, device=device, dtype=dtype))
189
  self.eps = eps
190
 
191
  def forward(self, hidden_states: torch.Tensor):
@@ -638,8 +634,7 @@ class GLMTransformer(torch.nn.Module):
638
  attention_mask,
639
  rotary_pos_emb,
640
  kv_caches[index],
641
- use_cache,
642
- use_reentrant=False
643
  )
644
  else:
645
  layer_ret = layer(
@@ -702,9 +697,9 @@ class ChatGLMPreTrainedModel(PreTrainedModel):
702
  position_ids = torch.arange(seq_length, dtype=torch.long, device=device).unsqueeze(0).repeat(batch_size, 1)
703
  return position_ids
704
 
705
- def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
706
- if not self.supports_gradient_checkpointing:
707
- raise ValueError(f"{self.__class__.__name__} does not support gradient checkpointing.")
708
 
709
 
710
  class Embedding(torch.nn.Module):
@@ -773,9 +768,6 @@ class ChatGLMModel(ChatGLMPreTrainedModel):
773
  def get_input_embeddings(self):
774
  return self.embedding.word_embeddings
775
 
776
- def set_input_embeddings(self, value):
777
- self.embedding.word_embeddings = value
778
-
779
  def get_prompt(self, batch_size, device, dtype=torch.half):
780
  prefix_tokens = self.prefix_tokens.unsqueeze(0).expand(batch_size, -1).to(device)
781
  past_key_values = self.prefix_encoder(prefix_tokens).type(dtype)
@@ -876,19 +868,9 @@ class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
876
  standardize_cache_format: bool = False,
877
  ) -> Dict[str, Any]:
878
  # update past_key_values
879
- if is_transformers_4_44_or_higher:
880
- model_kwargs["past_key_values"] = self._extract_past_from_model_output(
881
- outputs
882
- )[1]
883
- elif is_transformers_4_42_or_higher:
884
- # update past_key_values
885
- model_kwargs["past_key_values"] = self._extract_past_from_model_output(
886
- outputs, standardize_cache_format=standardize_cache_format
887
- )[1]
888
- else:
889
- model_kwargs["past_key_values"] = self._extract_past_from_model_output(
890
- outputs, standardize_cache_format=standardize_cache_format
891
- )
892
 
893
  # update attention mask
894
  if "attention_mask" in model_kwargs:
@@ -1018,10 +1000,7 @@ class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
1018
  content = ""
1019
  history = deepcopy(history)
1020
  for response in output.split("<|assistant|>"):
1021
- if "\n" in response:
1022
- metadata, content = response.split("\n", maxsplit=1)
1023
- else:
1024
- metadata, content = "", response
1025
  if not metadata.strip():
1026
  content = content.strip()
1027
  history.append({"role": "assistant", "metadata": metadata, "content": content})
 
14
  from torch.nn.utils import skip_init
15
  from typing import Optional, Tuple, Union, List, Callable, Dict, Any
16
  from copy import deepcopy
 
17
 
18
  from transformers.modeling_outputs import (
19
  BaseModelOutputWithPast,
 
45
  # See all ChatGLM models at https://huggingface.co/models?filter=chatglm
46
  ]
47
 
 
 
 
48
 
49
  def default_init(cls, *args, **kwargs):
50
  return cls(*args, **kwargs)
 
181
  class RMSNorm(torch.nn.Module):
182
  def __init__(self, normalized_shape, eps=1e-5, device=None, dtype=None, **kwargs):
183
  super().__init__()
184
+ self.weight = torch.nn.Parameter(torch.empty(normalized_shape, device=device, dtype=dtype))
185
  self.eps = eps
186
 
187
  def forward(self, hidden_states: torch.Tensor):
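The only change in this hunk is the initializer: `torch.ones` (an identity scale) is swapped back to `torch.empty`, which leaves the parameter uninitialized. That is harmless when `from_pretrained` immediately overwrites it with checkpoint values, but it matters if the module is ever constructed from scratch. For context, a minimal sketch of the standard RMSNorm computation this weight feeds into; the forward body lies outside the hunk, so this is illustrative rather than the file's exact code.

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Normalize by the root mean square over the hidden dimension, then scale.
    variance = x.to(torch.float32).pow(2).mean(-1, keepdim=True)
    x_normed = x * torch.rsqrt(variance + eps)
    return weight * x_normed.to(x.dtype)
```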
 
634
  attention_mask,
635
  rotary_pos_emb,
636
  kv_caches[index],
637
+ use_cache
 
638
  )
639
  else:
640
  layer_ret = layer(
 
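This hunk drops the explicit `use_reentrant=False` from the `torch.utils.checkpoint.checkpoint` call. The practical difference, shown on a stand-in module because the real layer and its inputs are outside the hunk:

```python
import torch
import torch.utils.checkpoint as cp

layer = torch.nn.Linear(8, 8)                 # stand-in for a transformer block
x = torch.randn(2, 8, requires_grad=True)

# Reverted form: no use_reentrant flag, so recent PyTorch falls back to the legacy
# reentrant implementation and emits a FutureWarning asking for an explicit choice.
y_legacy = cp.checkpoint(layer, x)

# The removed keyword pinned the newer, non-reentrant implementation instead:
y_new = cp.checkpoint(layer, x, use_reentrant=False)
```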
697
  position_ids = torch.arange(seq_length, dtype=torch.long, device=device).unsqueeze(0).repeat(batch_size, 1)
698
  return position_ids
699
 
700
+ def _set_gradient_checkpointing(self, module, value=False):
701
+ if isinstance(module, GLMTransformer):
702
+ module.gradient_checkpointing = value
703
 
704
 
705
  class Embedding(torch.nn.Module):
 
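With the legacy `_set_gradient_checkpointing` hook restored, older transformers releases can toggle `GLMTransformer.gradient_checkpointing` through the standard entry point; the `gradient_checkpointing_enable` override removed earlier in this diff only validated support. A hedged usage sketch (the exact dispatch path depends on the installed transformers version):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model.gradient_checkpointing_enable()  # expected to flip GLMTransformer.gradient_checkpointing
model.train()                          # checkpointing is only applied in training mode
```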
768
  def get_input_embeddings(self):
769
  return self.embedding.word_embeddings
770
 
 
 
 
771
  def get_prompt(self, batch_size, device, dtype=torch.half):
772
  prefix_tokens = self.prefix_tokens.unsqueeze(0).expand(batch_size, -1).to(device)
773
  past_key_values = self.prefix_encoder(prefix_tokens).type(dtype)
 
868
  standardize_cache_format: bool = False,
869
  ) -> Dict[str, Any]:
870
  # update past_key_values
871
+ model_kwargs["past_key_values"] = self._extract_past_from_model_output(
872
+ outputs, standardize_cache_format=standardize_cache_format
873
+ )
 
874
 
875
  # update attention mask
876
  if "attention_mask" in model_kwargs:
 
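The restored call always passes `standardize_cache_format` and uses the return value directly; the branches removed above show that newer transformers releases dropped that keyword and return a `(cache_name, cache)` tuple instead. A version-tolerant helper, grounded only in those removed branches, could look like this (a sketch, not part of the file):

```python
def extract_past(model, outputs, standardize_cache_format=False):
    """Pull the KV cache out of _extract_past_from_model_output across transformers versions."""
    try:
        result = model._extract_past_from_model_output(
            outputs, standardize_cache_format=standardize_cache_format
        )
    except TypeError:  # newer releases no longer accept the keyword argument
        result = model._extract_past_from_model_output(outputs)
    return result[1] if isinstance(result, tuple) else result
```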
1000
  content = ""
1001
  history = deepcopy(history)
1002
  for response in output.split("<|assistant|>"):
1003
+ metadata, content = response.split("\n", maxsplit=1)
 
 
 
1004
  if not metadata.strip():
1005
  content = content.strip()
1006
  history.append({"role": "assistant", "metadata": metadata, "content": content})
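The restored one-liner assumes every `<|assistant|>` chunk contains a newline separating the metadata line (for example a tool name) from the reply text; the guarded variant removed above tolerated chunks without one. A small self-contained sketch of the difference, with illustrative sample strings:

```python
def split_chunk(chunk: str):
    # Guarded variant (removed above): fall back to empty metadata when no newline exists.
    if "\n" in chunk:
        metadata, content = chunk.split("\n", maxsplit=1)
    else:
        metadata, content = "", chunk
    return metadata, content

print(split_chunk("\nHello! How can I help?"))  # ('', 'Hello! How can I help?')
print(split_chunk("no newline here"))           # guarded: ('', 'no newline here');
                                                # the one-liner raises ValueError on this input
```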
quantization.py CHANGED
@@ -1,17 +1,20 @@
 
1
  from torch.nn.parameter import Parameter
 
2
  import bz2
3
  import torch
4
  import base64
5
  import ctypes
6
  from transformers.utils import logging
 
7
  from typing import List
 
8
 
9
  logger = logging.get_logger(__name__)
10
 
11
  try:
12
  from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction, round_up
13
 
14
-
15
  class Kernel:
16
  def __init__(self, code: bytes, function_names: List[str]):
17
  self.code = code
@@ -21,7 +24,6 @@ try:
21
  for name in self._function_names:
22
  setattr(self, name, KernelFunction(self._cmodule, name))
23
 
24
-
25
  quantization_code = "$QlpoOTFBWSZTWU9yuJUAQHN//////////f/n/8/n///n//bt4dTidcVx8X3V9FV/92/v4B7/AD5FBQFAAAChSgKpFCFAFVSigUAAAEKhSgUUqgFBKigqVREQAABQBQIANDTTIGI00BkZBkNGE0A0BkBkGQGRkaNAaAGQNBoGgDIAAYIGTI0DQAQAaGmmQMRpoDIyDIaMJoBoDIDIMgMjI0aA0AMgaDQNAGQAAwQMmRoGgAgA0NNMgYjTQGRkGQ0YTQDQGQGQZAZGRo0BoAZA0GgaAMgABggZMjQNABABoaaZAxGmgMjIMhowmgGgMgMgyAyMjRoDQAyBoNA0AZAADBAyZGgaAAmqU1NEgJqnptU/Sn4jRR6J6epk2pqb1Q/SgAPUGgyNNGjQ2SBpoAZAAGg0NB6mgDIAAAAA2oaApSREBNAARhGiYEaEwU8pvImlP0k2aam1GaGqbFNM1MHpTwmkepmyU9R6nqPKekHqNNPUxNGhp6n6p6QaZ6o9TG1GMqcoV9ly6nRanHlq6zPNbnGZNi6HSug+2nPiZ13XcnFYZW+45W11CumhzYhchOJ2GLLV1OBjBjGf4TptOddTSOcVxhqYZMYwZXZZY00zI1paX5X9J+b+f4e+x43RXSxXPOdquiGpduatGyXneN696M9t4HU2eR5XX/kPhP261NTx3JO1Ow7LyuDmeo9a7d351T1ZxnvnrvYnrXv/hXxPCeuYx2XsNmO003eg9J3Z6U7b23meJ4ri01OdzTk9BNO96brz+qT5nuvvH3ds/G+m/JcG/F2XYuhXlvO+jP7U3XgrzPN/lr8Sf1n6j4j7jZs+s/T0tNaNNYzTs12rxjwztHlnire3Nzc3N1wuBwOBwXBvZfoHpD7rFmR99V5vj3aXza3xdBbXMalubTg/jIv5dfAi54Pdc75j4z412n3Npj3Ld/ENm7a3b/Cod6h/ret1/5vn/C+l+gdslMvgPSLJ8d8q+U66fevYn/tW1chleEtNTGlcHCbLRlq0tHzF5tsbbZZfHjjLgZu42XCuC3NrdjTasZGNzgxPIrGqp7r3p7L2p5XjnpPSmTd5XtzqnB6U87zzg1Ol0zd0zsLszxR6lkxp35u6/teL0L0W922cR7Lu1lpL9CsHirzuM2T+BgsyViT6LHcm0/Vr6U/7LGGyJeqTEjt0PHWhF5mCT7R9mtlDwriYv0Tyr/OxYt6qp5r0mPVT0608TqnqMZaarU2nFwrTzzlrs1ed7z1ux60wyr4ydCaTi3enW8x68x0zU7tXSlcmPSW1mGpWJMg4zmPC2lK96tp0OE80y4MfEvnZj8zGluR6b22ki1Ou9V2nCd9xovcPvcYMZYy0lvN60ScZ45vN6yeCeeXFb1lVjnnCar5fwXwE2bzJ4HI1XVPXfXZMm44GUsMpYsmLB65TuVdm0cl0b+i/wGNN66XjeV7zuPpHcnK/juhhjdfId5jMdE5nN0dGmmm2zZs2cexD5n9p/dY352XsvXHaZNWWsmmS1atjR452nYudzvqv2HMRyvNNnlMcDl3R2+yx2uVrBubTW9icHDVtbNXlZm7jma1rM4VurZZd2y6nUau7ZXZ7bVU+mnoOVxZGMrVmvX60605JwmzGZhhhjTWtaaaMaaGTGmNMZasY0iX8VMUl8eepaIrzGSpemWOQyZORk2bNpjUybMmxqYmknCGCFynutfksaZpjTNMaaatM0xsxcGR0sociNqxNSmhhR1ZJPbsn8qyF0t2qH6iYBclclalbtTTcHTDsPaX6rlnElph2Jyumumtynv2Kk8GI7rsvXbIcJgHJOSaSXnnGaI3m87RtVXJOZ/YtgdTE6Wpha6ZlE8ayXkef1fh602r2WwvfMXtMdLlkfnLFdYYwYso+bWqm7yJqHXZGw2nrS5ZanSYnWlxBxMF1V940K2wdrI7R6OYf7DGGamMmTSbRhlS45xmVOumF1EyPCmHrrN8wwZOOrdNtLeMtzFzDlWnfTBxMk2NaXIZHBYxYLD4w8yju0ao65Vz1OIXoS9dLanwCe1PWrYuWMqf1if1z2k2yYfKJ741PDgno1ZQ8DRqvUny3mNoWTzGO6m1DkrJI8JiR5cSd+vZdGOO8nrMoc5+NDUFsMSXaZJeNlMmGLtJsovOsUp7I9S5VojKxF6bTVEelXqlfJobQr3LozSh2Jk7VcrVMfhXqszGWMzNqGhqZY0OadxkyyMssKugZR0KNFXBHlqwmJgTE/BNVMk6ItJXZMR0H47GpXv/DMOvNkmVuaV1PRfEdxuqc7Hcd+ZV/zTLaRxWk0nl9CdCeM6mn5rstHIBcpiuwmUZXeq81DacHI2rmrZ5SuE5mOZd6LQrZg9mx32TprA8BMo5jKN6yLTCi3WzQaZSuhzTtM1fUTGVpG8Tw+KXI0tjEpiWxtLYynOlktSbVlaI5kxP8TDH8kx50xoxi5KcA4pcja8KWLRlO/Ks6q06ergnvm1ca3Tq8Uw7LTUsmWyctXPWmpitl/uvGcWTGXGuAXDfhqazGmjkxcJW5hMMMMpYsXl2TZYtVOddG3XCarUt6Ptq9CZXSNzyuRzqRZOjsxdBbFVz6OA5HI43r1jityVlVpVkxmOsyaYWE1NTGq1sOVh36mHMcxtSvcy70edG0ZGR3I1Go1GRlV7mWWo1G0ZGRqlvH40l7o4m5xMWLLLYyNjnqc8556mdPqLJ31n/1nWOncxzG1tizrHs/Z+d2vP/B/l8wdJ6rHUn2nbbDq4p6htFtYzMMMTaZis1K5GKzGNmxhmUx2DDlZ/qNnIx41xnaMfCZWYaZWtNLTNW8ND4Fw1MyZOCdM428suKG1ehW8TesOydg7J+YYcD4cYR+8dFK6M4E3HM9ZfRNNL+Sn6rsl4DsrDl2HpPCnfxjGXtbZtYys1ttlyJ4T+BvexjGWRjMszK4Jpc77D3GyuVD7q0+G8m9G+2+rGm7cOR2y7FdtY2XUYx/oNlfRYxhMYyYZkyyg55enna9Kt/FFi6GMMwYwdwxWgxGMLKYmUyGExTKMZkMFhkymKuh0NOBNnBu+23LdwDoZYYzGGMxtORaTU1pjTGWTTGGtMrNWUsyyTTLLG1qy2ZjbK2DBllWqxMtBMaYZQmcE7zvvRcTkclUwdkxTaSdyySt/7fpL+T1v516Ji97fwr5JbLu305zMn5+GMTTZ9F+y7ExwmGVfG44yxn3dLv6l5i+Wth1jCrDq21nW9LqvvDzz3Vf3LLH/O/32TJ/erx3bXftO4eF+G956D952K/An4NfvOpjFjExjevP/UmE0fIoZXx6/w6lX/no3D0bLt+ixjieBM6ksRd0yB4Lt2SwYNE+gd1detlZWUnpiZfGfFaK+4PyCa/v18V8X75pe9fLXzp7l3VjF76vWZmHwGz1IZNWT7b8yddJ4q5kyrVdfru6atWc7bVYztL9Jf4GXvT+Y8m9/YsXP6H018a8D4XVOqvfzqeR+6yZOD8dPv0+U7/q5Pl+2dNb0MjzGVH5p6MNQ7cOWvw62U9aHE8DprDek+McLyvDz+t
e+9Zhq5+YTruufMcWMabqysTmZVWjKPfnK0wyVcrsuhjZRdLkHNvD72b9abriOSGIxiLixMOoalNPXzy+wT/tf+U6HHONfsz+xe8ufHBdQWWGWLA9if0rsnmrxK5LvRZQeWsTCsrmOYy8VteVfuRfcVTtDLItLIsMYxZLdU/DbtSemxF6Z6Zo5WBXE4tFdCyVMMXMTEMZXVlS6Xec2T4e0tHsRcEuWshcJ2YsNF5rUx1E8ifCq6Z+ZP7qdCeu/aTwFd53l16/o0NOw6O3dLavP4Hbi4RdmuDk6DoYaninC0+o4uZjbJ7Rxeu0/FbuFg+q7DVS6fQe0rZ6NDGUNNU6DEqOaLTicKnYZMnBWruljQxoaS3dZhocDge0bSTyOvdAbG5hxe2xji7E/L55xX13wWNDi6HCekcFxfCPGxY0MXC+s7afWaMdDyjyr+o8Rudm/NabOZvdl274zH4f5XK9z6On1Pe/K5TdPAslg77BjuO6Y3eO7GqvOPG/stknp1leyvLL0Z7bl9I4noMvLkzytLhWYzrOZzLXCORe028rORzOg4N/L0HlMOQ3Pgmnbb6KczlabORpu980q37TBqRu0/p3PO6234Bl03Ynuz+9W7gnsEcmvYaYY3aMYY0wx3pYd+ujsXauWdaY5Xkbtl23fPzFHiDB/QMo0yFjBllYxTQYYyxkrwn7JufwJ/PfgJ+C83X69ni6zvXcnyXabv0ncbLwsceS+RNlyN2mnneJtX0ngYO0+e+0+UnA+Wch3ji8hj5an4h+i6XBySU4n+R0roVcbw5yvHrmr4Yw8Y7x6c+9POPYHI5HI5HI5HI5HGXGww4nE4nrVyOR8XeqPEO7PLOiukYa3Novk5hV4cdtYZLI93e+uxff2jRo0aNGjRo0aNG1bVtW1dy3m83m8+tQ5ZzHw3nObwOu8La9Rc1dtkdS8A3eTk823tnktXWlxN6Oixe06zrN70Isd9jiOgZFq9yfkPqP/SLhN2Myl8jDM43bl1nbcb4cO57jlh8Jow6pzXZdL4dyODTuuhu77FyO27DdwdRxmvO+O+3N2+BdqyTwLHVczDVY4UPE4O66/ZO2cx1LFzVdSXtF7G4HMbrauOHRw6c8FdZ5m9fHZHYZXfTlZquyynSyTTKke6vcffSD9pzPA/G7n7jxPmuhc1DHMynPMrGL6AdewYmwu5ko+UUyTwrMv27rPH1v1nGqd87+p6N6LU8k3NEng53xXyHS97+44OSg/sy/hn+Se6yfYNjW0/uTgP+PvWYzLMmjhcLB/gGpri6H83/84eUXWT6T9Hsv7785z/7z4icpW+zfXypuR7rx/gMdZb1/wC678pcs8/2a3mDitGHxl9mfPlll5MafWWqxk/eYuTDgcNMzDGWLWvsuglNxs53GtN6uWpktlW1tZZYcuinMMWmnNnJydze3b2Y1McBxrBkXw799izLMZZYyy0TkbsGM4p03S2uVu5s/XXUdSdec6smVxZYYGpVmT8A+8ajuEyV5FatkvVru2x6uxGXXbH4A+jvgP4GMYy3iPLXzq/6z65+E005ey+cwMZD3fZcqc6xpjTFjQ0P3U+e++cPYmTIwj0nrK5NPTfl3WvpfLtXDcb2HQMudYOxFXQBor4L4T6vrOauFctYXJQ++NUWmJe5bmx1jDiZS1dTqWxo4GR8jm3fttpmPHppk9PEyv4/y8/sO07XacOmcqc0x2Vi9BvNJvN5oW8x4mOsydpidRxMYJPx06m1bqPzq9KtK8sxXNXFodD/+MYYaJTLwOhc9brCsV18oOR1i4tXChyTkq4lf4y1Ke+9axjDHqs1mfBbMXuP4Hzi+X7t8vzv7bHerrUPgPCxhjre4fXdfLNtNM+Jd+Zdh8xd8wP87uNPoPgv4W7/5P2BuxfsMabNnMnza+54Pdi5U671GPZY8CehX8Voeoo7FHpkeEc6715FwHZrIrUrHaviPUbPZHND+IhczrP6FcYvhOZ0Di/ETt0OI+YwNWR9r7tpf6WDeZKZDB1+z2IthOl1mPyb5FluvEx9h9d0NnM0Y1XPFkWIsk1WotJ0PBMmkvjvQTd0e71tfeV+8r8lQ/tpzpsmxJ+InrI/dj2UajUajVTUajatRqNRtGo1Go1Go4wjeMpZFMVV9CHbofPraLsJ3JpWV2XOoanCuFky4y3PPNxucK2uKC1Lbdb1eo+m5XomN6HfeZsabHLHRX/K+offtNGGmHWctcVcG44MdSqsOLY9VzX+Zxfxn2HPdWTpzWvkrtJ8M5zorrKcquRytJ5N5DZmcaW02l76nWO+BqPXm1A2Ry/0q71dH/mqrqeFjkYxjEXtsX8qubTk67rGycyqsdm4tZx5D6D5hhi0waaWmiaMP81Yjii5qxPlPuU/GfTL1Y5E6Jyfiq63qTa39A4J0sOGDgO9WF9bOXl0XfPRbsY2bPNKPy1YrFYrFYmRhhlTIyMjJWJYZHXuCXI8OoXsvfljGLFicNifpp2XunoPiG1wtx3p1Tah+/DD66OnVtVXP9rKbVxOnL0tR/rHtqB5UDErUVcl11D4qqvjpOcxX7armUNJB3LpW6bxVvD08e8h3odKKvyCFZBdSh2FVcST9xV3n3T8t1j7Kr9qgrqXg+13Pt5U7JCvFXVIV1YG5lRhkVYZJYYDDD4KOIMoHCp26WS8GB7uBh2zIdgq/PKyInjV2STShuoapUdCpX1yTwqq/z1VvET7Kh5nVPkO8YyxjLt2MaaMmWTLQvx3qnzltnXW0p2jxgbEtSny/Osv8Y9pLMXYoHVPAhkVdWVeODhR6q9/Sxe2liwwZWMVvFXfRkeIDxAePUPIrdJ4ey6yquzH+PD/bUOWAu05qVHtFd8rrKHSoeNIOUqrYr3FXyToqfYJgwmJdKpXXOwYYegNNGMzfZPp/t3t/DVs4zjNTN61rRqaWaa4NYbRjTa0tWwy2Y2tGN8ZO8ofNKq4j9SL7I+cSm4/6ovLV5HNXLI0jJidwrtk6ynCaP6Z++GjRlWS3tLeW129Mi9evxU9mtz6s5J3Z7M2ngTgnKvmpomxpaLCzPfmx0JWE+m3NLDDGOX47RctdYYNK5jakdqLkRlI39n590T5zctGSwwZZDJj6kW8XSi6ot2MmWWJ0DUT3nuvebBudScjZ79g8cWJ8av0k+/bE5WKd5MdbFpbDVMxu1DVMmtNZGJvq1mtRbn6M+g/kP0FwDwr7quZs7xosNGpbscyxhhd9TyJyFwbLcxlTasg75vW7TsV5K7ji44XPMMrdoj+Y3rT0Hie62nlYV/pwczzOmdLqLhYkzGMzCZWGMQzGMSsZYY6Di1t4nlJ+Em63mJxrVLxPbYxNEdgc1dU2iOKyoYYWjNrEeHTYybVk0atSa7ehuwsWMWTqn1TrnS6hYsi71d1+s+k+ic70e20fzE/VaTdxT9ZtU4GIXdeNx3X77guYYfpHeTQjaMX6brOu4OY4K7Y2d9mbHarI5ox3p4GpJ2Vd/Tst60f7j999pppjR+Q/Qf8J/VaORs3cji7FfFuN61+ui9s8hix1OCh5KGVV23BPXvZfz3CLyH
pix+exi8z/KnCnosY2eunor+cxyPO/xJ0vKey9OvE9VjqaYu0x3Z3jd6o2b1T12D+F8l232lwaaacD5LE8LBxu7WTlbWraWpew8Xexjel3E+wWD4APITdNqR8F3R3T0lunCQ4GaE9R37DxeCYfcHi4xci5ovKfxVs55y2hf+65E/Xdp6jR5nrebTmi5incpkyOjs50JvrZwstbbW6kfuuQw+2mykf/EXNFzxfKTrxew929TR6bWnGL//F3JFOFCQT3K4lQ"
26
 
27
  kernels = Kernel(
@@ -120,12 +122,11 @@ def extract_weight_to_half(weight: torch.Tensor, scale_list: torch.Tensor, sourc
120
 
121
 
122
  class QuantizedLinear(torch.nn.Module):
123
- def __init__(self, weight_bit_width: int, weight, bias=None, device="cuda", dtype=None, empty_init=False):
 
124
  super().__init__()
125
- weight = weight.to(device) # ensure the weight is on the cuda device
126
- assert str(weight.device).startswith(
127
- 'cuda'), 'The weights that need to be quantified should be on the CUDA device'
128
  self.weight_bit_width = weight_bit_width
 
129
  shape = weight.shape
130
 
131
  if weight is None or empty_init:
@@ -153,7 +154,7 @@ def quantize(model, weight_bit_width, empty_init=False, device=None):
153
  for layer in model.layers:
154
  layer.self_attention.query_key_value = QuantizedLinear(
155
  weight_bit_width=weight_bit_width,
156
- weight=layer.self_attention.query_key_value.weight,
157
  bias=layer.self_attention.query_key_value.bias,
158
  dtype=layer.self_attention.query_key_value.weight.dtype,
159
  device=layer.self_attention.query_key_value.weight.device if device is None else device,
@@ -161,7 +162,7 @@ def quantize(model, weight_bit_width, empty_init=False, device=None):
161
  )
162
  layer.self_attention.dense = QuantizedLinear(
163
  weight_bit_width=weight_bit_width,
164
- weight=layer.self_attention.dense.weight,
165
  bias=layer.self_attention.dense.bias,
166
  dtype=layer.self_attention.dense.weight.dtype,
167
  device=layer.self_attention.dense.weight.device if device is None else device,
@@ -169,7 +170,7 @@ def quantize(model, weight_bit_width, empty_init=False, device=None):
169
  )
170
  layer.mlp.dense_h_to_4h = QuantizedLinear(
171
  weight_bit_width=weight_bit_width,
172
- weight=layer.mlp.dense_h_to_4h.weight,
173
  bias=layer.mlp.dense_h_to_4h.bias,
174
  dtype=layer.mlp.dense_h_to_4h.weight.dtype,
175
  device=layer.mlp.dense_h_to_4h.weight.device if device is None else device,
@@ -177,7 +178,7 @@ def quantize(model, weight_bit_width, empty_init=False, device=None):
177
  )
178
  layer.mlp.dense_4h_to_h = QuantizedLinear(
179
  weight_bit_width=weight_bit_width,
180
- weight=layer.mlp.dense_4h_to_h.weight,
181
  bias=layer.mlp.dense_4h_to_h.bias,
182
  dtype=layer.mlp.dense_4h_to_h.weight.dtype,
183
  device=layer.mlp.dense_4h_to_h.weight.device if device is None else device,
 
1
+ from torch.nn import Linear
2
  from torch.nn.parameter import Parameter
3
+
4
  import bz2
5
  import torch
6
  import base64
7
  import ctypes
8
  from transformers.utils import logging
9
+
10
  from typing import List
11
+ from functools import partial
12
 
13
  logger = logging.get_logger(__name__)
14
 
15
  try:
16
  from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction, round_up
17
 
 
18
  class Kernel:
19
  def __init__(self, code: bytes, function_names: List[str]):
20
  self.code = code
 
24
  for name in self._function_names:
25
  setattr(self, name, KernelFunction(self._cmodule, name))
26
 
 
27
  quantization_code = "$QlpoOTFBWSZTWU9yuJUAQHN//////////f/n/8/n///n//bt4dTidcVx8X3V9FV/92/v4B7/AD5FBQFAAAChSgKpFCFAFVSigUAAAEKhSgUUqgFBKigqVREQAABQBQIANDTTIGI00BkZBkNGE0A0BkBkGQGRkaNAaAGQNBoGgDIAAYIGTI0DQAQAaGmmQMRpoDIyDIaMJoBoDIDIMgMjI0aA0AMgaDQNAGQAAwQMmRoGgAgA0NNMgYjTQGRkGQ0YTQDQGQGQZAZGRo0BoAZA0GgaAMgABggZMjQNABABoaaZAxGmgMjIMhowmgGgMgMgyAyMjRoDQAyBoNA0AZAADBAyZGgaAAmqU1NEgJqnptU/Sn4jRR6J6epk2pqb1Q/SgAPUGgyNNGjQ2SBpoAZAAGg0NB6mgDIAAAAA2oaApSREBNAARhGiYEaEwU8pvImlP0k2aam1GaGqbFNM1MHpTwmkepmyU9R6nqPKekHqNNPUxNGhp6n6p6QaZ6o9TG1GMqcoV9ly6nRanHlq6zPNbnGZNi6HSug+2nPiZ13XcnFYZW+45W11CumhzYhchOJ2GLLV1OBjBjGf4TptOddTSOcVxhqYZMYwZXZZY00zI1paX5X9J+b+f4e+x43RXSxXPOdquiGpduatGyXneN696M9t4HU2eR5XX/kPhP261NTx3JO1Ow7LyuDmeo9a7d351T1ZxnvnrvYnrXv/hXxPCeuYx2XsNmO003eg9J3Z6U7b23meJ4ri01OdzTk9BNO96brz+qT5nuvvH3ds/G+m/JcG/F2XYuhXlvO+jP7U3XgrzPN/lr8Sf1n6j4j7jZs+s/T0tNaNNYzTs12rxjwztHlnire3Nzc3N1wuBwOBwXBvZfoHpD7rFmR99V5vj3aXza3xdBbXMalubTg/jIv5dfAi54Pdc75j4z412n3Npj3Ld/ENm7a3b/Cod6h/ret1/5vn/C+l+gdslMvgPSLJ8d8q+U66fevYn/tW1chleEtNTGlcHCbLRlq0tHzF5tsbbZZfHjjLgZu42XCuC3NrdjTasZGNzgxPIrGqp7r3p7L2p5XjnpPSmTd5XtzqnB6U87zzg1Ol0zd0zsLszxR6lkxp35u6/teL0L0W922cR7Lu1lpL9CsHirzuM2T+BgsyViT6LHcm0/Vr6U/7LGGyJeqTEjt0PHWhF5mCT7R9mtlDwriYv0Tyr/OxYt6qp5r0mPVT0608TqnqMZaarU2nFwrTzzlrs1ed7z1ux60wyr4ydCaTi3enW8x68x0zU7tXSlcmPSW1mGpWJMg4zmPC2lK96tp0OE80y4MfEvnZj8zGluR6b22ki1Ou9V2nCd9xovcPvcYMZYy0lvN60ScZ45vN6yeCeeXFb1lVjnnCar5fwXwE2bzJ4HI1XVPXfXZMm44GUsMpYsmLB65TuVdm0cl0b+i/wGNN66XjeV7zuPpHcnK/juhhjdfId5jMdE5nN0dGmmm2zZs2cexD5n9p/dY352XsvXHaZNWWsmmS1atjR452nYudzvqv2HMRyvNNnlMcDl3R2+yx2uVrBubTW9icHDVtbNXlZm7jma1rM4VurZZd2y6nUau7ZXZ7bVU+mnoOVxZGMrVmvX60605JwmzGZhhhjTWtaaaMaaGTGmNMZasY0iX8VMUl8eepaIrzGSpemWOQyZORk2bNpjUybMmxqYmknCGCFynutfksaZpjTNMaaatM0xsxcGR0sociNqxNSmhhR1ZJPbsn8qyF0t2qH6iYBclclalbtTTcHTDsPaX6rlnElph2Jyumumtynv2Kk8GI7rsvXbIcJgHJOSaSXnnGaI3m87RtVXJOZ/YtgdTE6Wpha6ZlE8ayXkef1fh602r2WwvfMXtMdLlkfnLFdYYwYso+bWqm7yJqHXZGw2nrS5ZanSYnWlxBxMF1V940K2wdrI7R6OYf7DGGamMmTSbRhlS45xmVOumF1EyPCmHrrN8wwZOOrdNtLeMtzFzDlWnfTBxMk2NaXIZHBYxYLD4w8yju0ao65Vz1OIXoS9dLanwCe1PWrYuWMqf1if1z2k2yYfKJ741PDgno1ZQ8DRqvUny3mNoWTzGO6m1DkrJI8JiR5cSd+vZdGOO8nrMoc5+NDUFsMSXaZJeNlMmGLtJsovOsUp7I9S5VojKxF6bTVEelXqlfJobQr3LozSh2Jk7VcrVMfhXqszGWMzNqGhqZY0OadxkyyMssKugZR0KNFXBHlqwmJgTE/BNVMk6ItJXZMR0H47GpXv/DMOvNkmVuaV1PRfEdxuqc7Hcd+ZV/zTLaRxWk0nl9CdCeM6mn5rstHIBcpiuwmUZXeq81DacHI2rmrZ5SuE5mOZd6LQrZg9mx32TprA8BMo5jKN6yLTCi3WzQaZSuhzTtM1fUTGVpG8Tw+KXI0tjEpiWxtLYynOlktSbVlaI5kxP8TDH8kx50xoxi5KcA4pcja8KWLRlO/Ks6q06ergnvm1ca3Tq8Uw7LTUsmWyctXPWmpitl/uvGcWTGXGuAXDfhqazGmjkxcJW5hMMMMpYsXl2TZYtVOddG3XCarUt6Ptq9CZXSNzyuRzqRZOjsxdBbFVz6OA5HI43r1jityVlVpVkxmOsyaYWE1NTGq1sOVh36mHMcxtSvcy70edG0ZGR3I1Go1GRlV7mWWo1G0ZGRqlvH40l7o4m5xMWLLLYyNjnqc8556mdPqLJ31n/1nWOncxzG1tizrHs/Z+d2vP/B/l8wdJ6rHUn2nbbDq4p6htFtYzMMMTaZis1K5GKzGNmxhmUx2DDlZ/qNnIx41xnaMfCZWYaZWtNLTNW8ND4Fw1MyZOCdM428suKG1ehW8TesOydg7J+YYcD4cYR+8dFK6M4E3HM9ZfRNNL+Sn6rsl4DsrDl2HpPCnfxjGXtbZtYys1ttlyJ4T+BvexjGWRjMszK4Jpc77D3GyuVD7q0+G8m9G+2+rGm7cOR2y7FdtY2XUYx/oNlfRYxhMYyYZkyyg55enna9Kt/FFi6GMMwYwdwxWgxGMLKYmUyGExTKMZkMFhkymKuh0NOBNnBu+23LdwDoZYYzGGMxtORaTU1pjTGWTTGGtMrNWUsyyTTLLG1qy2ZjbK2DBllWqxMtBMaYZQmcE7zvvRcTkclUwdkxTaSdyySt/7fpL+T1v516Ji97fwr5JbLu305zMn5+GMTTZ9F+y7ExwmGVfG44yxn3dLv6l5i+Wth1jCrDq21nW9LqvvDzz3Vf3LLH/O/32TJ/erx3bXftO4eF+G956D952K/An4NfvOpjFjExjevP/UmE0fIoZXx6/w6lX/no3D0bLt+ixjieBM6ksRd0yB4Lt2SwYNE+gd1detlZWUnpiZfGfFaK+4PyCa/v18V8X75pe9fLXzp7l3VjF76vWZmHwGz1IZNWT7b8yddJ4q5kyrVdfru6atWc7bVYztL9Jf4GXvT+Y8m9/YsXP6H018a8D4XVOqvfzqeR+6yZOD8dPv0+U7/q5Pl+2dNb0MjzGVH5p6MNQ7cOWvw62U9aHE8DprDek+McLyvDz+t
e+9Zhq5+YTruufMcWMabqysTmZVWjKPfnK0wyVcrsuhjZRdLkHNvD72b9abriOSGIxiLixMOoalNPXzy+wT/tf+U6HHONfsz+xe8ufHBdQWWGWLA9if0rsnmrxK5LvRZQeWsTCsrmOYy8VteVfuRfcVTtDLItLIsMYxZLdU/DbtSemxF6Z6Zo5WBXE4tFdCyVMMXMTEMZXVlS6Xec2T4e0tHsRcEuWshcJ2YsNF5rUx1E8ifCq6Z+ZP7qdCeu/aTwFd53l16/o0NOw6O3dLavP4Hbi4RdmuDk6DoYaninC0+o4uZjbJ7Rxeu0/FbuFg+q7DVS6fQe0rZ6NDGUNNU6DEqOaLTicKnYZMnBWruljQxoaS3dZhocDge0bSTyOvdAbG5hxe2xji7E/L55xX13wWNDi6HCekcFxfCPGxY0MXC+s7afWaMdDyjyr+o8Rudm/NabOZvdl274zH4f5XK9z6On1Pe/K5TdPAslg77BjuO6Y3eO7GqvOPG/stknp1leyvLL0Z7bl9I4noMvLkzytLhWYzrOZzLXCORe028rORzOg4N/L0HlMOQ3Pgmnbb6KczlabORpu980q37TBqRu0/p3PO6234Bl03Ynuz+9W7gnsEcmvYaYY3aMYY0wx3pYd+ujsXauWdaY5Xkbtl23fPzFHiDB/QMo0yFjBllYxTQYYyxkrwn7JufwJ/PfgJ+C83X69ni6zvXcnyXabv0ncbLwsceS+RNlyN2mnneJtX0ngYO0+e+0+UnA+Wch3ji8hj5an4h+i6XBySU4n+R0roVcbw5yvHrmr4Yw8Y7x6c+9POPYHI5HI5HI5HI5HGXGww4nE4nrVyOR8XeqPEO7PLOiukYa3Novk5hV4cdtYZLI93e+uxff2jRo0aNGjRo0aNG1bVtW1dy3m83m8+tQ5ZzHw3nObwOu8La9Rc1dtkdS8A3eTk823tnktXWlxN6Oixe06zrN70Isd9jiOgZFq9yfkPqP/SLhN2Myl8jDM43bl1nbcb4cO57jlh8Jow6pzXZdL4dyODTuuhu77FyO27DdwdRxmvO+O+3N2+BdqyTwLHVczDVY4UPE4O66/ZO2cx1LFzVdSXtF7G4HMbrauOHRw6c8FdZ5m9fHZHYZXfTlZquyynSyTTKke6vcffSD9pzPA/G7n7jxPmuhc1DHMynPMrGL6AdewYmwu5ko+UUyTwrMv27rPH1v1nGqd87+p6N6LU8k3NEng53xXyHS97+44OSg/sy/hn+Se6yfYNjW0/uTgP+PvWYzLMmjhcLB/gGpri6H83/84eUXWT6T9Hsv7785z/7z4icpW+zfXypuR7rx/gMdZb1/wC678pcs8/2a3mDitGHxl9mfPlll5MafWWqxk/eYuTDgcNMzDGWLWvsuglNxs53GtN6uWpktlW1tZZYcuinMMWmnNnJydze3b2Y1McBxrBkXw799izLMZZYyy0TkbsGM4p03S2uVu5s/XXUdSdec6smVxZYYGpVmT8A+8ajuEyV5FatkvVru2x6uxGXXbH4A+jvgP4GMYy3iPLXzq/6z65+E005ey+cwMZD3fZcqc6xpjTFjQ0P3U+e++cPYmTIwj0nrK5NPTfl3WvpfLtXDcb2HQMudYOxFXQBor4L4T6vrOauFctYXJQ++NUWmJe5bmx1jDiZS1dTqWxo4GR8jm3fttpmPHppk9PEyv4/y8/sO07XacOmcqc0x2Vi9BvNJvN5oW8x4mOsydpidRxMYJPx06m1bqPzq9KtK8sxXNXFodD/+MYYaJTLwOhc9brCsV18oOR1i4tXChyTkq4lf4y1Ke+9axjDHqs1mfBbMXuP4Hzi+X7t8vzv7bHerrUPgPCxhjre4fXdfLNtNM+Jd+Zdh8xd8wP87uNPoPgv4W7/5P2BuxfsMabNnMnza+54Pdi5U671GPZY8CehX8Voeoo7FHpkeEc6715FwHZrIrUrHaviPUbPZHND+IhczrP6FcYvhOZ0Di/ETt0OI+YwNWR9r7tpf6WDeZKZDB1+z2IthOl1mPyb5FluvEx9h9d0NnM0Y1XPFkWIsk1WotJ0PBMmkvjvQTd0e71tfeV+8r8lQ/tpzpsmxJ+InrI/dj2UajUajVTUajatRqNRtGo1Go1Go4wjeMpZFMVV9CHbofPraLsJ3JpWV2XOoanCuFky4y3PPNxucK2uKC1Lbdb1eo+m5XomN6HfeZsabHLHRX/K+offtNGGmHWctcVcG44MdSqsOLY9VzX+Zxfxn2HPdWTpzWvkrtJ8M5zorrKcquRytJ5N5DZmcaW02l76nWO+BqPXm1A2Ry/0q71dH/mqrqeFjkYxjEXtsX8qubTk67rGycyqsdm4tZx5D6D5hhi0waaWmiaMP81Yjii5qxPlPuU/GfTL1Y5E6Jyfiq63qTa39A4J0sOGDgO9WF9bOXl0XfPRbsY2bPNKPy1YrFYrFYmRhhlTIyMjJWJYZHXuCXI8OoXsvfljGLFicNifpp2XunoPiG1wtx3p1Tah+/DD66OnVtVXP9rKbVxOnL0tR/rHtqB5UDErUVcl11D4qqvjpOcxX7armUNJB3LpW6bxVvD08e8h3odKKvyCFZBdSh2FVcST9xV3n3T8t1j7Kr9qgrqXg+13Pt5U7JCvFXVIV1YG5lRhkVYZJYYDDD4KOIMoHCp26WS8GB7uBh2zIdgq/PKyInjV2STShuoapUdCpX1yTwqq/z1VvET7Kh5nVPkO8YyxjLt2MaaMmWTLQvx3qnzltnXW0p2jxgbEtSny/Osv8Y9pLMXYoHVPAhkVdWVeODhR6q9/Sxe2liwwZWMVvFXfRkeIDxAePUPIrdJ4ey6yquzH+PD/bUOWAu05qVHtFd8rrKHSoeNIOUqrYr3FXyToqfYJgwmJdKpXXOwYYegNNGMzfZPp/t3t/DVs4zjNTN61rRqaWaa4NYbRjTa0tWwy2Y2tGN8ZO8ofNKq4j9SL7I+cSm4/6ovLV5HNXLI0jJidwrtk6ynCaP6Z++GjRlWS3tLeW129Mi9evxU9mtz6s5J3Z7M2ngTgnKvmpomxpaLCzPfmx0JWE+m3NLDDGOX47RctdYYNK5jakdqLkRlI39n590T5zctGSwwZZDJj6kW8XSi6ot2MmWWJ0DUT3nuvebBudScjZ79g8cWJ8av0k+/bE5WKd5MdbFpbDVMxu1DVMmtNZGJvq1mtRbn6M+g/kP0FwDwr7quZs7xosNGpbscyxhhd9TyJyFwbLcxlTasg75vW7TsV5K7ji44XPMMrdoj+Y3rT0Hie62nlYV/pwczzOmdLqLhYkzGMzCZWGMQzGMSsZYY6Di1t4nlJ+Em63mJxrVLxPbYxNEdgc1dU2iOKyoYYWjNrEeHTYybVk0atSa7ehuwsWMWTqn1TrnS6hYsi71d1+s+k+ic70e20fzE/VaTdxT9ZtU4GIXdeNx3X77guYYfpHeTQjaMX6brOu4OY4K7Y2d9mbHarI5ox3p4GpJ2Vd/Tst60f7j999pppjR+Q/Qf8J/VaORs3cji7FfFuN61+ui9s8hix1OCh5KGVV23BPXvZfz3CLyH
pix+exi8z/KnCnosY2eunor+cxyPO/xJ0vKey9OvE9VjqaYu0x3Z3jd6o2b1T12D+F8l232lwaaacD5LE8LBxu7WTlbWraWpew8Xexjel3E+wWD4APITdNqR8F3R3T0lunCQ4GaE9R37DxeCYfcHi4xci5ovKfxVs55y2hf+65E/Xdp6jR5nrebTmi5incpkyOjs50JvrZwstbbW6kfuuQw+2mykf/EXNFzxfKTrxew929TR6bWnGL//F3JFOFCQT3K4lQ"
28
 
29
  kernels = Kernel(
 
122
 
123
 
124
  class QuantizedLinear(torch.nn.Module):
125
+ def __init__(self, weight_bit_width: int, weight, bias=None, device="cpu", dtype=None, empty_init=False, *args,
126
+ **kwargs):
127
  super().__init__()
 
 
 
128
  self.weight_bit_width = weight_bit_width
129
+
130
  shape = weight.shape
131
 
132
  if weight is None or empty_init:
 
154
  for layer in model.layers:
155
  layer.self_attention.query_key_value = QuantizedLinear(
156
  weight_bit_width=weight_bit_width,
157
+ weight=layer.self_attention.query_key_value.weight.to(torch.cuda.current_device()),
158
  bias=layer.self_attention.query_key_value.bias,
159
  dtype=layer.self_attention.query_key_value.weight.dtype,
160
  device=layer.self_attention.query_key_value.weight.device if device is None else device,
 
162
  )
163
  layer.self_attention.dense = QuantizedLinear(
164
  weight_bit_width=weight_bit_width,
165
+ weight=layer.self_attention.dense.weight.to(torch.cuda.current_device()),
166
  bias=layer.self_attention.dense.bias,
167
  dtype=layer.self_attention.dense.weight.dtype,
168
  device=layer.self_attention.dense.weight.device if device is None else device,
 
170
  )
171
  layer.mlp.dense_h_to_4h = QuantizedLinear(
172
  weight_bit_width=weight_bit_width,
173
+ weight=layer.mlp.dense_h_to_4h.weight.to(torch.cuda.current_device()),
174
  bias=layer.mlp.dense_h_to_4h.bias,
175
  dtype=layer.mlp.dense_h_to_4h.weight.dtype,
176
  device=layer.mlp.dense_h_to_4h.weight.device if device is None else device,
 
178
  )
179
  layer.mlp.dense_4h_to_h = QuantizedLinear(
180
  weight_bit_width=weight_bit_width,
181
+ weight=layer.mlp.dense_4h_to_h.weight.to(torch.cuda.current_device()),
182
  bias=layer.mlp.dense_4h_to_h.bias,
183
  dtype=layer.mlp.dense_4h_to_h.weight.dtype,
184
  device=layer.mlp.dense_4h_to_h.weight.device if device is None else device,
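The reverted call sites move each weight onto the current CUDA device before constructing `QuantizedLinear`, since the cpm_kernels compression kernels run on the GPU; the dropped variant instead moved and asserted the device inside `__init__`. A hedged end-to-end sketch: the `.quantize()` helper on the ChatGLM model class follows the usual ChatGLM recipe, is defined outside this file, and should be verified before relying on it; a CUDA device is required when it runs.

```python
from transformers import AutoModel

# Quantize the linear layers to 4-bit (this routes through the quantize() function above),
# then move the model to the GPU for inference.
model = (
    AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
    .quantize(4)
    .cuda()
    .eval()
)
```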
special_tokens_map.json DELETED
@@ -1 +0,0 @@
1
- {}
 
 
tokenization_chatglm.py CHANGED
@@ -1,6 +1,6 @@
  import json
  import os
- import re
  from typing import List, Optional, Union, Dict
  from sentencepiece import SentencePieceProcessor
  from transformers import PreTrainedTokenizer
@@ -8,9 +8,6 @@ from transformers.utils import logging, PaddingStrategy
  from transformers.tokenization_utils_base import EncodedInput, BatchEncoding


- logger = logging.get_logger(__name__)
-
-
  class SPTokenizer:
      def __init__(self, model_path: str):
          # reload tokenizer
@@ -24,30 +21,17 @@ class SPTokenizer:
          self.pad_id: int = self.sp_model.unk_id()
          assert self.sp_model.vocab_size() == self.sp_model.get_piece_size()

-         role_special_tokens = ["<|system|>", "<|user|>", "<|assistant|>", "<|observation|>"]
-         special_tokens = ["[MASK]", "[gMASK]", "[sMASK]", "sop", "eop"] + role_special_tokens
          self.special_tokens = {}
          self.index_special_tokens = {}
          for token in special_tokens:
              self.special_tokens[token] = self.n_words
              self.index_special_tokens[self.n_words] = token
              self.n_words += 1
-         self.role_special_token_expression = "|".join([re.escape(token) for token in special_tokens])  # for apply_chat_template
-
-     def tokenize(self, s: str, encode_special_tokens=False):
-         if encode_special_tokens:
-             last_index = 0
-             t = []
-             for match in re.finditer(self.role_special_token_expression, s):
-                 if last_index < match.start():
-                     t.extend(self.sp_model.EncodeAsPieces(s[last_index:match.start()]))
-                 t.append(s[match.start():match.end()])
-                 last_index = match.end()
-             if last_index < len(s):
-                 t.extend(self.sp_model.EncodeAsPieces(s[last_index:]))
-             return t
-         else:
-             return self.sp_model.EncodeAsPieces(s)

      def encode(self, s: str, bos: bool = False, eos: bool = False) -> List[int]:
          assert type(s) is str
@@ -86,40 +70,27 @@ class SPTokenizer:
          """Converts an index (integer) in a token (str) using the vocab."""
          if index in self.index_special_tokens:
              return self.index_special_tokens[index]
-         if index in [self.eos_id, self.bos_id, self.pad_id] or index < 0 or index > self.sp_model.vocab_size():
              return ""
          return self.sp_model.IdToPiece(index)


  class ChatGLMTokenizer(PreTrainedTokenizer):
-
      vocab_files_names = {"vocab_file": "tokenizer.model"}

      model_input_names = ["input_ids", "attention_mask", "position_ids"]

-     def __init__(
-             self,
-             vocab_file,
-             padding_side="left",
-             clean_up_tokenization_spaces=False,
-             encode_special_tokens=False,
-             **kwargs
-     ):
          self.name = "GLMTokenizer"

          self.vocab_file = vocab_file
          self.tokenizer = SPTokenizer(vocab_file)
          self.special_tokens = {
              "<bos>": self.tokenizer.bos_id,
              "<eos>": self.tokenizer.eos_id,
-             "<unk>": self.tokenizer.pad_id,
              "<pad>": self.tokenizer.pad_id
          }
-         self.encode_special_tokens = encode_special_tokens
-
-         super().__init__(
-             padding_side=padding_side,
-             clean_up_tokenization_spaces=clean_up_tokenization_spaces,
-             **kwargs
-         )

      def get_command(self, token):
          if token in self.special_tokens:
@@ -129,40 +100,24 @@ class ChatGLMTokenizer(PreTrainedTokenizer):

      @property
      def unk_token(self) -> str:
-         return self.tokenizer.sp_model.IdToPiece(self.get_command("<unk>"))

      @property
      def pad_token(self) -> str:
-         return self.tokenizer.sp_model.IdToPiece(self.get_command("<pad>"))

      @property
-     def eos_token(self) -> str:
-         return self.tokenizer.sp_model.IdToPiece(self.get_command("<eos>"))
-
-     @property
-     def unk_token_id(self) -> int:
-         return self.get_command("<unk>")

      @property
-     def pad_token_id(self) -> int:
-         return self.get_command("<pad>")

      @property
      def eos_token_id(self):
          return self.get_command("<eos>")

-     @unk_token.setter
-     def unk_token(self, value):
-         logger.warning("Setting unk_token is not supported, use the default one.")
-
-     @pad_token.setter
-     def pad_token(self, value):
-         logger.warning("Setting pad_token is not supported, use the default one.")
-
-     @eos_token.setter
-     def eos_token(self, value):
-         logger.warning("Setting eos_token is not supported, use the default one.")
-
      @property
      def vocab_size(self):
          return self.tokenizer.n_words
@@ -174,7 +129,7 @@ class ChatGLMTokenizer(PreTrainedTokenizer):
          return vocab

      def _tokenize(self, text, **kwargs):
-         return self.tokenizer.tokenize(text, encode_special_tokens=self.encode_special_tokens)

      def _convert_token_to_id(self, token):
          """ Converts a token (str) in an id using the vocab. """
@@ -240,7 +195,7 @@ class ChatGLMTokenizer(PreTrainedTokenizer):
          return self.batch_encode_plus([input_ids], return_tensors="pt", is_split_into_words=True)

      def build_inputs_with_special_tokens(
-             self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
      ) -> List[int]:
          """
          Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
@@ -265,14 +220,12 @@ class ChatGLMTokenizer(PreTrainedTokenizer):
          return token_ids_0

      def _pad(
-             self,
-             encoded_inputs: Union[Dict[str, EncodedInput], BatchEncoding],
-             max_length: Optional[int] = None,
-             padding_strategy: PaddingStrategy = PaddingStrategy.DO_NOT_PAD,
-             pad_to_multiple_of: Optional[int] = None,
-             return_attention_mask: Optional[bool] = None,
-             padding_side: Optional[bool] = None,
-             **kwargs
      ) -> dict:
          """
          Pad encoded inputs (on left/right and up to predefined length or max length in the batch)
@@ -299,7 +252,6 @@ class ChatGLMTokenizer(PreTrainedTokenizer):
          """
          # Load from model defaults
          assert self.padding_side == "left"
-         assert padding_side is None or padding_side == "left"

          required_input = encoded_inputs[self.model_input_names[0]]
          seq_length = len(required_input)

  import json
  import os
+ import torch
  from typing import List, Optional, Union, Dict
  from sentencepiece import SentencePieceProcessor
  from transformers import PreTrainedTokenizer

  from transformers.tokenization_utils_base import EncodedInput, BatchEncoding


  class SPTokenizer:
      def __init__(self, model_path: str):
          # reload tokenizer

          self.pad_id: int = self.sp_model.unk_id()
          assert self.sp_model.vocab_size() == self.sp_model.get_piece_size()

+         special_tokens = ["[MASK]", "[gMASK]", "[sMASK]", "sop", "eop", "<|system|>", "<|user|>", "<|assistant|>",
+                           "<|observation|>"]
          self.special_tokens = {}
          self.index_special_tokens = {}
          for token in special_tokens:
              self.special_tokens[token] = self.n_words
              self.index_special_tokens[self.n_words] = token
              self.n_words += 1
+
+     def tokenize(self, s: str):
+         return self.sp_model.EncodeAsPieces(s)

      def encode(self, s: str, bos: bool = False, eos: bool = False) -> List[int]:
          assert type(s) is str

          """Converts an index (integer) in a token (str) using the vocab."""
          if index in self.index_special_tokens:
              return self.index_special_tokens[index]
+         if index in [self.eos_id, self.bos_id, self.pad_id] or index < 0:
              return ""
          return self.sp_model.IdToPiece(index)


  class ChatGLMTokenizer(PreTrainedTokenizer):
      vocab_files_names = {"vocab_file": "tokenizer.model"}
+
      model_input_names = ["input_ids", "attention_mask", "position_ids"]

+     def __init__(self, vocab_file, padding_side="left", clean_up_tokenization_spaces=False, **kwargs):
          self.name = "GLMTokenizer"
+
          self.vocab_file = vocab_file
          self.tokenizer = SPTokenizer(vocab_file)
          self.special_tokens = {
              "<bos>": self.tokenizer.bos_id,
              "<eos>": self.tokenizer.eos_id,
              "<pad>": self.tokenizer.pad_id
          }
+         super().__init__(padding_side=padding_side, clean_up_tokenization_spaces=clean_up_tokenization_spaces, **kwargs)

      def get_command(self, token):
          if token in self.special_tokens:

      @property
      def unk_token(self) -> str:
+         return "<unk>"

      @property
      def pad_token(self) -> str:
+         return "<unk>"

      @property
+     def pad_token_id(self):
+         return self.get_command("<pad>")

      @property
+     def eos_token(self) -> str:
+         return "</s>"

      @property
      def eos_token_id(self):
          return self.get_command("<eos>")

      @property
      def vocab_size(self):
          return self.tokenizer.n_words

          return vocab

      def _tokenize(self, text, **kwargs):
+         return self.tokenizer.tokenize(text)

      def _convert_token_to_id(self, token):
          """ Converts a token (str) in an id using the vocab. """

          return self.batch_encode_plus([input_ids], return_tensors="pt", is_split_into_words=True)

      def build_inputs_with_special_tokens(
+             self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
      ) -> List[int]:
          """
          Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and

          return token_ids_0

      def _pad(
+             self,
+             encoded_inputs: Union[Dict[str, EncodedInput], BatchEncoding],
+             max_length: Optional[int] = None,
+             padding_strategy: PaddingStrategy = PaddingStrategy.DO_NOT_PAD,
+             pad_to_multiple_of: Optional[int] = None,
+             return_attention_mask: Optional[bool] = None,
      ) -> dict:
          """
          Pad encoded inputs (on left/right and up to predefined length or max length in the batch)

          """
          # Load from model defaults
          assert self.padding_side == "left"

          required_input = encoded_inputs[self.model_input_names[0]]
          seq_length = len(required_input)
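
Read side by side, the net effect of this file's diff is that `SPTokenizer.tokenize` on the right-hand side always goes straight through `sp_model.EncodeAsPieces`, while role markers such as `<|user|>` keep their reserved ids and are looked up explicitly through `get_command` rather than being matched during tokenization. A small sketch of how that looks from the `transformers` side (illustrative only; assumes the files in this repository with `trust_remote_code=True`, and the exact pieces printed depend on the shipped SentencePiece model):

```python
# Sketch only: loads the custom ChatGLMTokenizer defined in tokenization_chatglm.py.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# Plain text is split into SentencePiece pieces by SPTokenizer.tokenize().
print(tokenizer.tokenize("Hello, ChatGLM3-6B"))

# Role markers map to dedicated ids; they are resolved via get_command, not inside tokenize().
print(tokenizer.get_command("<|user|>"))
```
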
tokenizer_config.json CHANGED
@@ -1,52 +1,12 @@
  {
-   "added_tokens_decoder": {
-     "64790": {
-       "content": "[gMASK]",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "64792": {
-       "content": "sop",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "64795": {
-       "content": "<|user|>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "64796": {
-       "content": "<|assistant|>",
-       "lstrip": false,
-       "normalized": true,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     }
-   },
    "auto_map": {
      "AutoTokenizer": [
        "tokenization_chatglm.ChatGLMTokenizer",
        null
-     ]
-   },
-   "chat_template": "{% for message in messages %}{% if loop.first %}[gMASK]sop<|{{ message['role'] }}|>\n {{ message['content'] }}{% else %}<|{{ message['role'] }}|>\n {{ message['content'] }}{% endif %}{% endfor %}{% if add_generation_prompt %}<|assistant|>{% endif %}",
-   "clean_up_tokenization_spaces": false,
-   "do_lower_case": false,
-   "eos_token": "</s>",
-   "model_max_length": 1000000000000000019884624838656,
-   "pad_token": "<unk>",
-   "padding_side": "left",
-   "remove_space": false,
-   "tokenizer_class": "ChatGLMTokenizer",
-   "unk_token": "<unk>"
  }

  {
+   "name_or_path": "THUDM/chatglm3-6b",
+   "remove_space": false,
+   "do_lower_case": false,
+   "tokenizer_class": "ChatGLMTokenizer",
    "auto_map": {
      "AutoTokenizer": [
        "tokenization_chatglm.ChatGLMTokenizer",
        null
+     ]
+   }
  }
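
On the right-hand side, tokenizer_config.json keeps only the routing information (`auto_map`, `tokenizer_class`) plus a couple of flags, so values such as `eos_token` and `pad_token` come from the `ChatGLMTokenizer` properties in tokenization_chatglm.py rather than from the config file. A small loading sketch (illustrative only; `trust_remote_code=True` is required so `auto_map` can resolve the custom class):

```python
# Sketch only: eos/pad tokens are served by ChatGLMTokenizer's own properties,
# not by entries in tokenizer_config.json.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
print(tokenizer.eos_token, tokenizer.eos_token_id)  # "</s>" and the id from get_command("<eos>")
print(tokenizer.pad_token, tokenizer.pad_token_id)  # "<unk>" and the id from get_command("<pad>")
```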