sys-lpot-val committed
Commit 3a8f745
1 Parent(s): 98b3137

upload auto_round format

Signed-off-by: sys-lpot-val <sys_lpot_val@intel.com>

.gitattributes CHANGED
@@ -41,3 +41,5 @@ special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
  tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
  vocab.json filter=lfs diff=lfs merge=lfs -text
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+ quantization_config.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -3,22 +3,18 @@ license: apache-2.0
  datasets:
  - NeelNanda/pile-10k
  ---
-
  ## Model Details

- This model is an int4 model with group_size 128 and a quantized lm-head of [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct), generated by [intel/auto-round](https://github.com/intel/auto-round); auto-round is needed to run this model.

  ## How To Use

- ### INT4 Inference
-

  ```python
- ##git clone https://github.com/intel/auto-round.git
- ##cd auto-round && pip install -vvv --no-build-isolation -e .
- from auto_round import AutoHfQuantizer ##must import
- import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  quantized_model_dir = "OPEA/Qwen2.5-14B-Instruct-int4-inc"
  tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
@@ -27,6 +23,7 @@ model = AutoModelForCausalLM.from_pretrained(
      quantized_model_dir,
      torch_dtype='auto',
      device_map="auto",
  )

  ##import habana_frameworks.torch.core as htcore ## uncomment it for HPU
@@ -48,7 +45,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

  generated_ids = model.generate(
      model_inputs.input_ids,
-     max_new_tokens=50, ##change this to align with the official usage
      do_sample=False ##change this to align with the official usage
  )
  generated_ids = [
@@ -58,76 +55,140 @@ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(response)

- ##prompt = "There is a girl who likes adventure,"
- ##That's great! Adventure can be a wonderful way to explore the world, challenge oneself, and discover new things. What kind of adventures does she enjoy? Perhaps she likes hiking, traveling to new places, trying new activities, or maybe something else entirely

- ##prompt = "Which one is bigger, 9.11 or 9.8"
- ##To determine which number is larger between 9.11 and 9.8, you can compare the digits in each place value:
- ##- The whole number part of both numbers is 9.
- ##- For the decimal part:
- ##  - 9

- ##prompt = "Once upon a time,"
- ##Once upon a time, in a far-off land, there was a kingdom surrounded by lush green forests, sparkling rivers, and rolling hills. The people of this kingdom lived in harmony with nature and each other, under the wise rule of their king.

- ##prompt = "请介绍一下阿里巴巴公司"
- ##阿里巴巴集团创立于1999年,是以贸易作为发展起点,以数据作为核心驱动,并以技术作为基础支撑的公司。阿里巴巴集团业务包括核心电商、云计算、数字媒体及娱乐、创新项目四大板块。阿里巴巴
- ```

- ### Evaluate the model

- pip3 install lm-eval==0.4.2

- ```bash
- git clone https://github.com/intel/auto-round
- cd auto-round
- python -m auto_round --model "OPEA/Qwen2.5-7B-Instruct-int4-inc" --eval --eval_bs 16 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k,cmmlu,ceval-valid
  ```

- | Metric         |  BF16  |  INT4  |
- |:-------------- | :----: | :----: |
- | Avg            | 0.7271 | 0.7221 |
- | mmlu           | 0.7891 | 0.7812 |
- | cmmlu          | 0.8378 | 0.8257 |
- | ceval-valid    | 0.8351 | 0.8276 |
- | lambada_openai | 0.7343 | 0.7227 |
- | hellaswag      | 0.6562 | 0.6509 |
- | winogrande     | 0.7616 | 0.7585 |
- | piqa           | 0.8139 | 0.8128 |
- | truthfulqa_mc1 | 0.5153 | 0.5116 |
- | openbookqa     | 0.3700 | 0.3620 |
- | boolq          | 0.8801 | 0.8801 |
- | arc_easy       | 0.8573 | 0.8548 |
- | arc_challenge  | 0.6067 | 0.6084 |
- | gsm8k 5 shots  | 0.7953 | 0.7908 |

- ### Reproduce the model

- Here is the sample command to reproduce the model. We observed a larger accuracy drop on Chinese tasks and recommend using a high-quality Chinese dataset for calibration; however, we did not achieve better accuracy with some public datasets.
  ```bash
- git clone https://github.com/intel/auto-round
- cd auto-round
- python -m auto_round \
-     --model_name Qwen/Qwen2.5-14B-Instruct \
      --device 0 \
      --group_size 128 \
      --nsamples 512 \
      --bits 4 \
      --iter 1000 \
      --disable_eval \
-     --model_dtype "float16" \
-     --format 'auto_round' \
      --output_dir "./tmp_autoround"
  ```

-
-
  ## Ethical Considerations and Limitations

  The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
@@ -140,15 +201,12 @@ Users (both direct and downstream) should be made aware of the risks, biases and

  Here are a couple of useful links to learn more about Intel's AI software:

- * Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
- * Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)

  ## Disclaimer

  The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

-
-

  ## Cite

  @article{cheng2023optimize,
    title={Optimize weight rounding via signed gradient descent for the quantization of llms},
    author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
    journal={arXiv preprint arXiv:2309.05516},
    year={2023}
  }
 
  datasets:
  - NeelNanda/pile-10k
  ---

  ## Model Details

+ This model is an int4 model with group_size 128 and symmetric quantization of [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct), generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with `revision="98b3137"` to use the AutoGPTQ format.
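
For quick reference, this is a minimal sketch of pinning that revision at load time (the hash is the one quoted above; omitting `revision` loads the default auto-round format):

```python
from transformers import AutoModelForCausalLM

# Pin the AutoGPTQ-format revision mentioned in the note above;
# drop the `revision` argument to load the default auto-round format.
model = AutoModelForCausalLM.from_pretrained(
    "OPEA/Qwen2.5-14B-Instruct-int4-inc",
    revision="98b3137",
    torch_dtype="auto",
    device_map="auto",
)
```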
 
  ## How To Use

+ ### INT4 Inference (CPU/HPU/CUDA)

+ CPU inference requires auto-round version > 0.3.1.
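
On CPU this amounts to a pip upgrade along these lines (a sketch; the pin simply mirrors the requirement above):

```bash
pip install "auto-round>0.3.1"
```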

  ```python
+ from auto_round import AutoRoundConfig ##must import for auto-round format
  from transformers import AutoModelForCausalLM, AutoTokenizer
  quantized_model_dir = "OPEA/Qwen2.5-14B-Instruct-int4-inc"
  tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
  model = AutoModelForCausalLM.from_pretrained(
      quantized_model_dir,
      torch_dtype='auto',
      device_map="auto",
+     ##revision="f86a564" ##AutoGPTQ format
  )

  ##import habana_frameworks.torch.core as htcore ## uncomment it for HPU

  prompt = "There is a girl who likes adventure,"
  messages = [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": prompt}
  ]
  text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

  generated_ids = model.generate(
      model_inputs.input_ids,
+     max_new_tokens=200, ##change this to align with the official usage
      do_sample=False ##change this to align with the official usage
  )
  generated_ids = [
      output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
  ]
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(response)

+ prompt = "There is a girl who likes adventure,"
+ ##INT4:
+ """ and she wants to go on a trip. She has 10 different types of snacks, and she can only carry 4 of them in her bag. How many different combinations of snacks can she choose from? To determine the number of different combinations of snacks the girl can choose from, we need to calculate the number of ways to choose 4 snacks out of 10. This is a classic combination problem where the order of selection does not matter.
+
+ The formula for combinations is given by:
+ \[
+ \binom{n}{r} = \frac{n!}{r!(n-r)!}
+ \]
+ where \( n \) is the total number of items to choose from, \( r \) is the number of items to choose, and \( ! \) denotes factorial.
+
+ In this problem, \( n = 10 \) and \( r = 4 \). Plugging these values into the formula, we get:
+ \[
+ \binom{10}{4}"""
+
+ ##BF16:
+ """ and she has a hobby of collecting rocks. She wants to go on a trip to collect some unique rocks. She plans to visit three different locations: a mountain, a beach, and a desert. Each location has its own set of challenges and opportunities for rock collecting.
+
+ 1. The mountain is known for its rare mineral deposits, but the terrain is steep and rocky, making it difficult to navigate.
+ 2. The beach offers a variety of sedimentary rocks and fossils, but the tides can be unpredictable and dangerous.
+ 3. The desert provides an opportunity to find petrified wood and volcanic rocks, but the heat and lack of water can be challenging.
+
+ The girl has a backpack that can carry up to 10 kilograms of rocks. She also has a map that shows the locations of specific types of rocks at each site. Her goal is to maximize the number of unique rock types she collects while staying within her weight limit.
+
+ Given the following information:
+ - Mountain: 5 unique rock types"""
+
+ prompt = "9.11和9.8哪个数字大"
+ ##INT4:
+ """? 9.11 比 9.8 大。
+
+ 为了比较这两个数,我们可以从它们的小数部分开始:
+
+ - 9.11 可以看作是 9 + 0.11
+ - 9.8 可以看作是 9 + 0.8
+
+ 由于 0.11 小于 0.8,所以 9.11 小于 9.8。因此,9.8 比 9.11 大。
+
+ 总结:9.8 > 9.11。所以,9.8 是较大的数字。如果你的问题是问哪个数字较大,则答案是 9.8。如果问题是问哪个数字较小,则答案是 9.11。请确认你的问题需求。根据你的描述,9.8 是较大的数字。
+
+ 希望这能帮助你!如有其他问题,请随时提问。
+
+ (注意:在"""
+
+ ##BF16:
+ """? 9.11 比 9.8 大。
+
+ 在比较两个小数时,我们从左到右逐位进行比较。首先比较整数部分,如果相同,则比较小数部分。对于9.11 和 9.8:
+
+ - 整数部分都是9,相等。
+ - 比较小数部分:0.11 和 0.8。
+
+ 由于0.11 < 0.8,所以9.11 < 9.8。
+
+ 因此,9.8 比 9.11 大。
+
+ 所以,正确的答案是:9.8 比 9.11 大。
+
+ 希望这能帮助你理解!如果你有其他问题,请随时告诉我。
+
+ 总结:9.8 > 9.11。
+
+ 希望这个解释清楚了你的疑问。如果有任何进一步的问题或需要更多帮助,请告诉我!
+
+ 再次确认:9"""
+
+ prompt = "Once upon a time,"
+ ##INT4:
+ """ there was a young man named John who had a passion for music. He loved playing the guitar and would spend hours every day practicing and perfecting his skills. However, he struggled to find an audience for his music and felt discouraged.
+ """
+
+ ##BF16:
+ """ there was a young man named John who lived in a small village. He was an orphan and had to work hard to make ends meet. Despite his difficult circumstances, he was kind-hearted and always willing to help others. One day, a wise old man came to the village and saw John's kindness. The old man decided to test John's character by giving him a bag of gold coins and telling him to distribute it among the villagers. John was overjoyed at first but then realized that he could use the money for himself. However, he remembered the wise man's words and distributed the coins equally among the villagers. The wise man was pleased with John's actions and revealed himself as a fairy godfather. He granted John three wishes, but with a twist - each wish would come true only if John could prove that he deserved it. What are some possible wishes that John might make and how could he prove that he deserves them?
+ John, being a kind-hearted individual, might consider wishes that"""
+
+ prompt = "请简短介绍一下阿里巴巴公司"
+ ##INT4:
+ """阿里巴巴集团创立于1999年,是全球领先的电子商务及零售贸易平台。阿里巴巴集团的使命是让世界各地的企业都能平等地进行贸易。阿里巴巴集团旗下的业务包括淘宝、天猫、菜鸟网络、阿里云等。阿里巴巴集团致力于通过技术创新,为中小企业提供更便捷、高效的商业服务,推动数字经济的发展。阿里巴巴集团在全球范围内拥有数百万商家和消费者用户,已成为全球最大的零售贸易平台之一。阿里巴巴集团总部位于中国杭州,并在全球范围内设有多个办事处和研发中心。阿里巴巴集团的愿景是构建一个开放、协作、可持续发展的数字经济生态系统,为全球商业带来更多的机遇和价值。阿里巴巴集团在2014年上市,成为当时全球最大的IPO。阿里巴巴集团的创始人马云是中国著名的企业家和慈善家。阿里巴巴集团在社会责任方面也做出了积极贡献,包括支持教育、环保、扶贫等公益事业。阿里巴巴集团是一家具有高度社会责任感的企业。阿里巴巴集团的业务涵盖了电子商务、金融、物流
+ """
+
+ ##BF16:
+ """阿里巴巴集团创立于1999年,是全球领先的电子商务及零售平台,业务涵盖B2B、C2C、B2C等各个领域。阿里巴巴旗下拥有淘宝网、天猫、菜鸟网络、阿里云等知名子公司和品牌,致力于打造开放、协同、繁荣的商业生态系统,为全球中小企业提供一站式数字化转型服务。阿里巴巴在全球范围内拥有超过20万名员工,并在纽约证券交易所上市。阿里巴巴一直秉承“让天下没有难做的生意”的使命,不断创新和发展,成为全球领先的数字经济体之一。阿里巴巴还积极履行企业社会责任,关注环保、公益等领域,努力实现可持续发展。阿里巴巴已经成为中国互联网行业的领军企业之一,在全球范围内也具有广泛的影响力。阿里巴巴的发展历程充满了挑战与机遇,未来将继续引领数字经济的发展趋势,推动全球经济的繁荣与发展。阿里巴巴是一家总部位于中国杭州的跨国科技公司,主要业务包括电子商务、金融、物流、云计算等。阿里巴巴旗下的淘宝、天猫等电商平台已成为
+ """
  ```

+ ### Evaluate the model
+
+ pip3 install lm-eval==0.4.5

+ ```bash
+ auto-round --model "OPEA/Qwen2.5-14B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
+ ```

+ | Metric                                     |  BF16  |  INT4  |
+ | :----------------------------------------- | :----: | :----: |
+ | Avg                                        | 0.6947 | 0.6954 |
+ | leaderboard_mmlu_pro 5 shots               | 0.5375 | 0.5292 |
+ | leaderboard_ifeval inst_level_strict_acc   | 0.6331 | 0.6475 |
+ | leaderboard_ifeval prompt_level_strict_acc | 0.5102 | 0.5287 |
+ | mmlu                                       | 0.7882 | 0.7809 |
+ | cmmlu                                      | 0.8377 | 0.8240 |
+ | ceval-valid                                | 0.8351 | 0.8232 |
+ | gsm8k 5 shots                              | 0.7900 | 0.8120 |
+ | lambada_openai                             | 0.7283 | 0.7250 |
+ | hellaswag                                  | 0.6556 | 0.6508 |
+ | winogrande                                 | 0.7585 | 0.7672 |
+ | piqa                                       | 0.8166 | 0.8156 |
+ | truthfulqa_mc1                             | 0.5153 | 0.5202 |
+ | openbookqa                                 | 0.3640 | 0.3700 |
+ | boolq                                      | 0.8798 | 0.8810 |
+ | arc_easy                                   | 0.8582 | 0.8535 |
+ | arc_challenge                              | 0.6049 | 0.5981 |

+ ### Generate the model

+ Here is the sample command to generate the model.

  ```bash
+ auto-round \
+     --model Qwen/Qwen2.5-14B-Instruct \
      --device 0 \
      --group_size 128 \
      --nsamples 512 \
      --bits 4 \
      --iter 1000 \
      --disable_eval \
+     --model_dtype "fp16" \
+     --format 'auto_gptq,auto_round' \
      --output_dir "./tmp_autoround"
  ```
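
Once the command finishes, the exported checkpoint should load the same way as this repo; the following is a sketch using a placeholder path for whatever directory auto-round writes under `--output_dir` (with `--format 'auto_gptq,auto_round'`, each format is presumably exported separately):

```python
from auto_round import AutoRoundConfig  ## must import for the auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer

# "./tmp_autoround/<exported-folder>" is a placeholder, not a real path;
# point it at the directory auto-round actually created under --output_dir.
local_dir = "./tmp_autoround/<exported-folder>"
model = AutoModelForCausalLM.from_pretrained(local_dir, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(local_dir)
```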

  ## Ethical Considerations and Limitations

  The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
 

  Here are a couple of useful links to learn more about Intel's AI software:

+ - Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

  ## Disclaimer

  The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

  ## Cite

  @article{cheng2023optimize,
    title={Optimize weight rounding via signed gradient descent for the quantization of llms},
    author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
    journal={arXiv preprint arXiv:2309.05516},
    year={2023}
  }
config.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:89dacb213381240bde9b7a008be9115f745592becf66e0ba94fac63d9b68a245
- size 1369
+ oid sha256:e7e0a227e09ed0d35c07756beca5d58c4c98a5b20631dcea691bdfc3a75e5150
+ size 1383
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26dd631cc8d5e743bf3af07d9f7c9b04873a6322057b0b48e7d7fc25dc70069f
+ size 4994374488
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b77adb3a3b46488d3c117dfab884957c984212afe59b2112119c69ca30a795a
+ size 4994385136
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78b1e06ac5fbc1003903d8c34b017e73e8134cc6e2a7de32139094c6b75246a8
+ size 129341
quantization_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f92209e21368ef298866e57e5f3838e7590119ba042ef4c15bf642f7f60e4f40
+ size 575