sys-lpot-val committed
Commit 3a8f745
1 Parent(s): 98b3137

upload auto_round format

Signed-off-by: sys-lpot-val <sys_lpot_val@intel.com>

.gitattributes CHANGED
@@ -41,3 +41,5 @@ special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
  tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
  vocab.json filter=lfs diff=lfs merge=lfs -text
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+ quantization_config.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -3,22 +3,18 @@ license: apache-2.0
  datasets:
  - NeelNanda/pile-10k
  ---
-
  ## Model Details

- This model is an int4 model with group_size 128 and a quantized lm-head of [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct), generated by [intel/auto-round](https://github.com/intel/auto-round); auto-round is needed to run this model.

  ## How To Use

- ### INT4 Inference
-

  ```python
- ##git clone https://github.com/intel/auto-round.git
- ##cd auto-round && pip install -vvv --no-build-isolation -e .
- from auto_round import AutoHfQuantizer ##must import
- import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  quantized_model_dir = "OPEA/Qwen2.5-14B-Instruct-int4-inc"
  tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
@@ -27,6 +23,7 @@ model = AutoModelForCausalLM.from_pretrained(
      quantized_model_dir,
      torch_dtype='auto',
      device_map="auto",
  )

  ##import habana_frameworks.torch.core as htcore ## uncomment it for HPU
@@ -48,7 +45,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

  generated_ids = model.generate(
      model_inputs.input_ids,
-     max_new_tokens=50, ##change this to align with the official usage
      do_sample=False ##change this to align with the official usage
  )
  generated_ids = [
@@ -58,76 +55,140 @@ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(response)

- ##prompt = "There is a girl who likes adventure,"
- ##That's great! Adventure can be a wonderful way to explore the world, challenge oneself, and discover new things. What kind of adventures does she enjoy? Perhaps she likes hiking, traveling to new places, trying new activities, or maybe something else entirely

- ##prompt = "Which one is bigger, 9.11 or 9.8"
- ##To determine which number is larger between 9.11 and 9.8, you can compare the digits in each place value:
- ##- The whole number part of both numbers is 9.
- ##- For the decimal part:
- ##  - 9

- ##prompt = "Once upon a time,"
- ##Once upon a time, in a far-off land, there was a kingdom surrounded by lush green forests, sparkling rivers, and rolling hills. The people of this kingdom lived in harmony with nature and each other, under the wise rule of their king.

- ##prompt = "请介绍一下阿里巴巴公司"
- ##阿里巴巴集团创立于1999年,是以贸易作为发展起点,以数据作为核心驱动,并以技术作为基础支撑的公司。阿里巴巴集团业务包括核心电商、云计算、数字媒体及娱乐、创新项目四大板块。阿里巴巴
- ```

- ### Evaluate the model

- pip3 install lm-eval==0.4.2

- ```bash
- git clone https://github.com/intel/auto-round
- cd auto-round
- python -m auto_round --model "OPEA/Qwen2.5-7B-Instruct-int4-inc" --eval --eval_bs 16 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k,cmmlu,ceval-valid
  ```

- | Metric         |  BF16  |  INT4  |
- |:-------------- | :----: | :----: |
- | Avg            | 0.7271 | 0.7221 |
- | mmlu           | 0.7891 | 0.7812 |
- | cmmlu          | 0.8378 | 0.8257 |
- | ceval-valid    | 0.8351 | 0.8276 |
- | lambada_openai | 0.7343 | 0.7227 |
- | hellaswag      | 0.6562 | 0.6509 |
- | winogrande     | 0.7616 | 0.7585 |
- | piqa           | 0.8139 | 0.8128 |
- | truthfulqa_mc1 | 0.5153 | 0.5116 |
- | openbookqa     | 0.3700 | 0.3620 |
- | boolq          | 0.8801 | 0.8801 |
- | arc_easy       | 0.8573 | 0.8548 |
- | arc_challenge  | 0.6067 | 0.6084 |
- | gsm8k 5 shots  | 0.7953 | 0.7908 |

- ### Reproduce the model

- Here is the sample command to reproduce the model. We observed a larger accuracy drop on Chinese tasks and recommend using a high-quality Chinese dataset for calibration; however, we did not achieve better accuracy with some public datasets.
  ```bash
- git clone https://github.com/intel/auto-round
- cd auto-round
- python -m auto_round \
-     --model_name Qwen/Qwen2.5-14B-Instruct \
      --device 0 \
      --group_size 128 \
      --nsamples 512 \
      --bits 4 \
      --iter 1000 \
      --disable_eval \
-     --model_dtype "float16" \
-     --format 'auto_round' \
      --output_dir "./tmp_autoround"
  ```

-
-
  ## Ethical Considerations and Limitations

  The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
@@ -140,15 +201,12 @@ Users (both direct and downstream) should be made aware of the risks, biases and

  Here are a couple of useful links to learn more about Intel's AI software:

- * Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
- * Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)

  ## Disclaimer

  The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

-
-

  ## Cite

  @article{cheng2023optimize,
    title={Optimize weight rounding via signed gradient descent for the quantization of llms},
    author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
    journal={arXiv preprint arXiv:2309.05516},
    year={2023}
  }
 
  datasets:
  - NeelNanda/pile-10k
  ---

  ## Model Details

+ This model is an int4 model with group_size 128 and symmetric quantization of [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct), generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with `revision="98b3137"` to use the AutoGPTQ format.
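
For quick reference, this is a minimal sketch of pinning that revision at load time (the hash is the one quoted above; omitting `revision` loads the default auto-round format):

```python
from transformers import AutoModelForCausalLM

# Pin the AutoGPTQ-format revision mentioned in the note above;
# drop the `revision` argument to load the default auto-round format.
model = AutoModelForCausalLM.from_pretrained(
    "OPEA/Qwen2.5-14B-Instruct-int4-inc",
    revision="98b3137",
    torch_dtype="auto",
    device_map="auto",
)
```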
 
  ## How To Use

+ ### INT4 Inference (CPU/HPU/CUDA)

+ CPU inference requires auto-round version > 0.3.1.
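
On CPU this amounts to a pip upgrade along these lines (a sketch; the pin simply mirrors the requirement above):

```bash
pip install "auto-round>0.3.1"
```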

  ```python
+ from auto_round import AutoRoundConfig ##must import for auto-round format
  from transformers import AutoModelForCausalLM, AutoTokenizer
  quantized_model_dir = "OPEA/Qwen2.5-14B-Instruct-int4-inc"
  tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
  model = AutoModelForCausalLM.from_pretrained(
      quantized_model_dir,
      torch_dtype='auto',
      device_map="auto",
+     ##revision="f86a564" ##AutoGPTQ format
  )

  ##import habana_frameworks.torch.core as htcore ## uncomment it for HPU

  prompt = "There is a girl who likes adventure,"
  messages = [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": prompt}
  ]
  text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

  generated_ids = model.generate(
      model_inputs.input_ids,
+     max_new_tokens=200, ##change this to align with the official usage
      do_sample=False ##change this to align with the official usage
  )
  generated_ids = [
      output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
  ]
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(response)

+ prompt = "There is a girl who likes adventure,"
+ ##INT4:
+ """ and she wants to go on a trip. She has 10 different types of snacks, and she can only carry 4 of them in her bag. How many different combinations of snacks can she choose from? To determine the number of different combinations of snacks the girl can choose from, we need to calculate the number of ways to choose 4 snacks out of 10. This is a classic combination problem where the order of selection does not matter.
+
+ The formula for combinations is given by:
+ \[
+ \binom{n}{r} = \frac{n!}{r!(n-r)!}
+ \]
+ where \( n \) is the total number of items to choose from, \( r \) is the number of items to choose, and \( ! \) denotes factorial.
+
+ In this problem, \( n = 10 \) and \( r = 4 \). Plugging these values into the formula, we get:
+ \[
+ \binom{10}{4}"""
+
+ ##BF16:
+ """ and she has a hobby of collecting rocks. She wants to go on a trip to collect some unique rocks. She plans to visit three different locations: a mountain, a beach, and a desert. Each location has its own set of challenges and opportunities for rock collecting.
+
+ 1. The mountain is known for its rare mineral deposits, but the terrain is steep and rocky, making it difficult to navigate.
+ 2. The beach offers a variety of sedimentary rocks and fossils, but the tides can be unpredictable and dangerous.
+ 3. The desert provides an opportunity to find petrified wood and volcanic rocks, but the heat and lack of water can be challenging.
+
+ The girl has a backpack that can carry up to 10 kilograms of rocks. She also has a map that shows the locations of specific types of rocks at each site. Her goal is to maximize the number of unique rock types she collects while staying within her weight limit.
+
+ Given the following information:
+ - Mountain: 5 unique rock types"""
+
+ prompt = "9.11和9.8哪个数字大"
+ ##INT4:
+ """? 9.11 比 9.8 大。
+
+ 为了比较这两个数,我们可以从它们的小数部分开始:
+
+ - 9.11 可以看作是 9 + 0.11
+ - 9.8 可以看作是 9 + 0.8
+
+ 由于 0.11 小于 0.8,所以 9.11 小于 9.8。因此,9.8 比 9.11 大。
+
+ 总结:9.8 > 9.11。所以,9.8 是较大的数字。如果你的问题是问哪个数字较大,则答案是 9.8。如果问题是问哪个数字较小,则答案是 9.11。请确认你的问题需求。根据你的描述,9.8 是较大的数字。
+
+ 希望这能帮助你!如有其他问题,请随时提问。
+
+ (注意:在"""
+
+ ##BF16:
+ """? 9.11 比 9.8 大。
+
+ 在比较两个小数时,我们从左到右逐位进行比较。首先比较整数部分,如果相同,则比较小数部分。对于9.11 和 9.8:
+
+ - 整数部分都是9,相等。
+ - 比较小数部分:0.11 和 0.8。
+
+ 由于0.11 < 0.8,所以9.11 < 9.8。
+
+ 因此,9.8 比 9.11 大。
+
+ 所以,正确的答案是:9.8 比 9.11 大。
+
+ 希望这能帮助你理解!如果你有其他问题,请随时告诉我。
+
+ 总结:9.8 > 9.11。
+
+ 希望这个解释清楚了你的疑问。如果有任何进一步的问题或需要更多帮助,请告诉我!
+
+ 再次确认:9"""
+
+ prompt = "Once upon a time,"
+ ##INT4:
+ """ there was a young man named John who had a passion for music. He loved playing the guitar and would spend hours every day practicing and perfecting his skills. However, he struggled to find an audience for his music and felt discouraged.
+ """
+
+ ##BF16:
+ """ there was a young man named John who lived in a small village. He was an orphan and had to work hard to make ends meet. Despite his difficult circumstances, he was kind-hearted and always willing to help others. One day, a wise old man came to the village and saw John's kindness. The old man decided to test John's character by giving him a bag of gold coins and telling him to distribute it among the villagers. John was overjoyed at first but then realized that he could use the money for himself. However, he remembered the wise man's words and distributed the coins equally among the villagers. The wise man was pleased with John's actions and revealed himself as a fairy godfather. He granted John three wishes, but with a twist - each wish would come true only if John could prove that he deserved it. What are some possible wishes that John might make and how could he prove that he deserves them?
+ John, being a kind-hearted individual, might consider wishes that"""
+
+ prompt = "请简短介绍一下阿里巴巴公司"
+ ##INT4:
+ """阿里巴巴集团创立于1999年,是全球领先的电子商务及零售贸易平台。阿里巴巴集团的使命是让世界各地的企业都能平等地进行贸易。阿里巴巴集团旗下的业务包括淘宝、天猫、菜鸟网络、阿里云等。阿里巴巴集团致力于通过技术创新,为中小企业提供更便捷、高效的商业服务,推动数字经济的发展。阿里巴巴集团在全球范围内拥有数百万商家和消费者用户,已成为全球最大的零售贸易平台之一。阿里巴巴集团总部位于中国杭州,并在全球范围内设有多个办事处和研发中心。阿里巴巴集团的愿景是构建一个开放、协作、可持续发展的数字经济生态系统,为全球商业带来更多的机遇和价值。阿里巴巴集团在2014年上市,成为当时全球最大的IPO。阿里巴巴集团的创始人马云是中国著名的企业家和慈善家。阿里巴巴集团在社会责任方面也做出了积极贡献,包括支持教育、环保、扶贫等公益事业。阿里巴巴集团是一家具有高度社会责任感的企业。阿里巴巴集团的业务涵盖了电子商务、金融、物流
+ """
+
+ ##BF16:
+ """阿里巴巴集团创立于1999年,是全球领先的电子商务及零售平台,业务涵盖B2B、C2C、B2C等各个领域。阿里巴巴旗下拥有淘宝网、天猫、菜鸟网络、阿里云等知名子公司和品牌,致力于打造开放、协同、繁荣的商业生态系统,为全球中小企业提供一站式数字化转型服务。阿里巴巴在全球范围内拥有超过20万名员工,并在纽约证券交易所上市。阿里巴巴一直秉承“让天下没有难做的生意”的使命,不断创新和发展,成为全球领先的数字经济体之一。阿里巴巴还积极履行企业社会责任,关注环保、公益等领域,努力实现可持续发展。阿里巴巴已经成为中国互联网行业的领军企业之一,在全球范围内也具有广泛的影响力。阿里巴巴的发展历程充满了挑战与机遇,未来将继续引领数字经济的发展趋势,推动全球经济的繁荣与发展。阿里巴巴是一家总部位于中国杭州的跨国科技公司,主要业务包括电子商务、金融、物流、云计算等。阿里巴巴旗下的淘宝、天猫等电商平台已成为
+ """
  ```

+ ### Evaluate the model
+
+ pip3 install lm-eval==0.4.5

+ ```bash
+ auto-round --model "OPEA/Qwen2.5-14B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
+ ```

+ | Metric                                     |  BF16  |  INT4  |
+ | :----------------------------------------- | :----: | :----: |
+ | Avg                                        | 0.6947 | 0.6954 |
+ | leaderboard_mmlu_pro 5 shots               | 0.5375 | 0.5292 |
+ | leaderboard_ifeval inst_level_strict_acc   | 0.6331 | 0.6475 |
+ | leaderboard_ifeval prompt_level_strict_acc | 0.5102 | 0.5287 |
+ | mmlu                                       | 0.7882 | 0.7809 |
+ | cmmlu                                      | 0.8377 | 0.8240 |
+ | ceval-valid                                | 0.8351 | 0.8232 |
+ | gsm8k 5 shots                              | 0.7900 | 0.8120 |
+ | lambada_openai                             | 0.7283 | 0.7250 |
+ | hellaswag                                  | 0.6556 | 0.6508 |
+ | winogrande                                 | 0.7585 | 0.7672 |
+ | piqa                                       | 0.8166 | 0.8156 |
+ | truthfulqa_mc1                             | 0.5153 | 0.5202 |
+ | openbookqa                                 | 0.3640 | 0.3700 |
+ | boolq                                      | 0.8798 | 0.8810 |
+ | arc_easy                                   | 0.8582 | 0.8535 |
+ | arc_challenge                              | 0.6049 | 0.5981 |

+ ### Generate the model

+ Here is the sample command to generate the model.

  ```bash
+ auto-round \
+     --model Qwen/Qwen2.5-14B-Instruct \
      --device 0 \
      --group_size 128 \
      --nsamples 512 \
      --bits 4 \
      --iter 1000 \
      --disable_eval \
+     --model_dtype "fp16" \
+     --format 'auto_gptq,auto_round' \
      --output_dir "./tmp_autoround"
  ```
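
Once the command finishes, the exported checkpoint should load the same way as this repo; the following is a sketch using a placeholder path for whatever directory auto-round writes under `--output_dir` (with `--format 'auto_gptq,auto_round'`, each format is presumably exported separately):

```python
from auto_round import AutoRoundConfig  ## must import for the auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer

# "./tmp_autoround/<exported-folder>" is a placeholder, not a real path;
# point it at the directory auto-round actually created under --output_dir.
local_dir = "./tmp_autoround/<exported-folder>"
model = AutoModelForCausalLM.from_pretrained(local_dir, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(local_dir)
```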

  ## Ethical Considerations and Limitations

  The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
 

  Here are a couple of useful links to learn more about Intel's AI software:

+ - Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

  ## Disclaimer

  The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

  ## Cite

  @article{cheng2023optimize,
    title={Optimize weight rounding via signed gradient descent for the quantization of llms},
    author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
    journal={arXiv preprint arXiv:2309.05516},
    year={2023}
  }
config.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:89dacb213381240bde9b7a008be9115f745592becf66e0ba94fac63d9b68a245
- size 1369
+ oid sha256:e7e0a227e09ed0d35c07756beca5d58c4c98a5b20631dcea691bdfc3a75e5150
+ size 1383
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26dd631cc8d5e743bf3af07d9f7c9b04873a6322057b0b48e7d7fc25dc70069f
+ size 4994374488
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b77adb3a3b46488d3c117dfab884957c984212afe59b2112119c69ca30a795a
+ size 4994385136
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78b1e06ac5fbc1003903d8c34b017e73e8134cc6e2a7de32139094c6b75246a8
+ size 129341
quantization_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f92209e21368ef298866e57e5f3838e7590119ba042ef4c15bf642f7f60e4f40
+ size 575