Model Details

This model is an int4 model with group_size 128 and symmetric quantization of Qwen/Qwen2.5-72B-Instruct, generated by intel/auto-round. Load the model with revision="b162b49" to use the AutoGPTQ format.
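To make "int4, group_size 128, symmetric" concrete, here is a minimal sketch of plain round-to-nearest symmetric group quantization. It is not the auto-round algorithm, which tunes the rounding via signed gradient descent according to the cited paper. The function names and the scale convention (largest absolute value in the group divided by 7) are assumptions chosen for illustration.

```python
import random


def quantize_group_symmetric(group, bits=4):
    """Round-to-nearest symmetric quantization of one weight group (illustrative)."""
    qmax = 2 ** (bits - 1) - 1      # 7 for int4
    qmin = -(2 ** (bits - 1))       # -8 for int4
    max_abs = max(abs(w) for w in group)
    # Symmetric: one scale per group, no zero-point offset.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(qmax, max(qmin, round(w / scale))) for w in group]
    return q, scale


def dequantize_group(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


def quantize_symmetric(weights, group_size=128, bits=4):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group_symmetric(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # toy weights, not model data
groups = quantize_symmetric(weights)
restored = [w for q, s in groups for w in dequantize_group(q, s)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(len(groups), max_err)
```

Each group of 128 weights shares a single scale, so the storage cost is roughly 4 bits per weight plus one scale per group.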

How To Use

INT4 Inference (CPU/HPU/CUDA)

CPU inference requires auto-round version > 0.3.1.

from auto_round import AutoRoundConfig ##must import for auto-round format
from transformers import AutoModelForCausalLM,AutoTokenizer
quantized_model_dir = "OPEA/Qwen2.5-72B-Instruct-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype='auto',
    device_map="auto",
    ##revision="b162b49" ##AutoGPTQ format
)

##import torch ## uncomment for HPU (needed for torch.bfloat16)
##import habana_frameworks.torch.core as htcore ## uncomment for HPU
##import habana_frameworks.torch.hpu as hthpu ## uncomment for HPU
##model = model.to(torch.bfloat16).to("hpu") ## uncomment for HPU

prompt = "There is a girl who likes adventure,"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=200,  ##change this to align with the official usage
    do_sample=False  ##change this to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
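For a decoder-only model, generate returns the prompt tokens followed by the newly generated tokens, which is why the snippet above slices each output by the length of its input before decoding. A toy illustration with plain lists standing in for tensors (the token ids are made up):

```python
# Made-up token ids; in the real snippet these are tensor rows.
prompt_ids = [[101, 7, 8, 9]]
full_output_ids = [[101, 7, 8, 9, 42, 43, 2]]

# Drop the echoed prompt so only the newly generated tokens remain.
new_tokens = [
    output[len(prompt):]
    for prompt, output in zip(prompt_ids, full_output_ids)
]
print(new_tokens)  # [[42, 43, 2]]
```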

prompt = "There is a girl who likes adventure,"
##INT4:
"""That's great to hear! A love for adventure can lead to so many exciting experiences and personal growth. What kind of adventures does she enjoy? Is it exploring new places, trying out thrilling activities, or perhaps diving into new hobbies and challenges? Knowing more about her interests could help in suggesting fun ideas or planning future adventures.
"""

##BF16:
"""That sounds like a great quality to have! A love for adventure can lead to exciting experiences and personal growth. What kind of adventures does she enjoy? Does she like exploring new places, trying new activities, or seeking out thrilling experiences? Knowing more about her interests can help suggest specific ideas or activities that might appeal to her adventurous spirit.
"""

prompt = "9.11和9.8哪个数字大"  
##INT4:
"""要比较9.11和9.8的大小,可以按照以下步骤进行:

1. **比较整数部分**:两个数字的整数部分都是9,所以需要进一步比较小数部分。
2. **比较小数部分**:
   - 9.11的小数部分是0.11
   - 9.8的小数部分是0.8

3. **比较0.11和0.8**:
   - 0.11可以写成0.110
   - 0.8可以写成0.800

4. **逐位比较**:
   - 第一位:1 < 8,所以0.110 < 0.800

因此,9.11 < 9.8。

结论:9.8比9.11大。"""

##BF16: 
"""比较两个数字 9.11 和 9.8,可以按照以下步骤进行:

1. **整数部分**:两个数字的整数部分都是 9,所以需要比较小数部分。
2. **小数部分**:
   - 9.11 的小数部分是 0.11
   - 9.8 的小数部分是 0.8

3. **比较小数部分**:
   - 0.11 和 0.8 比较时,0.8 明显大于 0.11。

因此,9.8 大于 9.11。"""


prompt = "Once upon a time,"
##INT4: 
"""Once upon a time, in a far-off land, there was a kingdom filled with wonder and magic. The kingdom was ruled by a wise and just king who loved his people dearly. In the heart of the kingdom stood a magnificent castle, surrounded by lush forests and rolling hills.

The people of the kingdom lived happily, tending to their farms, crafting beautiful goods, and enjoying the simple pleasures of life. However, one day, a great darkness began to spread across the land. A wicked sorcerer had risen from the shadows, seeking to claim the throne for himself and plunge the kingdom into chaos.

The king, knowing that he could not face this threat alone, called upon the bravest and most skilled heroes from all corners of the realm. Among them was a young knight named Sir Cedric, who had earned a reputation for his courage and unwavering sense of justice.

Sir Cedric, along with a group of loyal companions, set out on a perilous journey to stop the sor"""

##BF16:
"""Once upon a time, in a land far, far away, there was a kingdom known for its beauty and prosperity. The kingdom was ruled by a wise and just king who loved his people dearly. In the heart of the kingdom stood a magnificent castle, surrounded by lush gardens and sparkling fountains.

The king had a young daughter named Princess Elara, who was as kind and gentle as she was beautiful. She spent her days helping the poor and spreading joy throughout the kingdom. The people adored her, and she was beloved by all.

One day, a great challenge arose. A dark forest on the outskirts of the kingdom began to grow wild and dangerous, threatening the safety of the villagers. The king called for a hero to tame the forest and protect his people. Many brave knights and warriors came forward, but none could succeed.

Princess Elara, determined to help, decided to venture into the forest herself. Her father was hesitant, but he saw the determination in her eyes and knew"""


prompt = "请简短介绍一下阿里巴巴公司"
##INT4:
"""阿里巴巴集团是一家总部位于中国杭州的全球领先的电子商务和科技公司。它成立于1999年,由马云和他的团队创立。阿里巴巴旗下拥有包括淘宝、天猫、阿里云等在内的多个知名业务平台,涵盖了在线零售、批发贸易、云计算、数字娱乐、金融服务等多个领域。

阿里巴巴的愿景是让世界各地的企业都能够平等地进行贸易,通过技术创新推动数字经济的发展,为社会创造更多的价值。目前,阿里巴巴已经发展成为世界领先的互联网公司之一,业务遍布全球多个国家和地区,服务着数以亿计的用户和商家。"""

##BF16:
"""阿里巴巴集团是一家总部位于中国杭州的全球领先的电子商务和科技公司。成立于1999年,阿里巴巴最初是一个B2B在线市场,旨在连接中国制造商与全球买家。经过二十多年的发展,阿里巴巴已经发展成为涵盖电子商务、金融、物流、云计算等多个领域的综合性企业集团。

阿里巴巴旗下拥有淘宝网、天猫、菜鸟网络、阿里云等知名品牌,为消费者提供购物、支付、娱乐等多元化服务,同时也为企业提供营销、销售、物流和技术支持等全方位解决方案。此外,阿里巴巴还积极投资和孵化创新项目,推动数字经济的发展。

阿里巴巴始终秉持“让天下没有难做的生意”的使命,致力于通过技术创新促进全球经济的可持续发展。"""

Evaluate the model

pip3 install lm-eval==0.4.5

auto-round --model "OPEA/Qwen2.5-72B-Instruct-int4-inc" --eval --eval_bs 16  --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
| Metric | BF16 | INT4 |
| --- | --- | --- |
| Avg | 0.7413 | 0.7448 |
| leaderboard_mmlu_pro 5 shots | 0.5919 | 0.5864 |
| leaderboard_ifeval inst_level_strict_acc | 0.7770 | 0.7866 |
| leaderboard_ifeval prompt_level_strict_acc | 0.6858 | 0.6932 |
| mmlu | 0.8334 | 0.8308 |
| cmmlu | 0.8727 | 0.8673 |
| ceval-valid | 0.8975 | 0.8960 |
| gsm8k 5 shots | 0.9037 | 0.9098 |
| lambada_openai | 0.7518 | 0.7563 |
| hellaswag | 0.7031 | 0.7014 |
| winogrande | 0.7601 | 0.7687 |
| piqa | 0.8313 | 0.8232 |
| truthfulqa_mc1 | 0.5239 | 0.5263 |
| openbookqa | 0.3860 | 0.3820 |
| boolq | 0.9049 | 0.9046 |
| arc_easy | 0.8632 | 0.8611 |
| arc_challenge | 0.6135 | 0.6237 |
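The per-task gap between the two columns can be computed directly from the reported numbers. The sketch below copies the BF16 and INT4 scores from the results above and prints INT4 minus BF16 for each task; the variable names are illustrative.

```python
# (BF16, INT4) scores copied from the reported results.
scores = {
    "leaderboard_mmlu_pro": (0.5919, 0.5864),
    "leaderboard_ifeval inst_level_strict_acc": (0.7770, 0.7866),
    "leaderboard_ifeval prompt_level_strict_acc": (0.6858, 0.6932),
    "mmlu": (0.8334, 0.8308),
    "cmmlu": (0.8727, 0.8673),
    "ceval-valid": (0.8975, 0.8960),
    "gsm8k": (0.9037, 0.9098),
    "lambada_openai": (0.7518, 0.7563),
    "hellaswag": (0.7031, 0.7014),
    "winogrande": (0.7601, 0.7687),
    "piqa": (0.8313, 0.8232),
    "truthfulqa_mc1": (0.5239, 0.5263),
    "openbookqa": (0.3860, 0.3820),
    "boolq": (0.9049, 0.9046),
    "arc_easy": (0.8632, 0.8611),
    "arc_challenge": (0.6135, 0.6237),
}

# Positive delta means the INT4 model scored higher than BF16.
deltas = {task: int4 - bf16 for task, (bf16, int4) in scores.items()}
largest_drop = min(deltas, key=deltas.get)
largest_gain = max(deltas, key=deltas.get)

for task, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{task:45s} {delta:+.4f}")
print("largest drop:", largest_drop)
print("largest gain:", largest_gain)
```

On these numbers every task stays within about one point of the BF16 score in either direction.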

Generate the model

Here is the sample command to generate the model.

auto-round \
--model  Qwen/Qwen2.5-72B-Instruct \
--device 0 \
--group_size 128 \
--nsamples 512 \
--bits 4 \
--iter 1000 \
--disable_eval \
--model_dtype "fp16" \
--format 'auto_gptq,auto_round' \
--output_dir "./tmp_autoround" 
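When the run is driven from a script, the same command can be assembled in Python. This sketch uses only the flags shown in the command above; build_autoround_command is a hypothetical helper, and the subprocess call is left commented out because it would start the actual quantization job.

```python
import shlex


def build_autoround_command(options, flags=()):
    """Assemble an auto-round CLI invocation as an argument list (hypothetical helper)."""
    cmd = ["auto-round"]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    cmd += [f"--{flag}" for flag in flags]  # value-less switches
    return cmd


# Option values copied from the sample command.
options = {
    "model": "Qwen/Qwen2.5-72B-Instruct",
    "device": 0,
    "group_size": 128,
    "nsamples": 512,
    "bits": 4,
    "iter": 1000,
    "model_dtype": "fp16",
    "format": "auto_gptq,auto_round",
    "output_dir": "./tmp_autoround",
}
cmd = build_autoround_command(options, flags=["disable_eval"])
print(shlex.join(cmd))

# import subprocess
# subprocess.run(cmd, check=True)  # uncomment to launch the job
```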

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

  • Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

