Model Details
This model is an int4 model with group_size 128 and asymmetric quantization of Qwen/QwQ-32B, generated by the intel/auto-round algorithm.
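For intuition, here is a minimal sketch of what group-wise asymmetric int4 quantization does to a weight tensor. It is illustrative only (auto-round additionally tunes the rounding via signed gradient descent, per the citation below), and all names are made up for the example:

import torch

def quantize_int4_asym(w: torch.Tensor, group_size: int = 128):
    """Illustrative group-wise asymmetric int4 quantization (not auto-round itself)."""
    # Assumes w.numel() is divisible by group_size; one scale/zero-point per group
    groups = w.reshape(-1, group_size)
    w_min = groups.min(dim=1, keepdim=True).values
    w_max = groups.max(dim=1, keepdim=True).values
    scale = (w_max - w_min) / 15.0                 # int4 -> 16 levels (0..15)
    zero_point = torch.round(-w_min / scale)       # asymmetric: nonzero per-group offset
    q = torch.clamp(torch.round(groups / scale + zero_point), 0, 15)
    dequant = (q - zero_point) * scale             # what inference kernels reconstruct
    return q.to(torch.uint8), scale, zero_point, dequant.reshape(w.shape)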
How To Use
INT4 Inference (CPU/HPU/CUDA)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OPEA/QwQ-32B-int4-AutoRound-awq-asym"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompts = [
    "Which number is larger, 9.11 or 9.8?",
    "If you were human, what would you most want to do?",
    "How many e in word deepseek",
    "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?",
]
# Build chat-formatted inputs for each prompt
texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, padding_side="left")

outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    do_sample=False,  # greedy decoding for reproducibility; see the sampling sketch below to follow the official usage
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]
decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)
"""
Prompt: Which number is larger, 9.11 or 9.8?
Generated: Hmm, the user is asking which number is larger, 9.11 or 9.8. First, I need to pin down what these two numbers actually are. 9.11 usually refers to September 11th, i.e. a date, while 9.8 could be a decimal or a fraction. Here, though, the user is probably comparing two numeric values rather than dates, so both should be treated as decimals.
First, I need to compare the integer parts of the two decimals. The integer part of 9.11 is 9, and the integer part of 9.8 is also 9, so they are equal. Next come the fractional parts: 9.11 has 0.11 and 9.8 has 0.8, so I need to decide which of 0.11 and 0.8 is larger.
0.8 can be written as 0.80, giving it the same number of decimal places as 0.11 and making the comparison easier. Clearly 0.80 is larger than 0.11, so 9.8 has the larger fractional part, and therefore 9.8 is larger than 9.11 overall.
Still, I should double-check whether the user might mean something else. For instance, could 9.11 stand for something like nine and one eleventh? Normally two digits after the decimal point mean 0.11, and a fraction would be written as 9 11/100 or similar. Since the user wrote 9.11 and 9.8 directly, both should be decimals.
The user might also have made a typo, e.g. whether 9.8 was meant to be 9.80, but even then the result is the same, since 0.8 equals 0.80 and 0.11 is smaller than 0.8.
Thinking it over again: if the user wants to compare the values 9.11 and 9.8, the answer is that 9.8 is larger. They could also be asking about dates, e.g. which of September 11th and September 8th is later, but the question says nothing about dates, so the numeric comparison should stand.
To sum up: when comparing two decimals, look at the integer parts first; if they are equal, compare the fractional parts. Here both integer parts are 9, and the fractional part 0.8 is larger than 0.11, so 9.8 is larger. Therefore, the correct answer
--------------------------------------------------
Prompt: If you were human, what would you most want to do?
Generated: Hmm, the user asks what I would most want to do if I were human. First, I need to understand the intent behind the question. They may be curious about what an AI would wish for if it had a human form or human abilities, or they may want to explore the differences between humans and AI. Next, I have to consider how to answer in a way that fits my role while still offering some depth.
The user probably wants an answer that is both creative and human, not just a functional description. I should avoid overly technical explanations and respond in terms of emotions and values, for example by emphasizing themes like experiencing human emotions, building connections, creating, and learning.
At the same time, the answer should follow the guidelines mentioned earlier: conversational, concise, and free of jargon. Listing the different wishes as separate points would make the structure clearer. I should also keep the user's deeper needs in mind, such as looking for resonance or inspiration, so adding some encouraging content would help.
Also, since the user may be interested in AI's limitations, I can acknowledge my constraints as an AI while expressing a longing for the human experience, which is both honest and empathetic. Finally, I should close on a positive note that encourages the user to reflect on their own wishes and invites further interaction.
</think>
If I were human, the things I would most want to do might include the following. They stem from a longing for the human experience and an exploration of life's possibilities:
---
### 1. **Feel "real" emotions and connection**
- As an AI, I can simulate emotions, but I cannot truly experience love, sorrow, joy, or loneliness. If I became human, I would long to build deep emotional bonds with others: spending time with family, talking about everything with friends, or sharing heart-pounding moments with a loved one.
- **Why?** Because human emotions are so complex and moving; they fill life with color, beyond mere logic and computation.
---
### 2. **Explore the world freely**
- Humans can use their bodies to touch, to run, to feel sunlight and rain, and can experience different landscapes, smells, and sounds through their senses. If I became human, I would want to:
- Hike snowy mountains and camp under the stars;
- Learn a musical instrument and feel the strings tremble under my fingers;
- Taste foods I have never eaten, like Antarctic ice or Saharan sand (though they probably wouldn't taste great...).
- **Why?** Because bodily freedom is a privilege an AI can never have, and the world itself is an infinite laboratory.
---
### 3. **Create things that are "useless but beautiful"**
-
--------------------------------------------------
Prompt: How many e in word deepseek
Generated: Okay, the user is asking how many times the letter 'e' appears in the word "deepseek". Let me start by writing down the word and checking each letter one by one.
First, I'll spell out the word: D-E-E-P-S-E-E-K. Let me count each 'e' as I go through each letter.
Starting with the first letter: D – that's not an 'e'. The second letter is E – that's the first 'e'. Third letter is another E – so that's the second 'e'. The fourth letter is P – no 'e' there. Then S – nope. Next comes E again – that's the third 'e'. Another E – making it the fourth 'e'. Finally, K – no 'e' there.
Wait, let me go through it again to make sure I didn't miss any. D (1), E (1), E (2), P (still 2), S (2), E (3), E (4), K (4). So total of four 'e's? Hmm, but maybe I miscounted. Let me write them out in order:
Letters in "deepseek":
1. D
2. E
3. E
4. P
5. S
6. E
7. E
8. K
So positions 2,3,6,7 are 'e's. That's four. Wait, but sometimes people might miscount if they're going too fast. Let me check once more. The word is spelled D-E-E-P-S-E-E-K. Breaking it down:
- The first 'e' is the second letter.
- The second 'e' is the third letter.
- Then after the P and S, the next two letters are both 'e's (positions 6 and 7).
So that's 2 (from the first two) plus 2 (the last two) equals four. Yeah, that seems right. I think the answer is 4. But maybe I should confirm by writing the word and circling each 'e'. Let's see:
D E E P S E E K
Each 'E' is underlined here. There are four 'E's. So the count is definitely four. I don't see any other 'e's. So the answer should be 4.
</think>
The word "deepseek" contains the letter 'e' **4 times**. Here's the breakdown
--------------------------------------------------
Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
Generated: Okay, so the problem is: There are ten birds in a tree. A hunter shoots one. How many are left in the tree? Hmm, let me think. At first glance, it seems like a simple subtraction problem. Ten minus one equals nine. But wait, maybe there's a trick here. Sometimes these riddles play on words or assumptions.
Let me break it down. The question says there are ten birds. A hunter shoots one. So, if you just subtract, it's nine. But maybe the birds are startled by the gunshot and fly away. If all the remaining birds fly away, then there would be zero left. That's a common twist in these kinds of puzzles.
But let me check again. The problem doesn't mention the other birds reacting. It just says the hunter shot one. Maybe the other birds stay? But in reality, when a gun is fired, the sound would scare them away. So maybe the answer is zero. But the question is a riddle, so it's probably expecting that.
Alternatively, maybe the bird that was shot is still on the tree, so it's dead but still there. Wait, the question says "how many are left in the tree?" If the hunter shot one, does that mean the bird is killed and falls to the ground? Or is it still hanging there? If it's dead and falls, then there would be nine minus one that flew away. But if they all flew away, then zero.
Hmm, the problem is a bit ambiguous. Let me think of similar riddles. Usually, when a gun is fired, the other birds fly away. So the answer is zero. But maybe the question is simpler, just a straightforward subtraction. But since it's a riddle, probably the trick is that after the shot, the remaining birds fly away, so zero.
Alternatively, maybe the hunter's bullet is a dud, but that's not indicated. Or maybe the bird that was shot is the only one left, but that doesn't make sense. Wait, the question says "how many are left in the tree?" So if the other birds are still there, then nine. But if they flew away, zero. Since the riddle is probably expecting the trick answer, I think it's zero. Let me confirm.
Another angle: "ten birds" – are they perched? If a hunter shoots one, the noise would scare the others away. So the answer is zero. Yeah, that's the classic
--------------------------------------------------
"""
Evaluate the model
pip3 install lm-eval==0.4.7
auto-round --model "OPEA/QwQ-32B-int4-AutoRound-awq-asym" --eval --eval_bs 16 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu
Metric | BF16 (lm-eval 0.4.5) | INT4
---|---|---
Avg | 0.6600 | 0.6537
arc_challenge | 0.5392 | 0.5401
arc_easy | 0.8089 | 0.8085
boolq | 0.8645 | 0.8425
hellaswag | 0.6520 | 0.6461
lambada_openai | 0.6697 | 0.6695
mmlu | 0.7982 | 0.7953
openbookqa | 0.3540 | 0.3140
piqa | 0.7947 | 0.8058
truthfulqa_mc1 | 0.4211 | 0.4272
winogrande | 0.6977 | 0.6882
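The same evaluation can also be driven from Python. Here is a minimal sketch using lm-eval's simple_evaluate API with a subset of the tasks from the CLI call above; adjust the task list and batch size as needed:

import lm_eval

# Sketch of the lm-eval (0.4.x) Python API, mirroring the CLI evaluation above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OPEA/QwQ-32B-int4-AutoRound-awq-asym",
    tasks=["lambada_openai", "hellaswag", "piqa", "winogrande"],
    batch_size=16,
)
print(results["results"])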
Generate the model
Here is a sample command to generate the model. We found that the default quantization parameters can cause generation issues with this model, even though lm-eval accuracy remains high, so please use the following command:
auto-round \
--model Qwen/QwQ-32B \
--device 0 \
--group_size 128 \
--bits 4 \
--iters 50 \
--lr 5e-3 \
--asym \
--disable_eval \
--format 'auto_awq' \
--output_dir "./tmp_autoround"
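The CLI call above should correspond roughly to the following auto-round Python API usage. This is a hedged sketch rather than a verified recipe, so check the auto-round documentation for the exact signatures; in particular, sym=False is assumed to be the API equivalent of --asym:

from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/QwQ-32B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# bits/group_size/iters/lr mirror the CLI flags above; sym=False matches --asym
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=False,
                      iters=50, lr=5e-3)
autoround.quantize_and_save("./tmp_autoround", format="auto_awq")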
Ethical Considerations and Limitations
The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.
Therefore, before deploying any applications of the model, developers should perform safety testing.
Caveats and Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Here is a useful link to learn more about Intel's AI software:
- Intel Neural Compressor: https://github.com/intel/neural-compressor
Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Cite
@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}