Model Details

This model is an INT4 quantization (group_size 128, symmetric) of deepseek-ai/DeepSeek-R1-Distill-Qwen-32B, generated with the intel/auto-round algorithm.
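To double-check the settings that ship with this checkpoint before downloading the full weights, the quantization config can be inspected directly. This is a minimal sketch using the standard transformers config API; the exact keys inside quantization_config depend on the export format.

from transformers import AutoConfig

# Fetch only the config and print the embedded quantization settings
# (expect bits=4, group_size=128 and symmetric quantization).
cfg = AutoConfig.from_pretrained(
    "OPEA/DeepSeek-R1-Distill-Qwen-32B-int4-gptq-sym-inc",
    trust_remote_code=True,
)
print(cfg.quantization_config)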

Please follow the license of the original model.

How To Use

INT4 Inference on CUDA

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

quantized_model_dir = "OPEA/DeepSeek-R1-Distill-Qwen-32B-int4-gptq-sym-inc"

device_map="auto"
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map=device_map,
)

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
    "9.11和9.8哪个数字大",  # "Which number is larger, 9.11 or 9.8?"
    "如果你是人,你最想做什么",  # "If you were human, what would you most want to do?"
    "How many e in word deepseek",
    "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?",
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_length=512,  # adjust to align with the official usage
    num_return_sequences=1,
    do_sample=False  # adjust to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]

decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)


"""
Prompt: 9.11和9.8哪个数字大
Generated: .11和.8哪个数字大
</think>

.11和.8哪个数字大
</think>

要比较 **9.11** 和 **9.8** 哪个更大,可以按照以下步骤进行:

1. **比较整数部分**:
   - 两个数字的整数部分都是 **9**,所以整数部分相等。

2. **比较小数部分**:
   - **9.11** 的小数部分是 **0.11**
   - **9.8** 的小数部分是 **0.8**(即 **0.80**)

   由于 **0.80 > 0.11**,所以 **9.8** 的小数部分更大。

3. **结论**:
   - 因此,**9.8** 比 **9.11** 大。

最终答案:\boxed{9.8}
--------------------------------------------------
Prompt: 如果你是人类,你最想做什么
Generated: 您好!我是由中国的深度求索(DeepSeek)公司开发的智能助手DeepSeek-R1。有关模型和产品的详细内容请参考官方文档。
</think>
</think>

您好!我是由中国的深度求索
</think>

您好!我是由中国的深度求索(DeepSeek)公司开发的智能助手DeepSeek-R1。有关模型和产品的详细内容请参考官方文档。
--------------------------------------------------
Prompt: How many e in word deepseek
Generated: To determine how many times the letter 'e' appears in the word "deepseek," I will examine each letter one by one.

First, I'll list out the letters in the word: D, E, E, P, S, E, E, K.

Next, I'll go through each letter and count every occurrence of the letter 'e'.

Starting with the first letter, D, it's not an 'e'. The second letter is E, which counts as one. The third letter is another E, making it two. The fourth letter is P, not an 'e'. The fifth letter is S, also not an 'e'. The sixth letter is E, bringing the count to three. The seventh letter is another E, making it four. The last letter is K, which isn't an 'e'.

After reviewing all the letters, I find that the letter 'e' appears four times in the word "deepseek."
</think>

To determine how many times the letter **e** appears in the word **deepseek**, follow these steps:

1. **Write down the word:**

   **d e e p s e e k**

2. **Identify and count each 'e':**

   - **e** (position 2)
   - **e** (position 3)
   - **e** (position 6)
   - **e** (position 7)

3. **Total count of 'e':**

   There are **4** occurrences of the letter **e** in the word **deepseek**.

\[
\boxed{4}
\]
--------------------------------------------------
Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
Generated:
</think>

If a hunter shoots one bird from a tree that initially has ten birds, the number of birds remaining in the tree would depend on the reaction of the other birds.

1. **Immediate Reaction**: When a hunter shoots one bird, the loud noise and disturbance might scare the remaining birds, causing them to fly away. In this case, all the other nine birds would likely leave the tree.

2. **No Reaction**: If the other birds are not disturbed or choose to stay despite the shot, there would still be nine birds left in the tree.

However, in most scenarios, the loud noise of a gunshot would scare the birds, leading to all of them flying away.
"""

Evaluate the model

pip3 install lm-eval==0.4.7

lm-eval --model hf --model_args pretrained=OPEA/DeepSeek-R1-Distill-Qwen-32B-int4-gptq-sym-inc   --tasks leaderboard_mmlu_pro,leaderboard_ifeval,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k --batch_size 16
Metric BF16 INT4
avg 0.6647 0.6639
leaderboard_mmlu_pro - -
mmlu 0.7964 0.7928
lambada_openai 0.6649 0.6718
hellaswag 0.6292 0.6223
winogrande 0.7482 0.7482
piqa 0.8058 0.7982
truthfulqa_mc1 0.3831 0.3905
openbookqa 0.3520 0.3520
boolq 0.8963 0.8972
arc_easy 0.8207 0.8194
arc_challenge 0.5503 0.5469
leaderboard_ifeval - -
gsm8k - -
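The same evaluation can also be launched from Python. The snippet below is a rough sketch using lm-eval's simple_evaluate API; the argument names mirror the CLI flags above and may differ slightly across lm-eval versions, and the task list is shortened for illustration.

import lm_eval

# Programmatic equivalent of the lm-eval CLI command above (sketch).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OPEA/DeepSeek-R1-Distill-Qwen-32B-int4-gptq-sym-inc",
    tasks=["mmlu", "lambada_openai", "hellaswag", "piqa", "winogrande",
           "truthfulqa_mc1", "openbookqa", "boolq", "arc_easy", "arc_challenge"],
    batch_size=16,
)
print(results["results"])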

Generate the model

Here is the sample command to generate the model.

auto-round  \
--model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
--device 0 \
--bits 4 \
--iter 200 \
--disable_eval \
--format 'auto_gptq,auto_round,auto_awq' \
--output_dir "./tmp_autoround" 

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

  • Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

