Qwen/QwQ-32B

@@ -31,11 +31,11 @@ QwQ is the reasoning model of the Qwen series. Compared with conventional instru
 **Note:** For the best experience, please review the [usage guidelines](#usage-guidelines) before deploying QwQ models.
-For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
 ## Requirements
-The code of Qwen2.5 has been in the latest Hugging face `transformers` and we advise you to use the latest version of `transformers`.
 With `transformers<4.37.0`, you will encounter the following error:
 ```
@@ -89,9 +89,8 @@ To achieve optimal performance, we recommend the following settings:
 1. **Enforce Thoughtful Output**: Ensure the model starts with "\<think\>\n" to prevent generating empty thinking content, which can degrade output quality. If you use `apply_chat_template` and set `add_generation_prompt=True`, this is already automatically implemented, but it may cause the response to lack the \<think\> tag at the beginning. This is normal behavior.
 2. **Sampling Parameters**:
-   - Use Temperature=0.6 and TopP=0.95 instead of Greedy decoding to avoid endless repetitions and enhance diversity.
-   - For complex reasoning tasks like math or coding, set TopK=40.
-   - For other types of questions, use TopK=20.
 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
    - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
@@ -117,7 +116,7 @@ We advise adding the `rope_scaling` configuration only when processing long cont
 ## Evaluation & Performance
-Detailed evaluation results are reported in this [📑 blog](https://qwenlm.github.io/blog/qwen2.5/).
 For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
@@ -126,12 +125,12 @@ For requirements on GPU memory and the respective throughput, see results [here]
 If you find our work helpful, feel free to give us a cite.
 ```
-@misc{qwen2.5,
     title = {Qwen2.5: A Party of Foundation Models},
-    url = {https://qwenlm.github.io/blog/qwen2.5/},
     author = {Qwen Team},
-    month = {September},
-    year = {2024}
 }
 @article{qwen2,

 **Note:** For the best experience, please review the [usage guidelines](#usage-guidelines) before deploying QwQ models.
+For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwq-32b/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
 ## Requirements
+QwQ is based on Qwen2.5, whose code has been in the latest Hugging face `transformers`. We advise you to use the latest version of `transformers`.
 With `transformers<4.37.0`, you will encounter the following error:
 ```
 1. **Enforce Thoughtful Output**: Ensure the model starts with "\<think\>\n" to prevent generating empty thinking content, which can degrade output quality. If you use `apply_chat_template` and set `add_generation_prompt=True`, this is already automatically implemented, but it may cause the response to lack the \<think\> tag at the beginning. This is normal behavior.
 2. **Sampling Parameters**:
+   - Use Temperature=0.6 and TopP=0.95 instead of Greedy decoding to avoid endless repetitions.
+   - Use TopK between 20 and 40 to filter out rare token occurrences while maintaining the diversity of the generated output.
 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
    - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
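
As an illustration of guideline 1 above, here is a minimal sketch of how a caller might handle the opening tag. It assumes the literal tag is `<think>` followed by a newline, as the guideline describes. The helper names `ensure_think_prefix` and `restore_think_tag` are hypothetical and are not part of `transformers` or of the QwQ repository.

```python
# Hypothetical helpers; the names below are illustrative, not a library API.
THINK_PREFIX = "<think>\n"


def ensure_think_prefix(prompt: str) -> str:
    """Append the opening think tag to a prompt unless it is already there."""
    if prompt.endswith(THINK_PREFIX):
        return prompt
    return prompt + THINK_PREFIX


def restore_think_tag(response: str) -> str:
    """Put the opening tag back on a response whose prompt already consumed it.

    When the chat template ends the prompt with the tag, the generated text
    starts inside the thinking block, so the tag is missing from the response.
    """
    if response.lstrip().startswith("<think>"):
        return response
    return THINK_PREFIX + response
```

Restoring the tag is only needed if downstream code expects a balanced pair of think tags, for example when splitting the reasoning from the final answer.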
 ## Evaluation & Performance
+Detailed evaluation results are reported in this [📑 blog](https://qwenlm.github.io/blog/qwq-32b/).
 For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
 If you find our work helpful, feel free to give us a cite.
 ```
+@misc{qwq32b,
     title = {Qwen2.5: A Party of Foundation Models},
+    url = {https://qwenlm.github.io/blog/qwq-32b/},
     author = {Qwen Team},
+    month = {March},
+    year = {2025}
 }
 @article{qwen2,

generation_config.json CHANGED

@@ -8,7 +8,7 @@
   "pad_token_id": 151643,
   "repetition_penalty": 1.0,
   "temperature": 0.6,
-  "top_k": 20,
   "top_p": 0.95,
   "transformers_version": "4.45.2"
 }

   "pad_token_id": 151643,
   "repetition_penalty": 1.0,
   "temperature": 0.6,
+  "top_k": 40,
   "top_p": 0.95,
   "transformers_version": "4.45.2"
 }
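
To make the effect of these settings concrete, the following is a rough standard-library sketch of temperature scaling followed by top-k and top-p filtering, using the values from the new config (`temperature=0.6`, `top_k=40`, `top_p=0.95`) as defaults. It is a simplified illustration, not the `transformers` implementation; the function names `candidate_tokens` and `sample_token` are hypothetical, and real libraries may differ in tie-breaking and renormalization details.

```python
import math
import random


def candidate_tokens(logits, temperature=0.6, top_k=40, top_p=0.95):
    """Return {token_index: probability} for tokens that survive filtering.

    Probabilities are normalized over the top-k set, then the smallest
    prefix whose cumulative mass reaches top_p is kept.
    """
    if not logits:
        raise ValueError("logits must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive for sampling")

    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring tokens, dropping the rare tail.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    ranked = ranked[:top_k]

    # Softmax over the survivors (subtract the peak for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)

    # Top-p: stop once the cumulative probability reaches the threshold.
    kept = {}
    cumulative = 0.0
    for index, weight in zip(ranked, weights):
        probability = weight / total
        kept[index] = probability
        cumulative += probability
        if cumulative >= top_p:
            break
    return kept


def sample_token(logits, rng=random, **settings):
    """Draw one token index from the filtered candidates."""
    kept = candidate_tokens(logits, **settings)
    return rng.choices(list(kept), weights=list(kept.values()), k=1)[0]
```

With a flat distribution, raising `top_k` from 20 to 40 doubles the number of tokens that can be drawn. With a sharply peaked distribution, the `top_p` cutoff usually dominates and the `top_k` value matters little.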