feihu.hf committed
Commit 2ebdfe8 · 1 parent: 3ffd731

update readme

Files changed (2):
  1. README.md +9 -10
  2. generation_config.json +1 -1
README.md CHANGED
@@ -31,11 +31,11 @@ QwQ is the reasoning model of the Qwen series. Compared with conventional instru
 
 **Note:** For the best experience, please review the [usage guidelines](#usage-guidelines) before deploying QwQ models.
 
-For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
+For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwq-32b/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
 
 ## Requirements
 
-The code of Qwen2.5 has been in the latest Hugging face `transformers` and we advise you to use the latest version of `transformers`.
+QwQ is based on Qwen2.5, whose code has been in the latest Hugging face `transformers`. We advise you to use the latest version of `transformers`.
 
 With `transformers<4.37.0`, you will encounter the following error:
 ```
@@ -89,9 +89,8 @@ To achieve optimal performance, we recommend the following settings:
 1. **Enforce Thoughtful Output**: Ensure the model starts with "\<think\>\n" to prevent generating empty thinking content, which can degrade output quality. If you use `apply_chat_template` and set `add_generation_prompt=True`, this is already automatically implemented, but it may cause the response to lack the \<think\> tag at the beginning. This is normal behavior.
 
 2. **Sampling Parameters**:
-   - Use Temperature=0.6 and TopP=0.95 instead of Greedy decoding to avoid endless repetitions and enhance diversity.
-   - For complex reasoning tasks like math or coding, set TopK=40.
-   - For other types of questions, use TopK=20.
+   - Use Temperature=0.6 and TopP=0.95 instead of Greedy decoding to avoid endless repetitions.
+   - Use TopK between 20 and 40 to filter out rare token occurrences while maintaining the diversity of the generated output.
 
 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
    - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
@@ -117,7 +116,7 @@ We advise adding the `rope_scaling` configuration only when processing long cont
 
 ## Evaluation & Performance
 
-Detailed evaluation results are reported in this [📑 blog](https://qwenlm.github.io/blog/qwen2.5/).
+Detailed evaluation results are reported in this [📑 blog](https://qwenlm.github.io/blog/qwq-32b/).
 
 For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
 
@@ -126,12 +125,12 @@ For requirements on GPU memory and the respective throughput, see results [here]
 If you find our work helpful, feel free to give us a cite.
 
 ```
-@misc{qwen2.5,
+@misc{qwq32b,
     title = {Qwen2.5: A Party of Foundation Models},
-    url = {https://qwenlm.github.io/blog/qwen2.5/},
+    url = {https://qwenlm.github.io/blog/qwq-32b/},
     author = {Qwen Team},
-    month = {September},
-    year = {2024}
+    month = {March},
+    year = {2025}
 }
 
 @article{qwen2,
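The practical effect of the README edits above is a single TopK recommendation (20 to 40) in place of the earlier task-dependent split, alongside the existing Temperature=0.6 / TopP=0.95 advice. As a quick illustration, here is a minimal sketch, not part of the commit, of what these settings look like with `transformers`; the repo id `Qwen/QwQ-32B` and the example prompt are assumptions, since the commit page does not name them.

```python
# Minimal sketch of the recommended sampling setup (not part of this commit).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"  # assumption: the repo this commit belongs to
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in \"strawberry\"?"}]
# Per the README's usage guidelines, add_generation_prompt=True already makes
# the template open the assistant turn with "<think>\n".
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sampling parameters from the updated guidelines: Temperature=0.6, TopP=0.95,
# and TopK in the 20-40 range (40 is the new generation_config.json default).
output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    max_new_tokens=2048,
)
print(
    tokenizer.decode(
        output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )
)
```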
generation_config.json CHANGED
@@ -8,7 +8,7 @@
   "pad_token_id": 151643,
   "repetition_penalty": 1.0,
   "temperature": 0.6,
-  "top_k": 20,
+  "top_k": 40,
   "top_p": 0.95,
   "transformers_version": "4.45.2"
 }
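Since `top_k` now ships as 40 in `generation_config.json`, `generate` picks it up by default without any per-call arguments. A short sketch of inspecting or overriding the shipped defaults, again assuming the `Qwen/QwQ-32B` repo id:

```python
# Sketch: read the shipped generation defaults and override per call if desired.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("Qwen/QwQ-32B")  # assumed repo id
print(gen_config.temperature, gen_config.top_p, gen_config.top_k)  # 0.6 0.95 40

# A per-call keyword wins over the shipped default, e.g. to compare against
# the previous top_k=20:
# model.generate(**inputs, generation_config=gen_config, top_k=20)
```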