yizheapple committed on
Commit b0a0ae1 · verified · 1 Parent(s): fc987ff

Update README.md with research/reproducibility notes

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -18,6 +18,11 @@ This model was produced using **Simple Self-Distillation (SSD)**, a method that
  - **Self-distillation sampling:** temperature=1.1, top_p=0.95, top_k=20
  - **Evaluation sampling:** temperature=0.7, top_p=0.95, top_k=20
 
+ ## Notes
+ - These are research checkpoints for reproducibility.
+ - They are not optimized Qwen releases.
+ - They don't represent a broader open-source model strategy.
+
  ## Method
 
  SSD samples solutions from the base model using non-unit temperature and top-k/top-p truncation, then fine-tunes on those samples via standard supervised learning. Despite its simplicity, SSD yields large gains on competitive programming benchmarks, with improvements concentrating on harder problems. The mechanism traces to resolving a *precision–exploration conflict*: SSD reshapes token distributions in a context-dependent way so that a single global decoding configuration becomes far more effective at evaluation time.
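The decoding configuration named in the README (temperature with top-k/top-p truncation) can be sketched in a self-contained way. This is a minimal illustration, not the authors' implementation: the model, dataset, and fine-tuning loop are omitted, and the `logits` here are a toy stand-in for a real model's next-token output.

```python
import numpy as np

def sample_token(logits, temperature=1.1, top_p=0.95, top_k=20, rng=None):
    """Sample one token id using temperature scaling plus top-k/top-p
    truncation, mirroring the self-distillation sampling settings above.
    A sketch only; real decoders apply these filters on logits per step."""
    rng = rng or np.random.default_rng()
    # Temperature scaling: values > 1 flatten the distribution (more exploration).
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Top-k: keep only the k most probable tokens.
    order = np.argsort(probs)[::-1]
    keep = order[:top_k]
    # Top-p: within the kept set, take the smallest prefix whose
    # cumulative probability reaches top_p.
    cum = np.cumsum(probs[keep])
    cutoff = np.searchsorted(cum, top_p) + 1
    keep = keep[:cutoff]
    # Renormalize over the surviving tokens and sample.
    trunc = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=trunc))

# Toy vocabulary of 50 tokens with random logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=50)
token = sample_token(logits, temperature=1.1, top_p=0.95, top_k=20, rng=rng)
print(token)
```

At evaluation time the same function would be called with `temperature=0.7`, matching the README's point that SSD training makes a single global decoding configuration effective.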