loubnabnl HF staff committed on
Commit
63a2bcd
1 Parent(s): a172b84

Update evaluation/demo_humaneval.md

Files changed (1)
  1. evaluation/demo_humaneval.md +1 -1
evaluation/demo_humaneval.md CHANGED
@@ -40,7 +40,7 @@ Instead of 200 candidate solutions, we will only generate 20 samples for illustr

  **Remark**:

- Regarding the temperature parameter, in [CodeGen](https://github.com/salesforce/CodeGen) paper, the authors observed that the best performing temperature increases as the number of samples permitted k increases. When a model is only allowed a few samples to pass unit tests, it is beneficial to use the learned distribution, through a low temperature, to select candidates that are likely to pass. But when a model is allowed for more chances with a high k, using a higher sampling temperature to tilt the learned model distribution lets it explore diverse samples and thus have a greater chance of synthesizing a correct program.
+ Regarding the temperature parameter, in [Codex](https://arxiv.org/pdf/2107.03374.pdf) paper, the authors observed that the best performing temperature increases as the number of samples permitted k increases. Similar behavior was also observed in [CodeGen](https://arxiv.org/pdf/2203.13474.pdf). When a model is only allowed a few samples to pass unit tests, it is beneficial to use the learned distribution, through a low temperature, to select candidates that are likely to pass. But when a model is allowed for more chances with a high k, using a higher sampling temperature to tilt the learned model distribution lets it explore diverse samples and thus have a greater chance of synthesizing a correct program.


  For our experiment, we compute pass@1, pass@10 and pass@20, each corresponding to unit test pass rate when selecting respectively 1, 10 and 20 samples from the candidate solutions.
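For readers following the pass@1, pass@10 and pass@20 figures mentioned in the changed file, here is a minimal sketch of how pass@k is commonly estimated from n generated samples of which c pass the unit tests, using the unbiased estimator 1 - C(n - c, k) / C(n, k) described in the Codex paper. The function name `pass_at_k` and the example counts (20 samples, 3 passing) are illustrative assumptions, not taken from this commit or from the evaluation script it documents.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of candidate solutions generated
    c: number of those candidates that pass the unit tests
    k: number of samples the model is allowed
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every draw of k contains a passing one.
    if n - c < k:
        return 1.0
    # Probability that a random draw of k samples contains no passing one,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical problem: 20 samples generated, 3 of them pass.
for k in (1, 10, 20):
    print(f"pass@{k} = {pass_at_k(20, 3, k):.4f}")
```

The per-problem estimates would then be averaged over all problems in the benchmark. With n fixed at 20, pass@20 reduces to whether at least one sample passed.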