Example of serving the RM
#2, opened by weqweasdas
Great work!
Is this RM a process RM? Could you provide an example of serving the RM to evaluate the intermediate steps of a CoT? Many thanks!
As mentioned in the technical report (https://arxiv.org/pdf/2409.12122), we label responses with correct final answers as positive and those with incorrect final answers as negative, so it is an outcome RM. If you want to try scoring an intermediate step, you could input the partial response up to that step directly.
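In case it helps, here is a minimal sketch of what "directly inputting a step response" could look like: cut the CoT into cumulative prefixes and score each prefix as if it were a full response. The helper names (`step_prefixes`, `score_steps`), the newline step separator, and the `score_fn` callback are all assumptions made for illustration; `score_fn` is a hypothetical wrapper around however you serve the RM, not part of the released code. Since the RM was trained on outcome labels, prefix scores are probably best treated as a rough signal rather than calibrated step-level rewards.

```python
from typing import Callable, List, Tuple


def step_prefixes(response: str, sep: str = "\n") -> List[str]:
    """Return cumulative prefixes of a CoT response, one per non-empty step.

    Assumption: steps are separated by `sep` (newline by default).
    """
    steps = [s for s in response.split(sep) if s.strip()]
    return [sep.join(steps[: i + 1]) for i in range(len(steps))]


def score_steps(
    question: str,
    response: str,
    score_fn: Callable[[str, str], float],
    sep: str = "\n",
) -> List[Tuple[str, float]]:
    """Score every prefix of `response` with a user-supplied RM wrapper.

    `score_fn(question, partial_response)` is hypothetical: it should send the
    question and the partial response to the RM and return a scalar reward.
    """
    return [(prefix, score_fn(question, prefix)) for prefix in step_prefixes(response, sep)]


def dummy_score(question: str, partial_response: str) -> float:
    """Placeholder scorer for trying the helpers without a model.

    It only counts whitespace-separated tokens; replace it with a real RM call.
    """
    return float(len(partial_response.split()))
```

Swapping `dummy_score` for a function that calls the served RM gives one score per prefix; comparing consecutive scores is one possible way to spot the step where the reasoning goes wrong.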
Zhenru changed discussion status to closed