Example of serving the RM

#2
by weqweasdas - opened

Great work!

I assume that this RM is a process RM? Could you provide an example of serving the RM so that it evaluates the intermediate steps of the CoT? Many thanks!

Qwen org
•
edited Sep 23

As mentioned in the technical report (https://arxiv.org/pdf/2409.12122), we label responses with correct answers as positive and those with incorrect answers as negative, so it is an outcome RM, not a process RM. If you want to try scoring an intermediate step, you might consider inputting a step response directly.
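
A minimal sketch of that idea, under stated assumptions: a "step response" is read here as the response truncated after a given step, steps are assumed to be separated by blank lines, and `score_fn` is a hypothetical callable that wraps however you query the RM (it is not part of any Qwen API). Since the RM was trained on outcome labels, scores on partial responses may not be well calibrated.

```python
from typing import Callable, List, Tuple


def step_prefixes(response: str, sep: str = "\n\n") -> List[str]:
    """Return cumulative prefixes of a CoT response, one per step.

    Assumption: steps are delimited by `sep` (a blank line by default).
    """
    steps = [s for s in response.split(sep) if s.strip()]
    return [sep.join(steps[: i + 1]) for i in range(len(steps))]


def score_steps(
    question: str,
    response: str,
    score_fn: Callable[[str, str], float],
    sep: str = "\n\n",
) -> List[Tuple[str, float]]:
    """Score each truncated response with a user-supplied reward function.

    `score_fn(question, partial_response)` is hypothetical: plug in your own
    wrapper around the reward model here.
    """
    return [(p, score_fn(question, p)) for p in step_prefixes(response, sep)]


if __name__ == "__main__":
    # Placeholder scorer for illustration only; it is NOT a real reward model.
    def toy_score_fn(question: str, partial_response: str) -> float:
        return float(len(partial_response))

    cot = "Step 1: 2 + 3 = 5.\n\nStep 2: 5 * 4 = 20.\n\nThe answer is 20."
    for prefix, score in score_steps("What is (2 + 3) * 4?", cot, toy_score_fn):
        print(repr(prefix), score)
```

Comparing the scores of consecutive prefixes gives a rough signal of where a solution starts to go wrong, but treat it as a heuristic, not as a process reward.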

Zhenru changed discussion status to closed
