The output of the reward model is a two-dimensional vector. What does each dimension mean?

#3
by Lily912 - opened

image.png
