OpenAssistant/reward-model-deberta-v3-large-v2 · Question about evaluating this reward model on Anthropic/hh-rlhf

Hello!

This repo shows that you evaluated this model [OpenAssistant/reward-model-deberta-v3-large-v2] on the dataset [Anthropic/hh-rlhf], which contains pairs of good/bad multi-turn conversations.

However, it is unclear how each multi-turn conversation should be fed into this model, since, according to the README, the model takes two text inputs (a question and an answer) and computes a score from them.
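One possible way to map an hh-rlhf transcript onto the README's two-input interface is to treat everything up to the last "Assistant:" turn as the question and the final reply as the answer. The sketch below assumes the transcripts use the "\n\nHuman: ... \n\nAssistant: ..." layout; the helper name `split_hh_conversation` is hypothetical, and this split is a guess, not a confirmed description of how the reported results were produced.

```python
from typing import Tuple

ASSISTANT_MARKER = "\n\nAssistant:"


def split_hh_conversation(text: str) -> Tuple[str, str]:
    """Hypothetical helper: split an hh-rlhf transcript into
    (context, final assistant reply)."""
    # Locate the start of the last assistant turn.
    idx = text.rfind(ASSISTANT_MARKER)
    if idx == -1:
        raise ValueError("no Assistant turn found in transcript")
    # Everything before it is the dialogue context ("question").
    context = text[:idx].strip()
    # Everything after the marker is the reply being scored ("answer").
    reply = text[idx + len(ASSISTANT_MARKER):].strip()
    return context, reply
```

The two strings could then be passed as the pair `tokenizer(context, reply, return_tensors="pt")`, mirroring the question/answer call in the README.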

I also referred to [https://github.com/LAION-AI/Open-Assistant/blob/9cb6de07b5014a6f4d6bbaab9e9e1cc8b990bc43/model/model_training/trainer_rm.py] for guidance. The released code appears to first parse each conversation, then replace the role prefixes (Human: / Assistant:) with special tokens (<|prompter|> / <|assistant|>), and finally concatenate all turns with the EOS token. However, these two special tokens (<|prompter|> / <|assistant|>) are not in the vocabulary of the corresponding tokenizer.
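To make the format described above concrete, here is a minimal sketch of that parse/replace/concatenate step on a raw hh-rlhf string. The function names are hypothetical, the EOS string is passed in rather than assumed, and whether the released checkpoint actually expects this format is not confirmed here. Since the tokens are missing from the vocabulary, a tokenizer would split them into ordinary sub-pieces unless they were registered (for example with `tokenizer.add_special_tokens`).

```python
import re
from typing import List, Tuple

PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"


def parse_turns(text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: split an hh-rlhf transcript into (role, content) turns."""
    # The capturing group keeps the role names in the split output:
    # ['', 'Human', ' Hi', 'Assistant', ' Hello!', ...]
    parts = re.split(r"\n\n(Human|Assistant):", text)
    roles, contents = parts[1::2], parts[2::2]
    return [(role, content.strip()) for role, content in zip(roles, contents)]


def to_special_token_format(text: str, eos_token: str) -> str:
    """Replace role prefixes with special tokens and end each turn with eos_token."""
    pieces = []
    for role, content in parse_turns(text):
        tag = PROMPTER if role == "Human" else ASSISTANT
        pieces.append(f"{tag}{content}{eos_token}")
    return "".join(pieces)
```

For example, `to_special_token_format("\n\nHuman: Hi\n\nAssistant: Hello!", "[SEP]")` yields `"<|prompter|>Hi[SEP]<|assistant|>Hello![SEP]"`.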

Could you kindly show the input text format for a sample from the dataset [Anthropic/hh-rlhf], or provide a demo? It would be of great value to me as I learn.

Thank you.

Do you have any ideas on how to reproduce the accuracy results? I am running into the same problem.
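For reference, accuracy on chosen/rejected pairs is commonly computed as the fraction of pairs where the reward model scores the chosen response higher than the rejected one. The sketch below assumes that definition (it is not confirmed to be the exact metric behind the reported numbers) and takes precomputed scores, e.g. model logits for the `chosen` and `rejected` columns of hh-rlhf. Ties count as failures here, which is a design choice.

```python
from typing import Sequence


def pairwise_accuracy(chosen_scores: Sequence[float],
                      rejected_scores: Sequence[float]) -> float:
    """Fraction of pairs where the chosen response strictly outscores the rejected one."""
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if len(chosen_scores) == 0:
        raise ValueError("need at least one pair")
    # A tie (c == r) is counted as a miss.
    wins = sum(1 for c, r in zip(chosen_scores, rejected_scores) if c > r)
    return wins / len(chosen_scores)
```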