Multi-turn input template

#6
by linzeqipku - opened

Hi team, thank you for the great work.
I'm wondering how to use this model for calculating the rewards of multi-turn messages (i.e., what is the chat_template to convert messages to input string).
For example,

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm fine, thank you. And you?"}
]

I'm also wondering how the rewards of multi-turn messages are modeled during training. For example, does the model score the quality of only the last assistant message, or of all assistant messages?

OpenBMB org
edited Jun 8

Hi,

Thanks for your kind words.

For all kinds of messages, we format them directly with the tokenizer.apply_chat_template() function.
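For illustration, here is a minimal sketch of what a chat template does to a message list. The ChatML-style markers used below are an assumption for demonstration; the actual template is defined in the model's tokenizer config and is applied via tokenizer.apply_chat_template(), which may produce different special tokens.

```python
# Toy ChatML-style template, similar in spirit to what
# tokenizer.apply_chat_template() produces. The real template comes from
# the tokenizer config and may use different markers.
def apply_chat_template(messages, add_generation_prompt=False):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm fine, thank you. And you?"},
]
print(apply_chat_template(messages))
```

In practice you would just call `tokenizer.apply_chat_template(messages, tokenize=False)` on the model's own tokenizer rather than hand-rolling the template.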

In RM training, the attention masks of the context (messages[:-1] in your example) were set to 0, and only the masks of the last assistant message (messages[-1]) were set to 1. That is, the model was trained to score the last assistant message given single or multiple turns of context.
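The masking scheme above can be sketched as follows. This is a toy illustration that counts whitespace-split "tokens"; a real setup would build the mask over the subword tokens produced by the model's tokenizer.

```python
# Sketch of the RM training mask described above: 0 for all context tokens,
# 1 only for tokens of the final assistant message.
# (Toy whitespace "tokenizer" for illustration only.)
def reward_mask(messages):
    mask = []
    for i, m in enumerate(messages):
        n_tokens = len(m["content"].split())
        is_last_assistant = (i == len(messages) - 1 and m["role"] == "assistant")
        mask.extend([1 if is_last_assistant else 0] * n_tokens)
    return mask

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm fine, thank you. And you?"},
]
print(reward_mask(messages))
# → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

Only the six tokens of the final assistant turn contribute; everything before it is treated purely as context.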

Hope this helps :)
