How to optimize the loss function?

#1
by nidong - opened

According to the InstructGPT paper, the current loss function is a pairwise loss, but I have found that the gap between the output scores cannot be widened. Is there any direction for solving this problem?
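
For context, the pairwise loss described in the InstructGPT paper only penalizes the relative ordering of each chosen/rejected pair, not the size of the gap. A minimal PyTorch sketch (the function name is illustrative, not taken from this repo):

```python
import torch
import torch.nn.functional as F

def pairwise_loss(chosen_rewards: torch.Tensor,
                  rejected_rewards: torch.Tensor) -> torch.Tensor:
    # InstructGPT-style pairwise ranking loss:
    #   -log(sigmoid(r_chosen - r_rejected)), averaged over pairs
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy reward-model scores for three preference pairs
chosen = torch.tensor([1.2, 0.4, 0.9])
rejected = torch.tensor([0.8, 0.5, -0.1])
print(pairwise_loss(chosen, rejected))
# The loss depends only on the score difference within each pair,
# so absolute scores (and the gap) are free to stay small once
# each pair is ordered correctly.
```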

OpenAssistant org

"the output scores" you are referring to, is it this model or something you are currently facing? Cause InstructGPT did have a mean adjusting step where they make sure the average rank scores in their datasets have a zero mean
