What's the difference between llm-blender/PairRM and llm-blender/pair-ranker?

#3
by nefelibata-mu - opened

I'm wondering what the difference is between llm-blender/PairRM and llm-blender/pair-ranker. I hope to get an answer.

LLM Blender org

Their model architecture is the same. The main difference lies in the training data and context length.

  1. For the training data:
  • pair-ranker is the ranker trained on the llm-blender/mix-instruct dataset; it produced the results reported in the LLM-Blender paper.
  • PairRM is trained on openai/summarize_from_feedback, lmsys/chatbot_arena_conversations, and other datasets, none of which contain mix-instruct data (see the PairRM README).
  2. For the context length:
    There is a simple table comparing the two. The main difference is that pair-ranker limits both the source and candidate lengths to 128 tokens, while PairRM raises those limits to 1224 and 412 tokens respectively.
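
To make the length limits concrete, here is a minimal sketch of how they would affect a long input. It assumes the limits are token counts and that over-length inputs are simply cut off from the right; the real tokenizer and truncation behavior may differ. The names `LengthLimits`, `LIMITS`, and `truncate_pair` are hypothetical and not part of the llm-blender package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LengthLimits:
    source_max: int      # max length kept for the source (instruction/input)
    candidate_max: int   # max length kept for each candidate response


# Limits as quoted in the answer above (assumed here to be token counts).
LIMITS = {
    "llm-blender/pair-ranker": LengthLimits(source_max=128, candidate_max=128),
    "llm-blender/PairRM": LengthLimits(source_max=1224, candidate_max=412),
}


def truncate_pair(model_name, source_tokens, candidate_tokens):
    """Cut the source and candidate token lists to the model's limits.

    Assumption: plain right-truncation, used for illustration only.
    """
    limits = LIMITS[model_name]
    return (
        source_tokens[: limits.source_max],
        candidate_tokens[: limits.candidate_max],
    )


if __name__ == "__main__":
    # A made-up long example: 1000 source tokens, 300 candidate tokens.
    source = ["tok"] * 1000
    candidate = ["tok"] * 300
    for name in LIMITS:
        src, cand = truncate_pair(name, source, candidate)
        print(f"{name}: source kept {len(src)}, candidate kept {len(cand)}")
```

Under these assumptions, pair-ranker would see only a small part of a long prompt and response, while PairRM would see both in full.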

Overall, you can consider PairRM a more powerful version of llm-blender/pair-ranker.

Thank you very much for your patient answer.
