Text Generation
Transformers
Safetensors
English
deberta
reward_model
reward-model
RLHF
evaluation
llm
instruction
reranking
Inference Endpoints
yuchenlin committed on
Commit e391c3f
1 Parent(s): d3b55cf

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -40,7 +40,7 @@ Apart from that, one can also use PairRM to further align instruction-tuned LLMs
 
 Unlike the other RMs that encode and score each candidate separately,
 PairRM takes a pair of candidates and compares them side by side to identify the subtle differences between them.
-Also, PairRM is based on [`microsoft/deberta-v3-large`](https://huggingface.co/microsoft/deberta-v3-large), and thus it is highly efficient, with only 0.4B parameters.
+Also, PairRM is based on [`microsoft/deberta-v3-large`](https://huggingface.co/microsoft/deberta-v3-large), and thus it is highly efficient, with only **0.4B** parameters.
 We trained PairRM on a diverse collection of six human-preference datasets (see more [here](https://huggingface.co/llm-blender/PairRM#training-datasets)).
 
 PairRM is part of the LLM-Blender project (ACL 2023). Please see our [paper](https://arxiv.org/abs/2306.02561) to learn more.
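
For context on the pairwise comparison the changed lines describe, below is a minimal usage sketch based on the LLM-Blender project's `llm-blender` package. The helper names (`Blender`, `loadranker`, `compare`) follow that repository's documented interface; verify them against your installed version, and the example inputs are illustrative only.

```python
# Minimal sketch: pairwise comparison with PairRM via the llm-blender package
# (pip install llm-blender). API names follow the LLM-Blender repository.
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load the 0.4B DeBERTa-based pair ranker

inputs = ["Explain what a reward model is in one sentence."]
candidates_A = ["A reward model scores model outputs to reflect human preferences."]
candidates_B = ["It is a model."]

# PairRM encodes each (input, candidate A, candidate B) triple jointly and
# returns, for each input, whether candidate A is preferred over candidate B.
comparison_results = blender.compare(inputs, candidates_A, candidates_B)
print(comparison_results)  # e.g. [True] if A is judged better than B
```

Because the two candidates are encoded together rather than scored independently, the model can attend to the subtle differences between them, which is the design point the README text emphasizes.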