Text Generation
Transformers
Safetensors
English
deberta
reward_model
reward-model
RLHF
evaluation
llm
instruction
reranking
Inference Endpoints
yuchenlin committed on
Commit e391c3f
1 Parent(s): d3b55cf

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -40,7 +40,7 @@ Apart from that, one can also use PairRM to further align instruction-tuned LLMs
 
 Unlike the other RMs that encode and score each candidate separately,
 PairRM takes a pair of candidates and compares them side by side to identify the subtle differences between them.
-Also, PairRM is based on [`microsoft/deberta-v3-large`](https://huggingface.co/microsoft/deberta-v3-large), and thus it is highly efficient, with only 0.4B parameters.
+Also, PairRM is based on [`microsoft/deberta-v3-large`](https://huggingface.co/microsoft/deberta-v3-large), and thus it is highly efficient, with only **0.4B** parameters.
 We trained PairRM on a diverse collection of six human-preference datasets (see more [here](https://huggingface.co/llm-blender/PairRM#training-datasets)).
 
 PairRM is part of the LLM-Blender project (ACL 2023). Please see our [paper](https://arxiv.org/abs/2306.02561) to learn more.
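
For context on the pairwise comparison the changed lines describe, below is a minimal usage sketch based on the LLM-Blender project's `llm-blender` package. The helper names (`Blender`, `loadranker`, `compare`) follow that repository's documented interface; verify them against your installed version, and the example inputs are illustrative only.

```python
# Minimal sketch: pairwise comparison with PairRM via the llm-blender package
# (pip install llm-blender). API names follow the LLM-Blender repository.
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load the 0.4B DeBERTa-based pair ranker

inputs = ["Explain what a reward model is in one sentence."]
candidates_A = ["A reward model scores model outputs to reflect human preferences."]
candidates_B = ["It is a model."]

# PairRM encodes each (input, candidate A, candidate B) triple jointly and
# returns, for each input, whether candidate A is preferred over candidate B.
comparison_results = blender.compare(inputs, candidates_A, candidates_B)
print(comparison_results)  # e.g. [True] if A is judged better than B
```

Because the two candidates are encoded together rather than scored independently, the model can attend to the subtle differences between them, which is the design point the README text emphasizes.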