yuchenlin committed
Commit 02215a3
1 Parent(s): e391c3f

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -167,9 +167,11 @@ Because they wanted someone who could communicate complex ideas without making a
 ```
 
 ### Use case 3: RLHF
- PairRM has been trained on various high-quality and large-scale dataset with human preference annotations and exhibits great correlation with human preferences with an extremly small model size (0.4B), approching the performance of GPT-4.
- We believe PairRM will power the alignment of LLM in an efficient and effective way.
- With a `blender.compare()` function, you can easily apply PairRM to poopular RLHF toolkits like [trl](https://huggingface.co/docs/trl/index).
+ PairRM has been trained on various high-quality and large-scale datasets with human preference annotations
+ and shows strong correlation with human preferences despite its extremely small model size (0.4B),
+ approaching the performance of GPT-4.
+ PairRM can thus power future LLM alignment in a more efficient and effective way.
+ With the `blender.compare()` function, you can apply PairRM to popular RLHF toolkits such as [trl](https://huggingface.co/docs/trl/index).
 
 **🔥 Check more details on our example jupyter notebook usage: [`blender_usage.ipynb`](https://github.com/yuchenlin/LLM-Blender/blob/main/blender_usage.ipynb)**
 
@@ -184,7 +186,7 @@ Learn more in our LLM-Blender Github [README.md](https://github.com/yuchenlin/LL
 ### Context length
 | PairRanker type | Source max length | Candidate max length | Total max length |
 |:-----------------:|:-----------------:|----------------------|------------------|
- | [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) | 128 | 128 | 384 |
+ | [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) (our previous version) | 128 | 128 | 384 |
 | [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224 | 412 | 2048 |
 
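For reference, the snippet below is a minimal, illustrative sketch (not part of this commit) of how the `blender.compare()` call mentioned in the diff can serve as a pairwise preference signal, for example for best-of-n response selection ahead of or alongside RLHF. It assumes the `llm_blender` package and the `Blender.loadranker` / `Blender.compare` API from the LLM-Blender GitHub repository; check that repository for the exact install command and the return format of `compare()`.

```python
# Illustrative sketch: PairRM as a pairwise preference signal for best-of-n selection.
# Assumes the llm_blender package from https://github.com/yuchenlin/LLM-Blender;
# verify the install command and the exact return type of compare() against that repo.
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load the 0.4B PairRM pairwise ranker

prompts = ["Explain self-attention in one sentence."]
candidates_a = ["Self-attention lets each token weigh all other tokens when building its representation."]
candidates_b = ["Self-attention is a thing transformers do."]

# compare() judges, per prompt, whether candidate A is preferred over candidate B.
a_wins = blender.compare(prompts, candidates_a, candidates_b)

# Keep the preferred response for each prompt; the resulting chosen/rejected pairs
# could feed preference-based trainers in trl or act as a reward signal elsewhere.
chosen = [a if win else b for a, b, win in zip(candidates_a, candidates_b, a_wins)]
print(chosen)
```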