Dongfu Jiang committed on
Commit 8f55e3b
Parent(s): 9777535
Update README.md

README.md CHANGED
@@ -193,6 +193,13 @@ Learn more in our LLM-Blender Github [README.md](https://github.com/yuchenlin/LL
 | [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) (our previous version) | 128 | 128 | 384 |
 | [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224 | 412 | 2048 |
 
+### Training Datasets
+- [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
+- [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
+- [Dahoas/instruct-synthetic-prompt-responses](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses)
+- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
+- [lmsys/chatbot_arena_conversations](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
+- [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback)
 
 ### Performance
 PairRM has been trained on various high-quality, large-scale datasets with human preference annotations and exhibits strong correlation with human preferences.
@@ -203,15 +210,7 @@ We test the pairwise comparison on
 - [HHH-alignment](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment)
 - [MT-bench-human-judgements](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments)
 
-
-### Training Datasets
-- [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
-- [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
-- [Dahoas/instruct-synthetic-prompt-responses](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses)
-- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
-- [lmsys/chatbot_arena_conversations](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
-- [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback)
-
+All following results are reported as pairwise comparison accuracies (agreements).
 
 #### Auto-J Pairwise test data performance
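For clarity, the "pairwise comparison accuracy (agreement)" that the results above report can be sketched as follows. This is a generic illustration in plain Python, not code from the PairRM or LLM-Blender repositories, and the preference labels in the example are made up.

```python
def pairwise_accuracy(model_prefs, human_prefs):
    """Fraction of response pairs where the model's preferred candidate
    (0 or 1) matches the human annotator's preferred candidate."""
    if len(model_prefs) != len(human_prefs):
        raise ValueError("preference lists must be the same length")
    agreements = sum(m == h for m, h in zip(model_prefs, human_prefs))
    return agreements / len(model_prefs)

# Each entry records which of the two candidates was preferred for one pair.
model_prefs = [0, 1, 1, 0, 1]   # hypothetical ranker outputs
human_prefs = [0, 1, 0, 0, 1]   # hypothetical human labels
print(pairwise_accuracy(model_prefs, human_prefs))  # -> 0.8
```

A higher value means the ranker's pairwise choices agree more often with the human annotations; 0.5 is chance level for binary pairs.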