|
--- |
|
license: mit |
|
datasets: |
|
- openai/summarize_from_feedback |
|
- openai/webgpt_comparisons |
|
- Dahoas/synthetic-instruct-gptj-pairwise |
|
- Anthropic/hh-rlhf |
|
- lmsys/chatbot_arena_conversations |
|
- openbmb/UltraFeedback |
|
metrics: |
|
- accuracy |
|
tags: |
|
- reward_model |
|
- reward-model |
|
- RLHF |
|
- evaluation |
|
- llm |
|
- instruction |
|
- reranking |
|
language: |
|
- multilingual |
|
- en |
|
- ar |
|
- bg |
|
- de |
|
- el |
|
- es |
|
- fr |
|
- hi |
|
- ru |
|
- sw |
|
- th |
|
- tr |
|
- ur |
|
- vi |
|
- zh |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Pairwise Reward Model for LLMs (PairRM) based on mdeberta-v3-base |
|
|
|
This is an attempt at a multilingual [PairRM](https://huggingface.co/llm-blender/PairRM) model, created by applying the training procedure from the original LLM-Blender repository to [mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base).
|
|
|
I have not done any thorough evaluation yet, apart from sanity checks with the samples provided for the original PairRM model and a few quick made-up examples.
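
A sanity check of the kind mentioned above can look like the sketch below, which compares two candidate responses with the `llm_blender` library. Note that loading this checkpoint this way is an assumption based on the original PairRM usage, and `"<this-repo-id>"` is a placeholder for this model's repo id.

```python
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("<this-repo-id>")  # placeholder for this model's repo id

inputs = ["Translate 'good morning' to German."]
candidates_A = ["Guten Morgen."]
candidates_B = ["I don't know."]

# compare() returns, per input, whether candidate A is preferred over B.
comparison = blender.compare(inputs, candidates_A, candidates_B)
print(comparison)  # e.g. [True] if the model prefers candidate A
```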
|
|
|
For additional usage information, please refer to the [original](https://huggingface.co/llm-blender/PairRM) model card.
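
For reranking, the snippet below is a minimal sketch of ranking several candidates per input, again assuming this checkpoint can be loaded through `llm_blender` exactly like the original PairRM; replace the placeholder `"<this-repo-id>"` with this model's repo id.

```python
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("<this-repo-id>")  # placeholder for this model's repo id

inputs = ["What is the capital of France?"]
candidates = [[
    "The capital of France is Paris.",
    "France is a country in Europe.",
]]

# rank() returns one ranking per input; rank 1 marks the preferred candidate.
ranks = blender.rank(inputs, candidates, return_scores=False, batch_size=1)
print(ranks)  # e.g. [[1, 2]] if the first candidate is preferred
```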
|
|
|
|
|
|
|
## Citation & Credits |
|
```bibtex
@inproceedings{llm-blender-2023,
  title = "LLM-Blender: Ensembling Large Language Models with Pairwise Comparison and Generative Fusion",
  author = "Jiang, Dongfu and Ren, Xiang and Lin, Bill Yuchen",
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)",
  year = "2023"
}
```
|
|