LemiSt/PairRM-mdeberta-v3-base

	---
	license: mit
	datasets:
	- openai/summarize_from_feedback
	- openai/webgpt_comparisons
	- Dahoas/synthetic-instruct-gptj-pairwise
	- Anthropic/hh-rlhf
	- lmsys/chatbot_arena_conversations
	- openbmb/UltraFeedback
	metrics:
	- accuracy
	tags:
	- reward_model
	- reward-model
	- RLHF
	- evaluation
	- llm
	- instruction
	- reranking
	language:
	- multilingual
	- en
	- ar
	- bg
	- de
	- el
	- es
	- fr
	- hi
	- ru
	- sw
	- th
	- tr
	- ur
	- vi
	- zh
	pipeline_tag: text-generation
	---

	# Pairwise Reward Model for LLMs (PairRM) based on mdeberta-v3-base

	This is an attempt to create a multilingual [PairRM](https://huggingface.co/llm-blender/PairRM) model by applying the training procedure from the original LLM-Blender repository to [mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base).

	I have not yet done any real testing beyond sanity checks with the samples provided for the original PairRM model and a few quick made-up samples.

	For additional information, including usage, please refer to the [original](https://huggingface.co/llm-blender/PairRM) model.
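
As a rough starting point, the sketch below follows the usage pattern documented for the original PairRM model, using the `llm_blender` package from the LLM-Blender repository. It assumes that this checkpoint loads through `Blender.loadranker` the same way the original does, which has not been verified here. `MODEL_ID`, `rank_candidates`, `best_candidates`, `demo`, and the German sample texts are illustrative names and data, not part of any library.

```python
from typing import List, Sequence

# Hub id of this checkpoint (assumed to be loadable like llm-blender/PairRM).
MODEL_ID = "LemiSt/PairRM-mdeberta-v3-base"


def rank_candidates(inputs: List[str],
                    candidates: List[List[str]],
                    model_id: str = MODEL_ID):
    """Rank the candidate responses for each input; rank 1 is the preferred one."""
    # Imported lazily so that best_candidates() works without llm-blender installed.
    import llm_blender

    blender = llm_blender.Blender()
    # Assumption: same loading call as shown for the original PairRM model.
    blender.loadranker(model_id)
    return blender.rank(inputs, candidates, return_scores=False, batch_size=1)


def best_candidates(candidates: Sequence[Sequence[str]],
                    ranks: Sequence[Sequence[int]]) -> List[str]:
    """Pick the candidate with the lowest (best) rank for every input."""
    best = []
    for cands, row in zip(candidates, ranks):
        row = list(row)  # also accepts a NumPy row
        best.append(cands[row.index(min(row))])
    return best


def demo() -> None:
    """Hypothetical end-to-end run; downloads the checkpoint on first use."""
    inputs = ["Wie heißt die Hauptstadt von Frankreich?"]
    candidates = [[
        "Die Hauptstadt von Frankreich ist Paris.",
        "Das weiß ich leider nicht.",
    ]]
    ranks = rank_candidates(inputs, candidates)
    print(best_candidates(candidates, ranks))
```

Calling `demo()` requires `llm_blender` and network access. `best_candidates` is kept separate so the rank handling can be checked without loading the model.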



	## Citation & Credits
	```bibtex
	@inproceedings{llm-blender-2023,
	title = "LLM-Blender: Ensembling Large Language Models with Pairwise Comparison and Generative Fusion",
	author = "Jiang, Dongfu and Ren, Xiang and Lin, Bill Yuchen",
	booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)",
	year = "2023"
	}

	```