Pairwise Reward Model for LLMs (PairRM) based on mdeberta-v3-base
This is an attempt to create a multilingual PairRM-Model by applying the training procedure from the original LLM-Blender repository to mdeberta-v3-base.
I have not yet done any real testing apart from some sanity checks with the provided samples from the original PairRM-Model as well as some quick made-up samples.
For additional information, including usage, please refer to the original model.
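To illustrate what "pairwise" means here, the following is a minimal sketch of how pairwise preference scores can be turned into a ranking of candidate answers. It is not the LLM-Blender implementation and does not load this model: `rank_by_pairwise_scores` and `toy_compare` are hypothetical names made up for this example, the toy scorer simply prefers longer answers, and summing the pairwise scores is just one simple aggregation choice assumed for illustration.

```python
from itertools import permutations
from typing import Callable, List


def rank_by_pairwise_scores(
    prompt: str,
    candidates: List[str],
    compare: Callable[[str, str, str], float],
) -> List[int]:
    """Return 1-based ranks (1 = best) from summed pairwise preference scores."""
    totals = [0.0] * len(candidates)
    for i, j in permutations(range(len(candidates)), 2):
        # A positive score means candidate i is preferred over candidate j.
        score = compare(prompt, candidates[i], candidates[j])
        totals[i] += score
        totals[j] -= score
    # Sort candidate indices by total score, highest first.
    order = sorted(range(len(candidates)), key=lambda k: -totals[k])
    ranks = [0] * len(candidates)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def toy_compare(prompt: str, a: str, b: str) -> float:
    """Hypothetical stand-in for a pairwise reward model: prefers the longer answer."""
    return float(len(a) - len(b))


ranks = rank_by_pairwise_scores(
    "Wie geht es dir?",
    ["Gut.", "Mir geht es gut, danke der Nachfrage!", "Ganz gut, danke."],
    toy_compare,
)
print(ranks)
```

In real use, the `compare` callable would be replaced by a call into the actual model; the original model's documentation describes the supported way to do that.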
Citation & Credits
@inproceedings{llm-blender-2023,
title = "LLM-Blender: Ensembling Large Language Models with Pairwise Comparison and Generative Fusion",
author = "Jiang, Dongfu and Ren, Xiang and Lin, Bill Yuchen",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)",
year = "2023"
}