---
datasets:
- Anthropic/hh-rlhf
language:
- en
tags:
- rlhf
model-index:
- name: deberta-v3-large-tasksource-rlhf-reward-model
  results:
  - task:
      type: text-classification
      name: RLHF
    dataset:
      type: rlhf
      name: Anthropic/hh-rlhf
      split: validation
    metrics:
    - type: accuracy
      value: 0.7516
      verified: true
---

# Reward model based on [`deberta-v3-large-tasksource-nli`](https://huggingface.co/sileod/deberta-v3-large-tasksource-nli)

Fine-tuned on Anthropic/hh-rlhf for 1 epoch with a learning rate of 1e-5.

The data are described in the paper: [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/abs/2204.05862).

Validation accuracy is currently the best publicly reported: 75.16% (vs. 69.25% for `OpenAssistant/reward-model-deberta-v3-large-v2`).
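
## Usage

A minimal sketch of scoring a chosen/rejected pair with this reward model. It assumes the hub ID `sileod/deberta-v3-large-tasksource-rlhf-reward-model` (inferred from the model name above) and a standard single-logit sequence-classification head, as is typical for hh-rlhf reward models; the example dialogue strings are illustrative, not from the dataset.

```python
# Sketch: compare reward scores for a chosen vs. rejected response.
# Assumptions: hub ID below is inferred from the model name; the model
# outputs a single preference logit per sequence (higher = preferred).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "sileod/deberta-v3-large-tasksource-rlhf-reward-model"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# hh-rlhf-style dialogues (hypothetical examples for illustration)
chosen = "\n\nHuman: Can you help me write a polite email?\n\nAssistant: Of course! Start by greeting the recipient by name, then state your request clearly and thank them for their time."
rejected = "\n\nHuman: Can you help me write a polite email?\n\nAssistant: No."

with torch.no_grad():
    scores = []
    for text in (chosen, rejected):
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        # Single preference logit; a higher value means the response is preferred.
        scores.append(model(**inputs).logits[0].item())

print(f"chosen: {scores[0]:.3f}  rejected: {scores[1]:.3f}")
```

On hh-rlhf pairs such as this, the chosen response should generally receive the higher score; validation accuracy above is the fraction of pairs where it does.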