---
datasets:
- Anthropic/hh-rlhf
language:
- en
tags:
- rlhf
model-index:
  - name: deberta-v3-large-tasksource-rlhf-reward-model 
    results:
      - task:
          type: text-classification
          name: RLHF
        dataset:
          type: rlhf
          name: Anthropic/hh-rlhf
          split: validation
        metrics:
          - type: accuracy
            value: 0.7516
            verified: true
---
# Reward model based on `deberta-v3-large-tasksource-nli`, fine-tuned on Anthropic/hh-rlhf
Fine-tuned for 1 epoch with a learning rate of 1e-5.

Validation accuracy is 75.16%, currently the best publicly reported result (vs. 69.25% for `OpenAssistant/reward-model-deberta-v3-large-v2`).
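
The accuracy figure above is presumably a pairwise accuracy: the fraction of (chosen, rejected) pairs in which the reward model scores the chosen response higher. That reading is an assumption, as is everything in the sketch below. `pairwise_accuracy` and `make_transformers_scorer` are hypothetical helper names, the model id is left for you to supply, and the scorer assumes the first logit of the classification head is the reward.

```python
from typing import Callable, Iterable, Tuple


def pairwise_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score_fn: Callable[[str], float],
) -> float:
    """Fraction of (chosen, rejected) pairs where chosen scores strictly higher."""
    correct = 0
    total = 0
    for chosen, rejected in pairs:
        if score_fn(chosen) > score_fn(rejected):
            correct += 1
        total += 1
    if total == 0:
        raise ValueError("pairs must not be empty")
    return correct / total


def make_transformers_scorer(model_id: str) -> Callable[[str], float]:
    """Build a scoring function from a sequence-classification checkpoint.

    Imports are local so the rest of this sketch runs without torch/transformers.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    def score(text: str) -> float:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Assumption: the first logit is the scalar reward.
        return logits[0, 0].item()

    return score


# Toy check with a stand-in scorer (string length), no model download needed.
toy_pairs = [("aaa", "a"), ("a", "aaa"), ("aaaa", "a"), ("aa", "a")]
toy_accuracy = pairwise_accuracy(toy_pairs, len)
```

To evaluate a real checkpoint, pass `make_transformers_scorer(<model id>)` as `score_fn` and feed it the `chosen` and `rejected` fields of the `Anthropic/hh-rlhf` evaluation split.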