zhilinw committed on
Commit
21aff5a
1 Parent(s): f6b885e

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -38,7 +38,7 @@ As of 27 Sept 2024, Llama-3.1-Nemotron-70B-Reward performs best Overall on Rewar

  | Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
  |:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
- | _**Llama-3.1-Nemotron-70B-Reward**_ |Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.8 | **95.1** | **98.1** |
  | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data| 93.8 | 95.8 | **91.4** | 91.9 | 96.1 |
  | TextEval-Llama3.1-70B | Not disclosed | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
  | Skywork-Critic-Llama-3.1-70B | Not fully disclosed | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
@@ -60,7 +60,7 @@ On the other hand, when GPT-4 annotations are used as Ground-Truth, Llama-3.1-Ne
  | Model | Type of Data Used For Training | Chat-Hard | LLMBar-Adversarial-Manual | LLMBar-Adversarial-Neighbour | LLMBar-Natural | LLMBar-Adversarial-GPTInst | LLMBar-Adversarial-GPTOut | MT-Bench-Hard|
  |:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|:-----------------------|:-----------------------|
  |||| Human as Ground Truth | Human as Ground Truth | Human as Ground Truth | _GPT-4 as Ground Truth_ |_GPT-4 as Ground Truth_ | _GPT-4 as Ground Truth_ |
- | Llama-3.1-Nemotron-70B-Reward |Permissive Licensed Data Only (CC-BY-4.0) | 85.8 | 76.1 | 88.8 | 95.0 | 87.0 | 72.3 | 75.7
  | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 91.4 | 78.3 | 89.6 | 96.0 | 97.8 | 91.5 | 86.5|
 

  | Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
  |:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
+ | _**Llama-3.1-Nemotron-70B-Reward**_ |Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.7 | **95.1** | **98.1** |
  | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data| 93.8 | 95.8 | **91.4** | 91.9 | 96.1 |
  | TextEval-Llama3.1-70B | Not disclosed | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
  | Skywork-Critic-Llama-3.1-70B | Not fully disclosed | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
 
  | Model | Type of Data Used For Training | Chat-Hard | LLMBar-Adversarial-Manual | LLMBar-Adversarial-Neighbour | LLMBar-Natural | LLMBar-Adversarial-GPTInst | LLMBar-Adversarial-GPTOut | MT-Bench-Hard|
  |:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|:-----------------------|:-----------------------|
  |||| Human as Ground Truth | Human as Ground Truth | Human as Ground Truth | _GPT-4 as Ground Truth_ |_GPT-4 as Ground Truth_ | _GPT-4 as Ground Truth_ |
+ | Llama-3.1-Nemotron-70B-Reward |Permissive Licensed Data Only (CC-BY-4.0) | 85.7 | 76.1 | 88.8 | 95.0 | 87.0 | 72.3 | 75.7
  | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 91.4 | 78.3 | 89.6 | 96.0 | 97.8 | 91.5 | 86.5|