Update README.md
README.md CHANGED
@@ -38,7 +38,7 @@ As of 27 Sept 2024, Llama-3.1-Nemotron-70B-Reward performs best Overall on Rewar

| Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
|:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
-| _**Llama-3.1-Nemotron-70B-Reward**_ |Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.
+| _**Llama-3.1-Nemotron-70B-Reward**_ |Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.7 | **95.1** | **98.1** |
| Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data| 93.8 | 95.8 | **91.4** | 91.9 | 96.1 |
| TextEval-Llama3.1-70B | Not disclosed | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
| Skywork-Critic-Llama-3.1-70B | Not fully disclosed | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
@@ -60,7 +60,7 @@ On the other hand, when GPT-4 annotations are used as Ground-Truth, Llama-3.1-Ne
| Model | Type of Data Used For Training | Chat-Hard | LLMBar-Adversarial-Manual | LLMBar-Adversarial-Neighbour | LLMBar-Natural | LLMBar-Adversarial-GPTInst | LLMBar-Adversarial-GPTOut | MT-Bench-Hard|
|:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|:-----------------------|:-----------------------|
|||| Human as Ground Truth | Human as Ground Truth | Human as Ground Truth | _GPT-4 as Ground Truth_ |_GPT-4 as Ground Truth_ | _GPT-4 as Ground Truth_ |
-| Llama-3.1-Nemotron-70B-Reward |Permissive Licensed Data Only (CC-BY-4.0) | 85.
+| Llama-3.1-Nemotron-70B-Reward |Permissive Licensed Data Only (CC-BY-4.0) | 85.7 | 76.1 | 88.8 | 95.0 | 87.0 | 72.3 | 75.7
| Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 91.4 | 78.3 | 89.6 | 96.0 | 97.8 | 91.5 | 86.5|

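As a quick consistency check on the updated first table: assuming the RewardBench Overall score is the unweighted mean of the Chat, Chat-Hard, Safety and Reasoning scores (an assumption here, not stated in the table), every row reproduces its reported Overall value to within one-decimal rounding. A minimal sketch, with the numbers copied from the table above:

```python
# Sanity check under the assumption that "Overall" is the unweighted mean of
# the four RewardBench category scores. Values are copied from the table above.
rows = {
    "Llama-3.1-Nemotron-70B-Reward": {"overall": 94.1, "chat": 97.5, "chat_hard": 85.7, "safety": 95.1, "reasoning": 98.1},
    "Skywork-Reward-Gemma-2-27B":    {"overall": 93.8, "chat": 95.8, "chat_hard": 91.4, "safety": 91.9, "reasoning": 96.1},
    "TextEval-Llama3.1-70B":         {"overall": 93.5, "chat": 94.1, "chat_hard": 90.1, "safety": 93.2, "reasoning": 96.4},
    "Skywork-Critic-Llama-3.1-70B":  {"overall": 93.3, "chat": 96.6, "chat_hard": 87.9, "safety": 93.1, "reasoning": 95.5},
}

for name, s in rows.items():
    mean = (s["chat"] + s["chat_hard"] + s["safety"] + s["reasoning"]) / 4
    # Allow ~0.05 slack for the one-decimal rounding used in the table.
    assert abs(mean - s["overall"]) <= 0.06, (name, mean, s["overall"])
    print(f"{name}: reported Overall {s['overall']}, mean of categories {mean:.2f}")
```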