Update README.md
README.md
CHANGED
@@ -117,10 +117,8 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 We use objective questions to assess eight dimensions: **Character, Style, Emotion, Relationship, Personality, Human-likeness, Coherence, and Role Consistency**. The metric design can be find in our [paper](https://arxiv.org/abs/2408.10903). The evaluation code can be found in [GitHub](https://github.com/yuyouyu32/BeyondDialogue/tree/main/AutoRPEval). The results are shown below:
 
-<div style="overflow-x: auto;">
-
 | **Model** | **Character ↑** | **Style ↑** | **Emotion ↓** | **Relationship ↓** | **Personality ↑** | **Avg. ↑** | **Human-likeness ↑** | **Role Choice ↑** | **Coherence ↑** |
-
+|------------------------------------------------------------|---------------|---------------|---------------|------------------|-----------------|----------------|----------------------|-------------------|-----------------|
 | **General Baselines(Proprietary)** |
 | GPT-4o | 74.32 ± 1.15 | **81.67 ± 1.51** | 16.31 ± 0.48 | **12.13 ± 0.66** | 66.58 ± 4.41 | 78.83 ± 1.64 | **67.33 ± 3.95** | **87.33 ± 3.86** | **99.67 ± 0.33**|
 | GPT-3.5-Turbo | 72.26 ± 1.27 | 73.66 ± 1.73 | 17.79 ± 0.56 | 14.17 ± 0.73 | 66.92 ± 4.85 | 76.18 ± 1.83 | 33.33 ± 4.43 | 83.00 ± 4.68 | 97.33 ± 1.17 |
@@ -139,7 +137,7 @@ We use objective questions to assess eight dimensions: **Character, Style, Emoti
 | Mistral-Nemo-Instruct-2407 | 74.12 ± 1.17 | 77.04 ± 1.48 | 17.00 ± 0.43 | 13.50 ± 0.67 | 67.00 ± 4.30 | 77.53 ± 1.61 | 53.67 ± 4.66 | 82.67 ± 4.77 | 74.33 ± 3.77 |
 | Qwen2-7B-Instruct | 75.39 ± 1.13 | 77.68 ± 1.65 | 17.64 ± 0.56 | 13.43 ± 0.7 | 67.75 ± 4.44| 77.95 ± 1.70 | 48.00 ± 4.66 | 83.33 ± 4.48 | 99.00 ± 0.56 |
 | **Qwen2-7B-BD-RP** | **78.67 ± 1.12***| **82.52 ± 1.33***| **15.68 ± 0.5*** | **11.22 ± 0.72***| **69.67 ± 4.27**| **80.79 ± 1.59***| **64.33 ± 3.80*** | **87.33 ± 3.74** | **99.00 ± 0.56**|
-
+
 
 ## Citation 📖
 
@@ -150,7 +148,9 @@ We use objective questions to assess eight dimensions: **Character, Style, Emoti
 title = {BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model},
 author = {Yu, Yeyong and Yu, Runsheng and Wei, Haojie and Zhang, Zhanqiu and Qian, Quan},
 year = {2024},
-journal = {arXiv preprint arXiv:2408.10903
+journal = {arXiv preprint arXiv:2408.10903
+
+,
 }
 ```
 