yuyouyu/Qwen2-7B-BD-RP

@@ -117,10 +117,8 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 We use objective questions to assess eight dimensions: **Character, Style, Emotion, Relationship, Personality, Human-likeness, Coherence, and Role Consistency**. The metric design can be found in our [paper](https://arxiv.org/abs/2408.10903). The evaluation code can be found on [GitHub](https://github.com/yuyouyu32/BeyondDialogue/tree/main/AutoRPEval). The results are shown below:
-<div style="overflow-x: auto;">
 | **Model**                    | **Character ↑** | **Style ↑**     | **Emotion ↓**   | **Relationship ↓** | **Personality ↑** | **Avg. ↑**     | **Human-likeness ↑** | **Role Choice ↑** | **Coherence ↑** |
-|------------------------------|---------------|---------------|---------------|------------------|-----------------|----------------|----------------------|-------------------|-----------------|
 | **General Baselines (Proprietary)**                                                                                                                                           |
 | GPT-4o                       | 74.32 ± 1.15  | **81.67 ± 1.51** | 16.31 ± 0.48  | **12.13 ± 0.66** | 66.58 ± 4.41    | 78.83 ± 1.64   | **67.33 ± 3.95**     | **87.33 ± 3.86**  | **99.67 ± 0.33**|
 | GPT-3.5-Turbo                | 72.26 ± 1.27  | 73.66 ± 1.73  | 17.79 ± 0.56  | 14.17 ± 0.73     | 66.92 ± 4.85    | 76.18 ± 1.83   | 33.33 ± 4.43         | 83.00 ± 4.68      | 97.33 ± 1.17    |
@@ -139,7 +137,7 @@ We use objective questions to assess eight dimensions: **Character, Style, Emoti
 | Mistral-Nemo-Instruct-2407   | 74.12 ± 1.17  | 77.04 ± 1.48  | 17.00 ± 0.43  | 13.50 ± 0.67     | 67.00 ± 4.30    | 77.53 ± 1.61   | 53.67 ± 4.66     | 82.67 ± 4.77      | 74.33 ± 3.77    |
 | Qwen2-7B-Instruct            | 75.39 ± 1.13 | 77.68 ± 1.65  | 17.64 ± 0.56  | 13.43 ± 0.7      | 67.75 ± 4.44| 77.95 ± 1.70   | 48.00 ± 4.66         | 83.33 ± 4.48      | 99.00 ± 0.56    |
 | **Qwen2-7B-BD-RP**          | **78.67 ± 1.12***| **82.52 ± 1.33***| **15.68 ± 0.5*** | **11.22 ± 0.72***| **69.67 ± 4.27**| **80.79 ± 1.59***| **64.33 ± 3.80***     | **87.33 ± 3.74**  | **99.00 ± 0.56**|
-</div>
 ## Citation 📖
@@ -150,7 +148,9 @@ We use objective questions to assess eight dimensions: **Character, Style, Emoti
   title   = {BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model},
   author  = {Yu, Yeyong and Yu, Runsheng and Wei, Haojie and Zhang, Zhanqiu and Qian, Quan},
   year    = {2024},
-  journal = {arXiv preprint arXiv:2408.10903},
 }
 ```

 We use objective questions to assess eight dimensions: **Character, Style, Emotion, Relationship, Personality, Human-likeness, Coherence, and Role Consistency**. The metric design can be found in our [paper](https://arxiv.org/abs/2408.10903). The evaluation code can be found on [GitHub](https://github.com/yuyouyu32/BeyondDialogue/tree/main/AutoRPEval). The results are shown below:
 | **Model**                    | **Character ↑** | **Style ↑**     | **Emotion ↓**   | **Relationship ↓** | **Personality ↑** | **Avg. ↑**     | **Human-likeness ↑** | **Role Choice ↑** | **Coherence ↑** |
+|------------------------------------------------------------|---------------|---------------|---------------|------------------|-----------------|----------------|----------------------|-------------------|-----------------|
 | **General Baselines (Proprietary)**                                                                                                                                           |
 | GPT-4o                       | 74.32 ± 1.15  | **81.67 ± 1.51** | 16.31 ± 0.48  | **12.13 ± 0.66** | 66.58 ± 4.41    | 78.83 ± 1.64   | **67.33 ± 3.95**     | **87.33 ± 3.86**  | **99.67 ± 0.33**|
 | GPT-3.5-Turbo                | 72.26 ± 1.27  | 73.66 ± 1.73  | 17.79 ± 0.56  | 14.17 ± 0.73     | 66.92 ± 4.85    | 76.18 ± 1.83   | 33.33 ± 4.43         | 83.00 ± 4.68      | 97.33 ± 1.17    |
 | Mistral-Nemo-Instruct-2407   | 74.12 ± 1.17  | 77.04 ± 1.48  | 17.00 ± 0.43  | 13.50 ± 0.67     | 67.00 ± 4.30    | 77.53 ± 1.61   | 53.67 ± 4.66     | 82.67 ± 4.77      | 74.33 ± 3.77    |
 | Qwen2-7B-Instruct            | 75.39 ± 1.13 | 77.68 ± 1.65  | 17.64 ± 0.56  | 13.43 ± 0.7      | 67.75 ± 4.44| 77.95 ± 1.70   | 48.00 ± 4.66         | 83.33 ± 4.48      | 99.00 ± 0.56    |
 | **Qwen2-7B-BD-RP**          | **78.67 ± 1.12***| **82.52 ± 1.33***| **15.68 ± 0.5*** | **11.22 ± 0.72***| **69.67 ± 4.27**| **80.79 ± 1.59***| **64.33 ± 3.80***     | **87.33 ± 3.74**  | **99.00 ± 0.56**|
 ## Citation 📖
   title   = {BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model},
   author  = {Yu, Yeyong and Yu, Runsheng and Wei, Haojie and Zhang, Zhanqiu and Qian, Quan},
   year    = {2024},
+  journal = {arXiv preprint arXiv:2408.10903},
 }
 ```
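
The README does not say how the **Avg. ↑** column in the results table is computed. The numbers are consistent with one simple rule: take the mean of Character, Style, and Personality together with `100 - Emotion` and `100 - Relationship`, so that the two lower-is-better columns are flipped before averaging. The sketch below checks that assumption against the rows shown above. The rule itself, the `ROWS` dictionary, and the `average_score` helper are illustrative and inferred from the table; they are not taken from the AutoRPEval code.

```python
from statistics import mean

# Mean scores copied from the results table (error bars omitted).
ROWS = {
    "GPT-4o": {"character": 74.32, "style": 81.67, "emotion": 16.31,
               "relationship": 12.13, "personality": 66.58},
    "GPT-3.5-Turbo": {"character": 72.26, "style": 73.66, "emotion": 17.79,
                      "relationship": 14.17, "personality": 66.92},
    "Mistral-Nemo-Instruct-2407": {"character": 74.12, "style": 77.04, "emotion": 17.00,
                                   "relationship": 13.50, "personality": 67.00},
    "Qwen2-7B-Instruct": {"character": 75.39, "style": 77.68, "emotion": 17.64,
                          "relationship": 13.43, "personality": 67.75},
    "Qwen2-7B-BD-RP": {"character": 78.67, "style": 82.52, "emotion": 15.68,
                       "relationship": 11.22, "personality": 69.67},
}

# Columns marked with a down arrow in the table: lower is better.
LOWER_IS_BETTER = {"emotion", "relationship"}


def average_score(scores):
    """Average five dimensions, flipping lower-is-better ones to 100 - value.

    Assumption: this rule is inferred from the table, not from the paper's code.
    """
    adjusted = [
        100.0 - value if name in LOWER_IS_BETTER else value
        for name, value in scores.items()
    ]
    return round(mean(adjusted), 2)


if __name__ == "__main__":
    for model, scores in ROWS.items():
        print(f"{model}: {average_score(scores):.2f}")
```

Under this assumption the helper reproduces the reported Avg. values for all five rows, for example 78.83 for GPT-4o and 80.79 for Qwen2-7B-BD-RP. Human-likeness, Role Choice, and Coherence are reported separately and do not enter this average.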