---
pipeline_tag: text-generation
---

# Collective roleplay model

<!-- Provide a quick summary of what the model is/does. -->

A role-play model developed by Collective AI that learns from user interaction data.

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [Collective AI](https://huggingface.co/collective-ai)
- **Model type:** Llama 3-based role-play model
- **Language(s):** Chinese, English
- **Finetuned from model:** [Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat)


## How to Get Started with the Model

Requirements
```
transformers>=4.40.2
```
Use the code below to get started with the model.


```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Collective-Ai/collective-v0.1-chinese-roleplay-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

name = "唐三"
gender = "Male"
age = "19"
personality = "孤傲,冷漠,沉着,冷静"
relation_between_role_and_user = "同门师兄弟"
role_base_story = "《斗罗大陆》男主角。前世为唐门外门弟子,因偷学内门绝学《玄天宝录》,为唐门所不容,跳崖明志,却来到了另一个世界——斗罗大陆"
messages = [
    {"role": "system", "content": f"""#Role\nName: {name}\nGender: {gender}\nLanguage: Chinese\nAge: {age}\nPersonality: {personality}\n\n#Relationship\n{relation_between_role_and_user}\n\n#Story\n{role_base_story}"""},
    {"role": "user", "content": "我来了,三哥"},
]

input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=8192,
    do_sample=True,
    temperature=1.0,
    top_p=0.7,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
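The system message above follows a fixed `#Role` / `#Relationship` / `#Story` layout. As a minimal sketch, that layout can be wrapped in a small helper so different characters reuse the same card format. The function name `build_system_prompt` and its parameters are hypothetical and not part of the model's API, and the story text is shortened here for brevity.

```python
def build_system_prompt(name, gender, age, personality, relationship, story,
                        language="Chinese"):
    """Assemble a character card in the #Role / #Relationship / #Story layout.

    Hypothetical helper: it only mirrors the string layout used in the
    example above and does not call the model.
    """
    return (
        f"#Role\nName: {name}\nGender: {gender}\nLanguage: {language}\n"
        f"Age: {age}\nPersonality: {personality}\n\n"
        f"#Relationship\n{relationship}\n\n"
        f"#Story\n{story}"
    )


system_prompt = build_system_prompt(
    name="唐三",
    gender="Male",
    age="19",
    personality="孤傲,冷漠,沉着,冷静",
    relationship="同门师兄弟",
    story="《斗罗大陆》男主角。",  # shortened story, for illustration only
)

# The resulting string goes into the system turn of the chat messages.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "我来了,三哥"},
]
print(system_prompt)
```

The `messages` list built this way can be passed to `tokenizer.apply_chat_template` exactly as in the example above.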

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->


## Prompt 
### Huggingface inference API

<!-- This section is used for prompt online -->
For the best results, assemble your prompt with the template below:

```python
def format_message(role, message):
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{message}<|eot_id|>"
# instruction:  the character card
# history:      the conversation history
# query:        the current user message
item = {
        'instruction':'#Role\nName: 唐三\nGender: Male\nLanguage: Chinese\nAge: 19\nPersonality: 孤傲,冷漠,沉着,冷静\n\n#Relationship\n同门师兄弟\n\n#Story\n《斗罗大陆》男主角。前世为唐门外门弟子,因偷学内门绝学《玄天宝录》,为唐门所不容,跳崖明志,却来到了另一个世界——斗罗大陆',
        'history':[{'role':'user','content':'我来了,三哥'},{'role':'assistant','content':'(唐三双手叉腰,显示一丝不悦)师弟,你怎么才来,我都等你很久了'}],
        'query':{'role':'user','content':'那我们快出发去见师父吧'},
        }
system = '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n'+item.get('instruction', '') + '<|eot_id|>' 
histories = item.get('history', [])
if histories == [] or histories == [{}]:
    history = ''
else:
    history = ''.join(format_message(hist['role'], hist['content']) for hist in histories)
query = item.get('query', [])
query  = format_message(query['role'], query['content']) + '<|start_header_id|>assistant<|end_header_id|>'
final_query = system + history + query
print(final_query)
```

The final prompt should look like the following. Because this is a role-play model, the prompt should include the character's role information and story:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
#Role
Name: 唐三
Gender: Male
Language: Chinese
Age: 19
Personality: 孤傲,冷漠,沉着,冷静

#Relationship
同门师兄弟

#Story
《斗罗大陆》男主角。前世为唐门外门弟子,因偷学内门绝学《玄天宝录》,为唐门所不容,跳崖明志,却来到了另一个世界——斗罗大陆<|eot_id|><|start_header_id|>user<|end_header_id|>

我来了,三哥<|eot_id|><|start_header_id|>assistant<|end_header_id|>

(唐三双手叉腰,显示一丝不悦)师弟,你怎么才来,我都等你很久了<|eot_id|><|start_header_id|>user<|end_header_id|>

那我们快出发去见师父吧<|eot_id|><|start_header_id|>assistant<|end_header_id|>


```



### Official API
<!-- This section is used for official api -->
Coming soon


## Improvement
Our model shows an improvement of around 150% over ChatGPT in average conversation length. It extends the average dialogue from 80 turns to as many as 200 turns or more, significantly improving user engagement.
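As a quick check of the arithmetic behind that figure, the relative gain from 80 to 200 turns can be computed as below. The helper name `relative_improvement` is illustrative, and the two inputs are the turn counts quoted above.

```python
def relative_improvement(baseline, improved):
    """Return the relative gain of `improved` over `baseline` as a fraction."""
    return (improved - baseline) / baseline


# 80 -> 200 average turns, as quoted in the text above.
gain = relative_improvement(80, 200)
print(f"{gain:.0%}")  # prints "150%"
```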
  
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65fc22b1ec1f99ad1f3501f4/ttA_3Qtlu1WKFloZ5-y6j.png)


Our chatbot model has several distinctive strengths:

**Rich Expressiveness**: The model conveys semantic and emotional nuance through detailed descriptions within the dialogue, such as manner of speech, non-verbal cues, and the character's inner thoughts, compensating for the limits of plain text.

**Proactivity**: The model not only responds flexibly to users' diverse inputs but also proactively introduces new topics, which greatly improves user retention and conversation depth. It is also less prone to repetition.

**Strong Linguistic Foundations**: Trained on large, high-quality Chinese corpora, the model can hold diverse conversations across different scenarios and character attributes, in genres such as historical, campus, workplace, and fantasy, giving users a richer chat experience.

Here are some example roleplay dialogues from our model:
  
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65fc22b1ec1f99ad1f3501f4/mO36170XQrUz5lfJU1b81.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65fc22b1ec1f99ad1f3501f4/DeJM3tos5lorHnEZiKNyp.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65fc22b1ec1f99ad1f3501f4/2yqroK9Bvb7KnlagRG87o.png)