---
language:
- en
pipeline_tag: conversational
tags:
- pytorch
license: cc-by-nc-4.0
---

[blenderbot-1B-distill](https://huggingface.co/facebook/blenderbot-1B-distill) fine-tuned on the [ESConv dataset](https://github.com/thu-coai/Emotional-Support-Conversation) and the [**AugESC dataset**](https://github.com/thu-coai/AugESC).

See the [original paper](https://arxiv.org/abs/2202.13047) for details. 

Usage example:

```python
import torch
from transformers.models.blenderbot import BlenderbotTokenizer, BlenderbotForConditionalGeneration

def _norm(x):
    # Collapse runs of whitespace in a decoded generation
    return ' '.join(x.strip().split())

tokenizer = BlenderbotTokenizer.from_pretrained('thu-coai/blenderbot-1B-augesc')
model = BlenderbotForConditionalGeneration.from_pretrained('thu-coai/blenderbot-1B-augesc')
model.eval()

# Dialogue history: help-seeker and supporter utterances alternate
utterances = [
  "I am having a lot of anxiety about quitting my current job. It is too stressful but pays well",
  "What makes your job stressful for you?",
  "I have to deal with many people in hard financial situations and it is upsetting",
  "Do you help your clients to make it to a better financial situation?",
  "I do, but often they are not going to get back to what they want. Many people are going to lose their home when safeguards are lifted",
]
# Space-prefix each utterance so turns are separated by two spaces, then append the EOS token
input_sequence = ' '.join([' ' + e for e in utterances]) + tokenizer.eos_token
# Keep only the last 128 context tokens
input_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(input_sequence))[-128:]
input_ids = torch.LongTensor([input_ids])

# Sample 5 candidate responses with nucleus sampling
model_output = model.generate(input_ids, num_beams=1, do_sample=True, top_p=0.9, num_return_sequences=5)
generation = tokenizer.batch_decode(model_output, skip_special_tokens=True)
generation = [_norm(e) for e in generation]
print(generation)

utterances.append(generation[0]) # append a response to continue the dialogue
```
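The last line hints at multi-turn use. As a minimal sketch of how the snippet above can be wrapped into an interactive chat loop (the `respond` helper and its `max_context_tokens` parameter are hypothetical names introduced here for illustration, not part of this repository):

```python
# Minimal multi-turn loop sketch; reuses `tokenizer`, `model`, and `_norm`
# from the example above. `respond` and `max_context_tokens` are hypothetical
# names, not part of this model's API.
import torch

@torch.no_grad()
def respond(history, max_context_tokens=128):
    # Same convention as above: space-prefix each utterance (two-space
    # separators between turns), then append the EOS token
    input_sequence = ' '.join(' ' + e for e in history) + tokenizer.eos_token
    # Keep only the most recent context tokens (128 matches the example above)
    ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(input_sequence))[-max_context_tokens:]
    output = model.generate(torch.LongTensor([ids]), do_sample=True, top_p=0.9)
    return _norm(tokenizer.batch_decode(output, skip_special_tokens=True)[0])

history = ["I am having a lot of anxiety about quitting my current job."]
while True:
    reply = respond(history)
    print('Bot:', reply)
    history.append(reply)
    user = input('You: ').strip()
    if not user:  # empty input ends the chat
        break
    history.append(user)
```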


Please cite our papers if you use this model:

```bib
@inproceedings{liu-etal-2021-towards,
  title={Towards Emotional Support Dialog Systems},
  author={Liu, Siyang and
    Zheng, Chujie and
    Demasi, Orianna and
    Sabour, Sahand and
    Li, Yu and
    Yu, Zhou and
    Jiang, Yong and
    Huang, Minlie},
  booktitle={ACL},
  year={2021}
}

@inproceedings{zheng-etal-2023-augesc,
  title={AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation},
  author={Zheng, Chujie and
    Sabour, Sahand and
    Wen, Jiaxin and
    Zhang, Zheng and
    Huang, Minlie},
  booktitle={Findings of ACL},
  year={2023}
}
```