Create README.md
Commit 6435dcf · 1 Parent(s): 2fe7c76 · committed by pangpang666

README.md ADDED
@@ -0,0 +1,64 @@
This repository provides a Chinese GPT-2 language model trained with SimCTG on the LCCC benchmark [(Wang et al., 2020)](https://arxiv.org/pdf/2008.03946v2.pdf), based on our paper [_A Contrastive Framework for Neural Text Generation_](https://arxiv.org/abs/2202.06417).

We provide a detailed tutorial on how to apply SimCTG and Contrastive Search in our [project repo](https://github.com/yxuansu/SimCTG#4-huggingface-style-tutorials-back-to-top). Below is a brief tutorial on using our approach for text generation.

## 1. Installation of SimCTG:
```bash
pip install simctg --upgrade
```

## 2. Initialize SimCTG Model:
```python
import torch
from simctg.simctggpt import SimCTGGPT

# load the SimCTG language model and switch it to inference mode
model_name = 'cambridgeltl/simctg_lccc_dialogue'
model = SimCTGGPT(model_name)
model.eval()
tokenizer = model.tokenizer

# '[SEP]' separates dialogue turns and marks the end of a generated reply
eos_token = '[SEP]'
eos_token_id = tokenizer.convert_tokens_to_ids([eos_token])[0]
```

## 3. Prepare the Text Prefix:
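```python
# dialogue history, one string per turn
context_list = ['刺猬很可爱!以前别人送了只没养,味儿太大!', '是很可爱但是非常臭', '是啊,没办法养', '那个怎么养哦不会扎手吗']
# join the turns with '[SEP]' and end the prefix with one more '[SEP]';
# str.join adds no leading or trailing separator, so no strip() is needed
# (str.strip would treat '[SEP]' as a set of characters, not as a token)
prefix_text = eos_token.join(context_list) + eos_token
print('Prefix is: {}'.format(prefix_text))
# convert the prefix into a (1, sequence_length) tensor of token ids
tokens = tokenizer.tokenize(prefix_text)
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.LongTensor(input_ids).view(1, -1)
```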
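
The prefix format used above (turns joined by `[SEP]`, with one trailing `[SEP]`) can be wrapped in a pair of small helpers. This is a minimal sketch in plain Python; `build_prefix` and `split_turns` are hypothetical names introduced here for illustration and are not part of the `simctg` package.

```python
EOS_TOKEN = '[SEP]'  # same separator string as eos_token in the tutorial


def build_prefix(turns, eos_token=EOS_TOKEN):
    """Join dialogue turns with the separator and end with one separator."""
    cleaned = [turn.strip() for turn in turns if turn.strip()]
    if not cleaned:
        raise ValueError('at least one non-empty turn is required')
    return eos_token.join(cleaned) + eos_token


def split_turns(text, eos_token=EOS_TOKEN):
    """Split a separator-joined string back into its non-empty turns."""
    return [part.strip() for part in text.split(eos_token) if part.strip()]


turns = ['是很可爱但是非常臭', '是啊,没办法养']
prefix = build_prefix(turns)
print(prefix)
print(split_turns(prefix))
```

Because the decoded output in the next section repeats the prefix before the reply, the last element returned by `split_turns` on that output would be the generated reply.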

## 4. Generate Text with Contrastive Search:
```python
beam_width, alpha, decoding_len = 5, 0.6, 64
output = model.fast_contrastive_search(
    input_ids=input_ids,
    beam_width=beam_width,
    alpha=alpha,
    decoding_len=decoding_len,
    end_of_sequence_token_id=eos_token_id,
    early_stop=True,
)

print("Output:\n" + 100 * '-')
print(''.join(tokenizer.decode(output)))
'''
   Prefix is: 刺猬很可爱!以前别人送了只没养,味儿太大![SEP]是很可爱但是非常臭[SEP]是啊,没办法养[SEP]那个怎么养哦不会扎手吗[SEP]
   Output:
   ----------------------------------------------------------------------------------------------------
   刺猬很可爱!以前别人送了只没养,味儿太大![SEP]是很可爱但是非常臭[SEP]是啊,没办法养[SEP]那个怎么养哦不会扎手吗[SEP]我觉得还好,就是有点臭
'''
```
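
To give some intuition for `beam_width` and `alpha`, the selection rule of contrastive search described in the paper can be sketched in plain Python: among the top-k candidate tokens, pick the one that maximizes `(1 - alpha) * probability - alpha * (maximum cosine similarity to the hidden states of the context)`. The sketch assumes that `beam_width` plays the role of k; the function names `cosine` and `contrastive_pick` and the toy numbers are made up for illustration and do not reflect the internals of `fast_contrastive_search`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_pick(candidates, context_states, alpha):
    """Pick one token from the top-k candidates.

    candidates: list of (token, probability, hidden_state) tuples.
    context_states: hidden states of the tokens already in the context.
    """
    best_token, best_score = None, -math.inf
    for token, prob, state in candidates:
        # degeneration penalty: similarity to the closest context token
        penalty = max(cosine(state, h) for h in context_states)
        score = (1 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Toy numbers, invented for illustration only.
context_states = [[1.0, 0.0]]
candidates = [
    ('repeat', 0.6, [1.0, 0.0]),  # most likely, but identical to the context
    ('fresh', 0.4, [0.0, 1.0]),   # less likely, orthogonal to the context
]
print(contrastive_pick(candidates, context_states, alpha=0.0))
print(contrastive_pick(candidates, context_states, alpha=0.6))
```

With `alpha=0.0` the rule reduces to picking the most probable candidate; a larger `alpha` penalizes candidates whose representation is close to tokens already in the context.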

For more details on our work, please see our main [project repo](https://github.com/yxuansu/SimCTG).

## 5. Citation:
If you find our paper and resources useful, please leave a star and cite our paper. Thanks!

```bibtex
@article{su2022contrastive,
  title={A Contrastive Framework for Neural Text Generation},
  author={Su, Yixuan and Lan, Tian and Wang, Yan and Yogatama, Dani and Kong, Lingpeng and Collier, Nigel},
  journal={arXiv preprint arXiv:2202.06417},
  year={2022}
}
```