pangpang666 committed 6435dcf (1 parent: 2fe7c76)

Create README.md

Files changed (1): README.md (added, +64 -0)
This model is a Chinese GPT-2 language model for open-domain dialogue, trained with SimCTG on the LCCC benchmark [(Wang et al. 2020)](https://arxiv.org/pdf/2008.03946v2.pdf), as proposed in our paper [_A Contrastive Framework for Neural Text Generation_](https://arxiv.org/abs/2202.06417).

A detailed tutorial on applying SimCTG and contrastive search is available in our [project repo](https://github.com/yxuansu/SimCTG#4-huggingface-style-tutorials-back-to-top). Below is a brief tutorial on using our approach to generate text.

## 1. Installation of SimCTG:
```bash
pip install simctg --upgrade
```

## 2. Initialize SimCTG Model:
```python
import torch
from simctg.simctggpt import SimCTGGPT

# load the SimCTG language model and its tokenizer by name
model_name = 'cambridgeltl/simctg_lccc_dialogue'
model = SimCTGGPT(model_name)
model.eval()  # inference mode (disables dropout)
tokenizer = model.tokenizer

# [SEP] separates dialogue turns and also serves as the end-of-sequence token
eos_token = '[SEP]'
eos_token_id = tokenizer.convert_tokens_to_ids([eos_token])[0]
```

## 3. Prepare the Text Prefix:
```python
# dialogue history about pet hedgehogs (cute, but smelly and prickly), oldest turn first
context_list = ['刺猬很可爱!以前别人送了只没养,味儿太大!', '是很可爱但是非常臭', '是啊,没办法养', '那个怎么养哦不会扎手吗']
# join the turns with [SEP] and append a final [SEP] so the model starts a new turn
prefix_text = eos_token.join(context_list) + eos_token
print('Prefix is: {}'.format(prefix_text))
tokens = tokenizer.tokenize(prefix_text)
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.LongTensor(input_ids).view(1, -1)  # shape: (1, prefix_length)
```

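The prefix format is simple: every turn, including the last one, ends with `[SEP]`, so generation begins a new turn. As a minimal plain-Python sketch (no tokenizer needed), the hypothetical helpers `build_dialogue_prefix` and `extract_reply` below, which are not part of the `simctg` package, build that prefix and recover the generated reply from the decoded output. They assume the decoded text (with any spaces removed) starts with the exact prefix string, as in the example output of step 4.

```python
SEP = '[SEP]'  # turn separator / end-of-sequence token used by this model


def build_dialogue_prefix(utterances, sep=SEP):
    """Hypothetical helper: end every turn, including the last, with `sep`.

    Equivalent to sep.join(utterances) + sep for a non-empty list.
    """
    if not utterances:
        raise ValueError('need at least one utterance')
    return ''.join(u + sep for u in utterances)


def extract_reply(decoded_text, prefix, sep=SEP):
    """Hypothetical helper: return the new turn generated after `prefix`.

    Assumes decoded_text == prefix + reply (+ optional sep and further text).
    """
    if not decoded_text.startswith(prefix):
        raise ValueError('decoded text does not start with the prefix')
    continuation = decoded_text[len(prefix):]
    # the reply ends at the next separator, if the model generated one
    return continuation.split(sep, 1)[0]


context = ['刺猬很可爱!以前别人送了只没养,味儿太大!', '是很可爱但是非常臭', '是啊,没办法养', '那个怎么养哦不会扎手吗']
prefix = build_dialogue_prefix(context)
print(extract_reply(prefix + '我觉得还好,就是有点臭', prefix))  # prints 我觉得还好,就是有点臭
```

Passing the prefix explicitly, rather than splitting on the last `[SEP]`, avoids returning the final context turn by mistake when the model produces an empty reply.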
## 4. Generate Text with Contrastive Search:
```python
# beam_width: number of top-k candidate tokens considered at each step
# alpha: weight of the degeneration penalty; decoding_len: maximum number of generated tokens
beam_width, alpha, decoding_len = 5, 0.6, 64
output = model.fast_contrastive_search(input_ids=input_ids, beam_width=beam_width, alpha=alpha,
                                       decoding_len=decoding_len, end_of_sequence_token_id=eos_token_id,
                                       early_stop=True)  # with early_stop, decoding ends at the [SEP] token

print("Output:\n" + 100 * '-')
# decode() may separate Chinese tokens with spaces; remove them before printing
print(''.join(tokenizer.decode(output).split()))
'''
Prefix is: 刺猬很可爱!以前别人送了只没养,味儿太大![SEP]是很可爱但是非常臭[SEP]是啊,没办法养[SEP]那个怎么养哦不会扎手吗[SEP]
Output:
----------------------------------------------------------------------------------------------------
刺猬很可爱!以前别人送了只没养,味儿太大![SEP]是很可爱但是非常臭[SEP]是啊,没办法养[SEP]那个怎么养哦不会扎手吗[SEP]我觉得还好,就是有点臭
'''
# generated reply (English gloss): "I think it's okay, it's just a bit smelly."
```

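Contrastive search, as defined in our paper, chooses each next token from the top-k candidates $V^{(k)}$ (k is `beam_width` above) by trading model confidence against a degeneration penalty, the maximum cosine similarity $s$ between the candidate's hidden state and those of all previous tokens:

$$x_t = \arg\max_{v \in V^{(k)}} \Big\{ (1-\alpha) \times p_\theta(v \mid x_{<t}) - \alpha \times \max\{ s(h_v, h_{x_j}) : 1 \le j \le t-1 \} \Big\}$$

The plain-Python sketch below applies this rule for a single step. It only illustrates the scoring rule; it is not the `fast_contrastive_search` implementation in the `simctg` package, which operates on batched model tensors. The function name `contrastive_step` and the toy probabilities and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def contrastive_step(candidates, context_states, alpha):
    """Hypothetical sketch: pick the next token among the top-k candidates.

    candidates: list of (token_id, probability, hidden_state) for the top-k tokens
    context_states: hidden states of all previous tokens (prefix + generated)
    alpha: degeneration-penalty weight; alpha = 0 reduces to greedy search
    Returns (best_token_id, best_score).
    """
    best_token, best_score = None, -math.inf
    for token_id, prob, state in candidates:
        # degeneration penalty: highest similarity to any previous token
        penalty = max((cosine(state, h) for h in context_states), default=0.0)
        score = (1 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token_id, score
    return best_token, best_score


# toy example: token 10 is more likely but repeats an earlier hidden state
context_states = [[1.0, 0.0], [0.0, 1.0]]
candidates = [(10, 0.6, [1.0, 0.0]), (11, 0.4, [0.6, -0.8])]
print(contrastive_step(candidates, context_states, alpha=0.0)[0])  # prints 10
print(contrastive_step(candidates, context_states, alpha=0.6)[0])  # prints 11
```

With `alpha=0.6`, token 10 scores 0.4 × 0.6 − 0.6 × 1.0 = −0.36 while token 11 scores 0.4 × 0.4 − 0.6 × 0.6 = −0.20, so the penalty steers generation away from the token whose representation duplicates the context.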
For more details about our work, please refer to our main [project repo](https://github.com/yxuansu/SimCTG).

## 5. Citation:
If you find our paper and resources useful, please consider starring our repo and citing our paper. Thanks!

```bibtex
@article{su2022contrastive,
  title={A Contrastive Framework for Neural Text Generation},
  author={Su, Yixuan and Lan, Tian and Wang, Yan and Yogatama, Dani and Kong, Lingpeng and Collier, Nigel},
  journal={arXiv preprint arXiv:2202.06417},
  year={2022}
}
```