wilsontam committed on
Commit
2a47012
1 Parent(s): 663e0d2

Create README.md

  1. README.md +18 -0
README.md ADDED
@@ -0,0 +1,18 @@
+ ---
+ language: "zh"
+ tags:
+ - bert-base-chinese
+ - Chinese dialogue
+ widget:
+ - text: "[CLS] 福州宠物医院哪家好呢 [eos] 我的喵在善化坊那边的精灵仁爱医院包括绝育驱虫咳嗽之类的东西 [SEP]"
+ ---
+ This model was post-trained on the following multi-turn Chinese dialogue corpora (only the training-set portions defined in the literature):
+ * Douban
+ * E-commerce
+ * Restore-200k
+
+ The training objectives are masked language modeling (MLM) and next sentence prediction (NSP). NSP uses three category labels: 0 (random response from the corpora), 1 (random response within a dialogue context), and 2 (correct next response).
+
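The three NSP labels can be illustrated with a minimal sketch. The function name `make_nsp_examples` and the sampling scheme are assumptions made for illustration, not the author's training code. In particular, label 1 is read here as "a different turn drawn from the same dialogue", which is only one possible interpretation of "random response within a dialogue context".

```python
import random


def make_nsp_examples(dialogues, dialogue_idx, turn_idx, rng):
    """Build one (context, response, label) triple per NSP label.

    Hypothetical helper: `dialogues` is a list of dialogues, each a list
    of turn strings. Requires turn_idx >= 1 and at least two dialogues.
    """
    dialogue = dialogues[dialogue_idx]
    context = dialogue[:turn_idx]   # turns preceding the target response
    correct = dialogue[turn_idx]    # the true next response

    # Label 1 candidates (assumed reading): other turns of the same dialogue.
    same_dialogue = [t for i, t in enumerate(dialogue) if i != turn_idx]
    # Label 0 candidates: turns taken from any other dialogue in the corpus.
    other_dialogues = [
        t for j, d in enumerate(dialogues) if j != dialogue_idx for t in d
    ]

    return [
        (context, rng.choice(other_dialogues), 0),
        (context, rng.choice(same_dialogue), 1),
        (context, correct, 2),
    ]


toy_corpus = [["a1", "a2", "a3"], ["b1", "b2"]]
examples = make_nsp_examples(toy_corpus, 0, 1, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible; a real pipeline would also need to decide how to handle sampled negatives that coincide with the correct response.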
+ To encode a multi-turn dialogue with this model, use the format "[CLS] turn t-2 [eos] turn t-1 [SEP] turn t [SEP]". Tokens up to and including the first [SEP] token are treated as segment 0, and all tokens after it are treated as segment 1, as in BERT's NSP training. The newly introduced [eos] token separates consecutive turns.
+
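The input format can be sketched in plain Python as follows. The helper `build_dialogue_input` is hypothetical, and the default character-level `tokenize` is only a stand-in for the model's real tokenizer (an assumption; in practice the tokens would come from the tokenizer shipped with the model, with `[eos]` in its vocabulary).

```python
CLS, SEP, EOS = "[CLS]", "[SEP]", "[eos]"


def build_dialogue_input(context_turns, response=None, tokenize=list):
    """Return (tokens, segment_ids) for a multi-turn dialogue.

    Hypothetical helper. `context_turns` are the earlier turns in order
    (..., t-2, t-1); `response` is turn t, or None to encode context only.
    """
    if not context_turns:
        raise ValueError("at least one context turn is required")

    tokens = [CLS]
    for i, turn in enumerate(context_turns):
        if i > 0:
            tokens.append(EOS)      # [eos] separates consecutive turns
        tokens.extend(tokenize(turn))
    tokens.append(SEP)

    # Everything up to and including the first [SEP] is segment 0.
    segment_ids = [0] * len(tokens)

    if response is not None:
        response_tokens = list(tokenize(response)) + [SEP]
        tokens.extend(response_tokens)
        # Everything after the first [SEP] is segment 1.
        segment_ids.extend([1] * len(response_tokens))

    return tokens, segment_ids


tokens, segment_ids = build_dialogue_input(["你好", "在吗"], "在的")
```

The segment ids correspond to what Hugging Face tokenizers call `token_type_ids`. Omitting `response` yields the context-only form used in the widget example, "[CLS] ... [eos] ... [SEP]".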
+ ---