KoichiYasuoka committed on
Commit
315f556
1 Parent(s): 0a36c24

initial release

README.md CHANGED
@@ -1,3 +1,39 @@
- ---
- license: cc-by-sa-4.0
- ---
+ ---
+ language:
+ - "en"
+ tags:
+ - "english"
+ - "token-classification"
+ - "pos"
+ - "dependency-parsing"
+ datasets:
+ - "universal_dependencies"
+ license: "cc-by-sa-4.0"
+ pipeline_tag: "token-classification"
+ ---
+
+ # roberta-base-english-upos
+
+ ## Model Description
+
+ This is a RoBERTa model pre-trained with [UD_English-EWT](https://github.com/UniversalDependencies/UD_English-EWT) for POS-tagging and dependency-parsing, derived from [roberta-base](https://huggingface.co/roberta-base). Every word is tagged by [UPOS](https://universaldependencies.org/u/pos/) (Universal Part-Of-Speech).
+
+ ## How to Use
+
+ ```py
+ from transformers import AutoTokenizer,AutoModelForTokenClassification
+ tokenizer=AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-base-english-upos")
+ model=AutoModelForTokenClassification.from_pretrained("KoichiYasuoka/roberta-base-english-upos")
+ ```
+
+ or
+
+ ```py
+ import esupar
+ nlp=esupar.load("KoichiYasuoka/roberta-base-english-upos")
+ ```
+
+ ## See Also
+
+ [esupar](https://github.com/KoichiYasuoka/esupar): Tokenizer POS-tagger and Dependency-parser with BERT/RoBERTa models
+
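The README shows how to load the model but not how to read its labels. The label set in config.json mixes bare tags (`NOUN`), prefixed tags (`B-NOUN`, `I-NOUN`), and joined tags (`AUX+PART`). The sketch below assumes that `B-` and `I-` mark the first and the continuation sub-word pieces of one word, and that a bare tag marks a word that fits in a single piece; this reading is an assumption, not something the commit states. The helper `merge_pieces` and the sample pieces are hypothetical and are not real tokenizer output.

```python
from typing import List, Tuple


def merge_pieces(pieces: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group sub-word pieces into (word, tag) pairs using B-/I- prefixes.

    Assumption: "I-" continues the word opened by the previous piece;
    bare labels and "B-" labels both start a new word.
    """
    words: List[Tuple[str, str]] = []
    for piece, label in zip(pieces, labels):
        if label.startswith("I-") and words:
            # Continuation piece: glue it onto the most recent word.
            text, tag = words[-1]
            words[-1] = (text + piece, tag)
        else:
            # Strip the "B-" prefix if present; bare labels are kept as-is.
            tag = label[2:] if label.startswith("B-") else label
            words.append((piece, tag))
    return words


# Hypothetical pieces and labels, chosen only to illustrate the grouping.
pieces = ["The", "un", "believ", "able", "dog", "sleeps", "."]
labels = ["DET", "B-ADJ", "I-ADJ", "I-ADJ", "NOUN", "VERB", "PUNCT"]
result = merge_pieces(pieces, labels)
print(result)
```

In real use the labels would come from the model's per-token predictions mapped through `id2label`; joined tags such as `AUX+PART` pass through this helper unchanged.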
config.json ADDED
@@ -0,0 +1,1500 @@
1
+ {
2
+ "architectures": [
3
+ "RobertaForTokenClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "bos_token_id": 0,
7
+ "classifier_dropout": null,
8
+ "eos_token_id": 2,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 768,
12
+ "id2label": {
13
+ "0": "ADJ",
14
+ "1": "ADP",
15
+ "2": "ADP+PRON",
16
+ "3": "ADV",
17
+ "4": "ADV+AUX",
18
+ "5": "ADV+PART",
19
+ "6": "AUX",
20
+ "7": "AUX+PART",
21
+ "8": "B-ADJ",
22
+ "9": "B-ADJ+ADJ",
23
+ "10": "B-ADJ+NOUN",
24
+ "11": "B-ADJ+NOUN+NOUN",
25
+ "12": "B-ADJ+PART",
26
+ "13": "B-ADJ+PROPN",
27
+ "14": "B-ADJ+PUNCT",
28
+ "15": "B-ADP",
29
+ "16": "B-ADP+NOUN",
30
+ "17": "B-ADV",
31
+ "18": "B-ADV+AUX",
32
+ "19": "B-ADV+PUNCT",
33
+ "20": "B-AUX",
34
+ "21": "B-AUX+PART",
35
+ "22": "B-AUX+PART+VERB",
36
+ "23": "B-AUX+VERB",
37
+ "24": "B-CCONJ",
38
+ "25": "B-DET",
39
+ "26": "B-DET+AUX",
40
+ "27": "B-DET+NOUN",
41
+ "28": "B-INTJ",
42
+ "29": "B-INTJ+PUNCT",
43
+ "30": "B-NOUN",
44
+ "31": "B-NOUN+AUX",
45
+ "32": "B-NOUN+NOUN",
46
+ "33": "B-NOUN+NOUN+VERB",
47
+ "34": "B-NOUN+PART",
48
+ "35": "B-NOUN+PUNCT",
49
+ "36": "B-NOUN+VERB",
50
+ "37": "B-NUM",
51
+ "38": "B-PART",
52
+ "39": "B-PRON",
53
+ "40": "B-PRON+ADJ",
54
+ "41": "B-PRON+ADV",
55
+ "42": "B-PRON+AUX",
56
+ "43": "B-PRON+PART",
57
+ "44": "B-PRON+VERB",
58
+ "45": "B-PROPN",
59
+ "46": "B-PROPN+PART",
60
+ "47": "B-PROPN+PROPN",
61
+ "48": "B-PROPN+PUNCT",
62
+ "49": "B-PUNCT",
63
+ "50": "B-PUNCT+PUNCT",
64
+ "51": "B-PUNCT+PUNCT+PUNCT",
65
+ "52": "B-SCONJ",
66
+ "53": "B-SYM",
67
+ "54": "B-VERB",
68
+ "55": "B-VERB+ADJ",
69
+ "56": "B-VERB+ADJ+CCONJ",
70
+ "57": "B-VERB+ADP",
71
+ "58": "B-VERB+ADV",
72
+ "59": "B-VERB+DET",
73
+ "60": "B-VERB+NOUN",
74
+ "61": "B-VERB+NOUN+NOUN",
75
+ "62": "B-VERB+PART",
76
+ "63": "B-VERB+PRON",
77
+ "64": "B-VERB+SCONJ",
78
+ "65": "B-X",
79
+ "66": "B-X+PUNCT",
80
+ "67": "B-X+X",
81
+ "68": "B-X+X+PRON",
82
+ "69": "CCONJ",
83
+ "70": "DET",
84
+ "71": "DET+NUM",
85
+ "72": "I-ADJ",
86
+ "73": "I-ADJ+ADJ",
87
+ "74": "I-ADJ+NOUN",
88
+ "75": "I-ADJ+NOUN+NOUN",
89
+ "76": "I-ADJ+PART",
90
+ "77": "I-ADJ+PROPN",
91
+ "78": "I-ADJ+PUNCT",
92
+ "79": "I-ADP",
93
+ "80": "I-ADP+NOUN",
94
+ "81": "I-ADV",
95
+ "82": "I-ADV+AUX",
96
+ "83": "I-ADV+PUNCT",
97
+ "84": "I-AUX",
98
+ "85": "I-AUX+PART",
99
+ "86": "I-AUX+PART+VERB",
100
+ "87": "I-AUX+VERB",
101
+ "88": "I-CCONJ",
102
+ "89": "I-DET",
103
+ "90": "I-DET+AUX",
104
+ "91": "I-DET+NOUN",
105
+ "92": "I-INTJ",
106
+ "93": "I-INTJ+PUNCT",
107
+ "94": "I-NOUN",
108
+ "95": "I-NOUN+AUX",
109
+ "96": "I-NOUN+NOUN",
110
+ "97": "I-NOUN+NOUN+VERB",
111
+ "98": "I-NOUN+PART",
112
+ "99": "I-NOUN+PUNCT",
113
+ "100": "I-NOUN+VERB",
114
+ "101": "I-NUM",
115
+ "102": "I-PART",
116
+ "103": "I-PRON",
117
+ "104": "I-PRON+ADJ",
118
+ "105": "I-PRON+ADV",
119
+ "106": "I-PRON+AUX",
120
+ "107": "I-PRON+PART",
121
+ "108": "I-PRON+VERB",
122
+ "109": "I-PROPN",
123
+ "110": "I-PROPN+PART",
124
+ "111": "I-PROPN+PROPN",
125
+ "112": "I-PROPN+PUNCT",
126
+ "113": "I-PUNCT",
127
+ "114": "I-PUNCT+PUNCT",
128
+ "115": "I-PUNCT+PUNCT+PUNCT",
129
+ "116": "I-SCONJ",
130
+ "117": "I-SYM",
131
+ "118": "I-VERB",
132
+ "119": "I-VERB+ADJ",
133
+ "120": "I-VERB+ADJ+CCONJ",
134
+ "121": "I-VERB+ADP",
135
+ "122": "I-VERB+ADV",
136
+ "123": "I-VERB+DET",
137
+ "124": "I-VERB+NOUN",
138
+ "125": "I-VERB+NOUN+NOUN",
139
+ "126": "I-VERB+PART",
140
+ "127": "I-VERB+PRON",
141
+ "128": "I-VERB+SCONJ",
142
+ "129": "I-X",
143
+ "130": "I-X+PUNCT",
144
+ "131": "I-X+X",
145
+ "132": "I-X+X+PRON",
146
+ "133": "INTJ",
147
+ "134": "NOUN",
148
+ "135": "NOUN+AUX",
149
+ "136": "NOUN+PART",
150
+ "137": "NUM",
151
+ "138": "PART",
152
+ "139": "PRON",
153
+ "140": "PRON+AUX",
154
+ "141": "PRON+VERB",
155
+ "142": "PROPN",
156
+ "143": "PROPN+PART",
157
+ "144": "PUNCT",
158
+ "145": "PUNCT+PUNCT",
159
+ "146": "PUNCT+PUNCT+PUNCT",
160
+ "147": "PUNCT+SYM",
161
+ "148": "SCONJ",
162
+ "149": "SYM",
163
+ "150": "SYM+PUNCT",
164
+ "151": "SYM+SYM",
165
+ "152": "VERB",
166
+ "153": "VERB+ADP",
167
+ "154": "VERB+PART",
168
+ "155": "VERB+PRON",
169
+ "156": "X"
170
+ },
171
+ "initializer_range": 0.02,
172
+ "intermediate_size": 3072,
173
+ "label2id": {
174
+ "ADJ": 0,
175
+ "ADP": 1,
176
+ "ADP+PRON": 2,
177
+ "ADV": 3,
178
+ "ADV+AUX": 4,
179
+ "ADV+PART": 5,
180
+ "AUX": 6,
181
+ "AUX+PART": 7,
182
+ "B-ADJ": 8,
183
+ "B-ADJ+ADJ": 9,
184
+ "B-ADJ+NOUN": 10,
185
+ "B-ADJ+NOUN+NOUN": 11,
186
+ "B-ADJ+PART": 12,
187
+ "B-ADJ+PROPN": 13,
188
+ "B-ADJ+PUNCT": 14,
189
+ "B-ADP": 15,
190
+ "B-ADP+NOUN": 16,
191
+ "B-ADV": 17,
192
+ "B-ADV+AUX": 18,
193
+ "B-ADV+PUNCT": 19,
194
+ "B-AUX": 20,
195
+ "B-AUX+PART": 21,
196
+ "B-AUX+PART+VERB": 22,
197
+ "B-AUX+VERB": 23,
198
+ "B-CCONJ": 24,
199
+ "B-DET": 25,
200
+ "B-DET+AUX": 26,
201
+ "B-DET+NOUN": 27,
202
+ "B-INTJ": 28,
203
+ "B-INTJ+PUNCT": 29,
204
+ "B-NOUN": 30,
205
+ "B-NOUN+AUX": 31,
206
+ "B-NOUN+NOUN": 32,
207
+ "B-NOUN+NOUN+VERB": 33,
208
+ "B-NOUN+PART": 34,
209
+ "B-NOUN+PUNCT": 35,
210
+ "B-NOUN+VERB": 36,
211
+ "B-NUM": 37,
212
+ "B-PART": 38,
213
+ "B-PRON": 39,
214
+ "B-PRON+ADJ": 40,
215
+ "B-PRON+ADV": 41,
216
+ "B-PRON+AUX": 42,
217
+ "B-PRON+PART": 43,
218
+ "B-PRON+VERB": 44,
219
+ "B-PROPN": 45,
220
+ "B-PROPN+PART": 46,
221
+ "B-PROPN+PROPN": 47,
222
+ "B-PROPN+PUNCT": 48,
223
+ "B-PUNCT": 49,
224
+ "B-PUNCT+PUNCT": 50,
225
+ "B-PUNCT+PUNCT+PUNCT": 51,
226
+ "B-SCONJ": 52,
227
+ "B-SYM": 53,
228
+ "B-VERB": 54,
229
+ "B-VERB+ADJ": 55,
230
+ "B-VERB+ADJ+CCONJ": 56,
231
+ "B-VERB+ADP": 57,
232
+ "B-VERB+ADV": 58,
233
+ "B-VERB+DET": 59,
234
+ "B-VERB+NOUN": 60,
235
+ "B-VERB+NOUN+NOUN": 61,
236
+ "B-VERB+PART": 62,
237
+ "B-VERB+PRON": 63,
238
+ "B-VERB+SCONJ": 64,
239
+ "B-X": 65,
240
+ "B-X+PUNCT": 66,
241
+ "B-X+X": 67,
242
+ "B-X+X+PRON": 68,
243
+ "CCONJ": 69,
244
+ "DET": 70,
245
+ "DET+NUM": 71,
246
+ "I-ADJ": 72,
247
+ "I-ADJ+ADJ": 73,
248
+ "I-ADJ+NOUN": 74,
249
+ "I-ADJ+NOUN+NOUN": 75,
250
+ "I-ADJ+PART": 76,
251
+ "I-ADJ+PROPN": 77,
252
+ "I-ADJ+PUNCT": 78,
253
+ "I-ADP": 79,
254
+ "I-ADP+NOUN": 80,
255
+ "I-ADV": 81,
256
+ "I-ADV+AUX": 82,
257
+ "I-ADV+PUNCT": 83,
258
+ "I-AUX": 84,
259
+ "I-AUX+PART": 85,
260
+ "I-AUX+PART+VERB": 86,
261
+ "I-AUX+VERB": 87,
262
+ "I-CCONJ": 88,
263
+ "I-DET": 89,
264
+ "I-DET+AUX": 90,
265
+ "I-DET+NOUN": 91,
266
+ "I-INTJ": 92,
267
+ "I-INTJ+PUNCT": 93,
268
+ "I-NOUN": 94,
269
+ "I-NOUN+AUX": 95,
270
+ "I-NOUN+NOUN": 96,
271
+ "I-NOUN+NOUN+VERB": 97,
272
+ "I-NOUN+PART": 98,
273
+ "I-NOUN+PUNCT": 99,
274
+ "I-NOUN+VERB": 100,
275
+ "I-NUM": 101,
276
+ "I-PART": 102,
277
+ "I-PRON": 103,
278
+ "I-PRON+ADJ": 104,
279
+ "I-PRON+ADV": 105,
280
+ "I-PRON+AUX": 106,
281
+ "I-PRON+PART": 107,
282
+ "I-PRON+VERB": 108,
283
+ "I-PROPN": 109,
284
+ "I-PROPN+PART": 110,
285
+ "I-PROPN+PROPN": 111,
286
+ "I-PROPN+PUNCT": 112,
287
+ "I-PUNCT": 113,
288
+ "I-PUNCT+PUNCT": 114,
289
+ "I-PUNCT+PUNCT+PUNCT": 115,
290
+ "I-SCONJ": 116,
291
+ "I-SYM": 117,
292
+ "I-VERB": 118,
293
+ "I-VERB+ADJ": 119,
294
+ "I-VERB+ADJ+CCONJ": 120,
295
+ "I-VERB+ADP": 121,
296
+ "I-VERB+ADV": 122,
297
+ "I-VERB+DET": 123,
298
+ "I-VERB+NOUN": 124,
299
+ "I-VERB+NOUN+NOUN": 125,
300
+ "I-VERB+PART": 126,
301
+ "I-VERB+PRON": 127,
302
+ "I-VERB+SCONJ": 128,
303
+ "I-X": 129,
304
+ "I-X+PUNCT": 130,
305
+ "I-X+X": 131,
306
+ "I-X+X+PRON": 132,
307
+ "INTJ": 133,
308
+ "NOUN": 134,
309
+ "NOUN+AUX": 135,
310
+ "NOUN+PART": 136,
311
+ "NUM": 137,
312
+ "PART": 138,
313
+ "PRON": 139,
314
+ "PRON+AUX": 140,
315
+ "PRON+VERB": 141,
316
+ "PROPN": 142,
317
+ "PROPN+PART": 143,
318
+ "PUNCT": 144,
319
+ "PUNCT+PUNCT": 145,
320
+ "PUNCT+PUNCT+PUNCT": 146,
321
+ "PUNCT+SYM": 147,
322
+ "SCONJ": 148,
323
+ "SYM": 149,
324
+ "SYM+PUNCT": 150,
325
+ "SYM+SYM": 151,
326
+ "VERB": 152,
327
+ "VERB+ADP": 153,
328
+ "VERB+PART": 154,
329
+ "VERB+PRON": 155,
330
+ "X": 156
331
+ },
332
+ "layer_norm_eps": 1e-05,
333
+ "max_position_embeddings": 514,
334
+ "model_type": "roberta",
335
+ "num_attention_heads": 12,
336
+ "num_hidden_layers": 12,
337
+ "pad_token_id": 1,
338
+ "position_embedding_type": "absolute",
339
+ "task_specific_params": {
340
+ "upos_multiword": {
341
+ "ADJ+ADJ": {
342
+ "bigenough": [
343
+ "big",
344
+ "enough"
345
+ ],
346
+ "interestingsocial": [
347
+ "interesting",
348
+ "social"
349
+ ]
350
+ },
351
+ "ADJ+NOUN": {
352
+ "bigsource": [
353
+ "big",
354
+ "source"
355
+ ],
356
+ "distractingelements": [
357
+ "distracting",
358
+ "elements"
359
+ ],
360
+ "gruelingsanctions": [
361
+ "grueling",
362
+ "sanctions"
363
+ ],
364
+ "longexposures": [
365
+ "long",
366
+ "exposures"
367
+ ],
368
+ "longhair": [
369
+ "long",
370
+ "hair"
371
+ ],
372
+ "ongoingsummaries": [
373
+ "ongoing",
374
+ "summaries"
375
+ ],
376
+ "pre-meetingsite": [
377
+ "pre-meeting",
378
+ "site"
379
+ ],
380
+ "stronghints": [
381
+ "strong",
382
+ "hints"
383
+ ]
384
+ },
385
+ "ADJ+PART": {
386
+ "elses": [
387
+ "else",
388
+ "s"
389
+ ]
390
+ },
391
+ "ADJ+PROPN": {
392
+ "Nationwidetints": [
393
+ "Nationwide",
394
+ "tints"
395
+ ]
396
+ },
397
+ "ADJ+PUNCT": {
398
+ "e.g.:": [
399
+ "e.g.",
400
+ ":"
401
+ ],
402
+ "i.e.,": [
403
+ "i.e.",
404
+ ","
405
+ ]
406
+ },
407
+ "ADP+NOUN": {
408
+ "Infact": [
409
+ "In",
410
+ "fact"
411
+ ],
412
+ "overtime": [
413
+ "over",
414
+ "time"
415
+ ]
416
+ },
417
+ "ADP+PRON": {
418
+ "init": [
419
+ "in",
420
+ "it"
421
+ ]
422
+ },
423
+ "ADV+AUX": {
424
+ "Heres": [
425
+ "Here",
426
+ "s"
427
+ ]
428
+ },
429
+ "ADV+PART": {
430
+ "into": [
431
+ "in",
432
+ "to"
433
+ ]
434
+ },
435
+ "ADV+PUNCT": {
436
+ "E.g.,": [
437
+ "E.g.",
438
+ ","
439
+ ],
440
+ "i.e.,": [
441
+ "i.e.",
442
+ ","
443
+ ],
444
+ "i.e.:": [
445
+ "i.e.",
446
+ ":"
447
+ ]
448
+ },
449
+ "AUX+PART": {
450
+ "Aren't": [
451
+ "Are",
452
+ "n't"
453
+ ],
454
+ "CANT": [
455
+ "CA",
456
+ "NT"
457
+ ],
458
+ "Can't": [
459
+ "Ca",
460
+ "n't"
461
+ ],
462
+ "Cannot": [
463
+ "Can",
464
+ "not"
465
+ ],
466
+ "DON'T": [
467
+ "DO",
468
+ "N'T"
469
+ ],
470
+ "DONT": [
471
+ "DO",
472
+ "NT"
473
+ ],
474
+ "Don't": [
475
+ "Do",
476
+ "n't"
477
+ ],
478
+ "Dont": [
479
+ "Do",
480
+ "nt"
481
+ ],
482
+ "Haven't": [
483
+ "Have",
484
+ "n't"
485
+ ],
486
+ "ain't": [
487
+ "ai",
488
+ "n't"
489
+ ],
490
+ "aint": [
491
+ "ai",
492
+ "nt"
493
+ ],
494
+ "aren't": [
495
+ "are",
496
+ "n't"
497
+ ],
498
+ "arent": [
499
+ "are",
500
+ "nt"
501
+ ],
502
+ "can't": [
503
+ "ca",
504
+ "n't"
505
+ ],
506
+ "cannot": [
507
+ "can",
508
+ "not"
509
+ ],
510
+ "cant": [
511
+ "ca",
512
+ "nt"
513
+ ],
514
+ "can\u2019t": [
515
+ "ca",
516
+ "n\u2019t"
517
+ ],
518
+ "didn't": [
519
+ "did",
520
+ "n't"
521
+ ],
522
+ "didn\u2019t": [
523
+ "did",
524
+ "n\u2019t"
525
+ ],
526
+ "doesn't": [
527
+ "does",
528
+ "n't"
529
+ ],
530
+ "don't": [
531
+ "do",
532
+ "n't"
533
+ ],
534
+ "dont": [
535
+ "do",
536
+ "nt"
537
+ ],
538
+ "don\u2019t": [
539
+ "do",
540
+ "n\u2019t"
541
+ ],
542
+ "haven't": [
543
+ "have",
544
+ "n't"
545
+ ],
546
+ "wasent": [
547
+ "wase",
548
+ "nt"
549
+ ],
550
+ "weren't": [
551
+ "were",
552
+ "n't"
553
+ ],
554
+ "weren\u2019t": [
555
+ "were",
556
+ "n\u2019t"
557
+ ],
558
+ "won't": [
559
+ "wo",
560
+ "n't"
561
+ ],
562
+ "wont": [
563
+ "wo",
564
+ "nt"
565
+ ],
566
+ "won\u2019t": [
567
+ "wo",
568
+ "n\u2019t"
569
+ ]
570
+ },
571
+ "AUX+PART+VERB": {
572
+ "dunno": [
573
+ "du",
574
+ "n",
575
+ "no"
576
+ ]
577
+ },
578
+ "AUX+VERB": {
579
+ "beingsaid": [
580
+ "being",
581
+ "said"
582
+ ],
583
+ "beingsent": [
584
+ "being",
585
+ "sent"
586
+ ],
587
+ "beingshipped": [
588
+ "being",
589
+ "shipped"
590
+ ],
591
+ "beingspoken": [
592
+ "being",
593
+ "spoken"
594
+ ],
595
+ "havingsaid": [
596
+ "having",
597
+ "said"
598
+ ]
599
+ },
600
+ "DET+AUX": {
601
+ "thes": [
602
+ "the",
603
+ "s"
604
+ ]
605
+ },
606
+ "DET+NOUN": {
607
+ "ALOT": [
608
+ "A",
609
+ "LOT"
610
+ ],
611
+ "Alot": [
612
+ "A",
613
+ "lot"
614
+ ],
615
+ "apart": [
616
+ "a",
617
+ "part"
618
+ ],
619
+ "awhile": [
620
+ "a",
621
+ "while"
622
+ ],
623
+ "sometime": [
624
+ "some",
625
+ "time"
626
+ ]
627
+ },
628
+ "DET+NUM": {
629
+ "everyone": [
630
+ "every",
631
+ "one"
632
+ ]
633
+ },
634
+ "INTJ+PUNCT": {
635
+ "ta',": [
636
+ "ta'",
637
+ ","
638
+ ]
639
+ },
640
+ "NOUN+AUX": {
641
+ "breathingshould": [
642
+ "breathing",
643
+ "should"
644
+ ],
645
+ "doghas": [
646
+ "dog",
647
+ "has"
648
+ ]
649
+ },
650
+ "NOUN+NOUN": {
651
+ "Drivingschool": [
652
+ "Driving",
653
+ "school"
654
+ ],
655
+ "counselingservices": [
656
+ "counseling",
657
+ "services"
658
+ ],
659
+ "datingservice": [
660
+ "dating",
661
+ "service"
662
+ ],
663
+ "doghouse": [
664
+ "dog",
665
+ "house"
666
+ ],
667
+ "drivingschool": [
668
+ "driving",
669
+ "school"
670
+ ],
671
+ "engineeringservices": [
672
+ "engineering",
673
+ "services"
674
+ ],
675
+ "kingsnake": [
676
+ "king",
677
+ "snake"
678
+ ],
679
+ "kingsnakes": [
680
+ "king",
681
+ "snakes"
682
+ ],
683
+ "lightingshowroom": [
684
+ "lighting",
685
+ "showroom"
686
+ ],
687
+ "mpgnumber": [
688
+ "mpg",
689
+ "number"
690
+ ],
691
+ "testingschedule": [
692
+ "testing",
693
+ "schedule"
694
+ ],
695
+ "towingservices": [
696
+ "towing",
697
+ "services"
698
+ ]
699
+ },
700
+ "NOUN+NOUN+VERB": {
701
+ "RecruitingMeetingscheduled": [
702
+ "Recruiting",
703
+ "Meeting",
704
+ "scheduled"
705
+ ]
706
+ },
707
+ "NOUN+PART": {
708
+ "DAUGHTERS": [
709
+ "DAUGHTER",
710
+ "S"
711
+ ],
712
+ "Kids": [
713
+ "Kid",
714
+ "s"
715
+ ],
716
+ "Mares": [
717
+ "Mare",
718
+ "s"
719
+ ],
720
+ "Smokers": [
721
+ "Smoker",
722
+ "s"
723
+ ],
724
+ "Travelers": [
725
+ "Traveler",
726
+ "s"
727
+ ],
728
+ "animals": [
729
+ "animal",
730
+ "s"
731
+ ],
732
+ "bachelors": [
733
+ "bachelor",
734
+ "s"
735
+ ],
736
+ "bakers": [
737
+ "baker",
738
+ "s"
739
+ ],
740
+ "beginners": [
741
+ "beginner",
742
+ "s"
743
+ ],
744
+ "bettas": [
745
+ "betta",
746
+ "s"
747
+ ],
748
+ "boys": [
749
+ "boy",
750
+ "s"
751
+ ],
752
+ "cars": [
753
+ "car",
754
+ "s"
755
+ ],
756
+ "cats": [
757
+ "cat",
758
+ "s"
759
+ ],
760
+ "dads": [
761
+ "dad",
762
+ "s"
763
+ ],
764
+ "doctors": [
765
+ "doctor",
766
+ "s"
767
+ ],
768
+ "dogs": [
769
+ "dog",
770
+ "s"
771
+ ],
772
+ "drivers": [
773
+ "driver",
774
+ "s"
775
+ ],
776
+ "friends": [
777
+ "friend",
778
+ "s"
779
+ ],
780
+ "grandmas": [
781
+ "grandma",
782
+ "s"
783
+ ],
784
+ "horses": [
785
+ "horse",
786
+ "s"
787
+ ],
788
+ "humans": [
789
+ "human",
790
+ "s"
791
+ ],
792
+ "males": [
793
+ "male",
794
+ "s"
795
+ ],
796
+ "manufacturers": [
797
+ "manufacturer",
798
+ "s"
799
+ ],
800
+ "mares": [
801
+ "mare",
802
+ "s"
803
+ ],
804
+ "nights": [
805
+ "night",
806
+ "s"
807
+ ],
808
+ "owners": [
809
+ "owner",
810
+ "s"
811
+ ],
812
+ "peoples": [
813
+ "people",
814
+ "s"
815
+ ],
816
+ "persons": [
817
+ "person",
818
+ "s"
819
+ ],
820
+ "scammers": [
821
+ "scammer",
822
+ "s"
823
+ ],
824
+ "sons": [
825
+ "son",
826
+ "s"
827
+ ],
828
+ "teams": [
829
+ "team",
830
+ "s"
831
+ ],
832
+ "todays": [
833
+ "today",
834
+ "s"
835
+ ],
836
+ "trainers": [
837
+ "trainer",
838
+ "s"
839
+ ],
840
+ "visitors": [
841
+ "visitor",
842
+ "s"
843
+ ],
844
+ "wits": [
845
+ "wit",
846
+ "s"
847
+ ],
848
+ "workers": [
849
+ "worker",
850
+ "s"
851
+ ]
852
+ },
853
+ "NOUN+PUNCT": {
854
+ "Fax.(": [
855
+ "Fax.",
856
+ "("
857
+ ],
858
+ "a.m.,": [
859
+ "a.m.",
860
+ ","
861
+ ],
862
+ "lb.,": [
863
+ "lb.",
864
+ ","
865
+ ],
866
+ "mins.,": [
867
+ "mins.",
868
+ ","
869
+ ],
870
+ "oz.,": [
871
+ "oz.",
872
+ ","
873
+ ],
874
+ "p.m.,": [
875
+ "p.m.",
876
+ ","
877
+ ]
878
+ },
879
+ "NOUN+VERB": {
880
+ "thingsounded": [
881
+ "thing",
882
+ "sounded"
883
+ ]
884
+ },
885
+ "PRON+ADJ": {
886
+ "everythingset": [
887
+ "everything",
888
+ "set"
889
+ ],
890
+ "somethingsuch": [
891
+ "something",
892
+ "such"
893
+ ]
894
+ },
895
+ "PRON+ADV": {
896
+ "somethingsometime": [
897
+ "something",
898
+ "sometime"
899
+ ]
900
+ },
901
+ "PRON+AUX": {
902
+ "ITS": [
903
+ "IT",
904
+ "S"
905
+ ],
906
+ "Im": [
907
+ "I",
908
+ "m"
909
+ ],
910
+ "Its": [
911
+ "It",
912
+ "s"
913
+ ],
914
+ "Whats": [
915
+ "What",
916
+ "s"
917
+ ],
918
+ "Your": [
919
+ "You",
920
+ "r"
921
+ ],
922
+ "hes": [
923
+ "he",
924
+ "s"
925
+ ],
926
+ "id": [
927
+ "i",
928
+ "d"
929
+ ],
930
+ "im": [
931
+ "i",
932
+ "m"
933
+ ],
934
+ "its": [
935
+ "it",
936
+ "s"
937
+ ],
938
+ "iv": [
939
+ "i",
940
+ "v"
941
+ ],
942
+ "ive": [
943
+ "i",
944
+ "ve"
945
+ ],
946
+ "thats": [
947
+ "that",
948
+ "s"
949
+ ],
950
+ "their": [
951
+ "thei",
952
+ "r"
953
+ ],
954
+ "there": [
955
+ "the",
956
+ "re"
957
+ ],
958
+ "ur": [
959
+ "u",
960
+ "r"
961
+ ],
962
+ "your": [
963
+ "you",
964
+ "r"
965
+ ]
966
+ },
967
+ "PRON+PART": {
968
+ "anyones": [
969
+ "anyone",
970
+ "s"
971
+ ]
972
+ },
973
+ "PRON+VERB": {
974
+ "Thats": [
975
+ "That",
976
+ "s"
977
+ ],
978
+ "Theres": [
979
+ "There",
980
+ "s"
981
+ ],
982
+ "everythingset": [
983
+ "everything",
984
+ "set"
985
+ ],
986
+ "iguz": [
987
+ "i",
988
+ "guz"
989
+ ],
990
+ "im": [
991
+ "i",
992
+ "m"
993
+ ],
994
+ "its": [
995
+ "it",
996
+ "s"
997
+ ],
998
+ "theres": [
999
+ "there",
1000
+ "s"
1001
+ ],
1002
+ "youthank": [
1003
+ "you",
1004
+ "thank"
1005
+ ]
1006
+ },
1007
+ "PROPN+PART": {
1008
+ "BJs": [
1009
+ "BJ",
1010
+ "s"
1011
+ ],
1012
+ "Chilis": [
1013
+ "Chili",
1014
+ "s"
1015
+ ],
1016
+ "Friscos": [
1017
+ "Frisco",
1018
+ "s"
1019
+ ],
1020
+ "Hams": [
1021
+ "Ham",
1022
+ "s"
1023
+ ],
1024
+ "Kobeys": [
1025
+ "Kobey",
1026
+ "s"
1027
+ ],
1028
+ "LWs": [
1029
+ "LW",
1030
+ "s"
1031
+ ],
1032
+ "Leonardos": [
1033
+ "Leonardo",
1034
+ "s"
1035
+ ],
1036
+ "Mortons": [
1037
+ "Morton",
1038
+ "s"
1039
+ ],
1040
+ "Travellers": [
1041
+ "Traveller",
1042
+ "s"
1043
+ ],
1044
+ "Valentines": [
1045
+ "Valentine",
1046
+ "s"
1047
+ ],
1048
+ "Years": [
1049
+ "Year",
1050
+ "s"
1051
+ ],
1052
+ "jacks": [
1053
+ "jack",
1054
+ "s"
1055
+ ]
1056
+ },
1057
+ "PROPN+PROPN": {
1058
+ "G&GAutomotive": [
1059
+ "G&G",
1060
+ "Automotive"
1061
+ ],
1062
+ "drivingschool": [
1063
+ "driving",
1064
+ "school"
1065
+ ]
1066
+ },
1067
+ "PROPN+PUNCT": {
1068
+ "B.,": [
1069
+ "B.",
1070
+ ","
1071
+ ],
1072
+ "D.C.,": [
1073
+ "D.C.",
1074
+ ","
1075
+ ],
1076
+ "Inc.\"": [
1077
+ "Inc.",
1078
+ "\""
1079
+ ],
1080
+ "M.,": [
1081
+ "M.",
1082
+ ","
1083
+ ],
1084
+ "N.O.?": [
1085
+ "N.O.",
1086
+ "?"
1087
+ ],
1088
+ "Que.,": [
1089
+ "Que.",
1090
+ ","
1091
+ ],
1092
+ "U.N.,": [
1093
+ "U.N.",
1094
+ ","
1095
+ ],
1096
+ "U.S.)": [
1097
+ "U.S.",
1098
+ ")"
1099
+ ],
1100
+ "U.S.-": [
1101
+ "U.S.",
1102
+ "-"
1103
+ ],
1104
+ "Va.-": [
1105
+ "Va.",
1106
+ "-"
1107
+ ]
1108
+ },
1109
+ "PUNCT+PUNCT": {
1110
+ "!\"": [
1111
+ "!",
1112
+ "\""
1113
+ ],
1114
+ "!)": [
1115
+ "!",
1116
+ ")"
1117
+ ],
1118
+ "\"(": [
1119
+ "\"",
1120
+ "("
1121
+ ],
1122
+ "\")": [
1123
+ "\"",
1124
+ ")"
1125
+ ],
1126
+ "\",": [
1127
+ "\"",
1128
+ ","
1129
+ ],
1130
+ "\"-": [
1131
+ "\"",
1132
+ "-"
1133
+ ],
1134
+ "\"...": [
1135
+ "\"",
1136
+ "..."
1137
+ ],
1138
+ "\":": [
1139
+ "\"",
1140
+ ":"
1141
+ ],
1142
+ "')": [
1143
+ "'",
1144
+ ")"
1145
+ ],
1146
+ "',": [
1147
+ "'",
1148
+ ","
1149
+ ],
1150
+ "(\"": [
1151
+ "(",
1152
+ "\""
1153
+ ],
1154
+ "(\"\"": [
1155
+ "(",
1156
+ "\"\""
1157
+ ],
1158
+ ")\"": [
1159
+ ")",
1160
+ "\""
1161
+ ],
1162
+ ")(": [
1163
+ ")",
1164
+ "("
1165
+ ],
1166
+ "),": [
1167
+ ")",
1168
+ ","
1169
+ ],
1170
+ ").": [
1171
+ ")",
1172
+ "."
1173
+ ],
1174
+ ")...": [
1175
+ ")",
1176
+ "..."
1177
+ ],
1178
+ "):": [
1179
+ ")",
1180
+ ":"
1181
+ ],
1182
+ ");": [
1183
+ ")",
1184
+ ";"
1185
+ ],
1186
+ "*,": [
1187
+ "*",
1188
+ ","
1189
+ ],
1190
+ ",\"": [
1191
+ ",",
1192
+ "\""
1193
+ ],
1194
+ ",'": [
1195
+ ",",
1196
+ "'"
1197
+ ],
1198
+ ",''": [
1199
+ ",",
1200
+ "''"
1201
+ ],
1202
+ ",...": [
1203
+ ",",
1204
+ "..."
1205
+ ],
1206
+ "-\"": [
1207
+ "-",
1208
+ "\""
1209
+ ],
1210
+ ".'": [
1211
+ ".",
1212
+ "'"
1213
+ ],
1214
+ "...\"": [
1215
+ "...",
1216
+ "\""
1217
+ ],
1218
+ "?\"": [
1219
+ "?",
1220
+ "\""
1221
+ ],
1222
+ "?)": [
1223
+ "?",
1224
+ ")"
1225
+ ],
1226
+ "?]": [
1227
+ "?",
1228
+ "]"
1229
+ ],
1230
+ "],": [
1231
+ "]",
1232
+ ","
1233
+ ]
1234
+ },
1235
+ "PUNCT+PUNCT+PUNCT": {
1236
+ "!),": [
1237
+ "!",
1238
+ ")",
1239
+ ","
1240
+ ],
1241
+ "\"),": [
1242
+ "\"",
1243
+ ")",
1244
+ ","
1245
+ ],
1246
+ "?),": [
1247
+ "?",
1248
+ ")",
1249
+ ","
1250
+ ]
1251
+ },
1252
+ "PUNCT+SYM": {
1253
+ "($": [
1254
+ "(",
1255
+ "$"
1256
+ ]
1257
+ },
1258
+ "SYM+PUNCT": {
1259
+ "$,": [
1260
+ "$",
1261
+ ","
1262
+ ],
1263
+ "%)": [
1264
+ "%",
1265
+ ")"
1266
+ ],
1267
+ "-'": [
1268
+ "-",
1269
+ "'"
1270
+ ]
1271
+ },
1272
+ "SYM+SYM": {
1273
+ "-$": [
1274
+ "-",
1275
+ "$"
1276
+ ]
1277
+ },
1278
+ "VERB+ADJ": {
1279
+ "doingshoddy": [
1280
+ "doing",
1281
+ "shoddy"
1282
+ ],
1283
+ "facingserious": [
1284
+ "facing",
1285
+ "serious"
1286
+ ],
1287
+ "outsourcingspecial": [
1288
+ "outsourcing",
1289
+ "special"
1290
+ ],
1291
+ "reinforcingsimilar": [
1292
+ "reinforcing",
1293
+ "similar"
1294
+ ]
1295
+ },
1296
+ "VERB+ADJ+CCONJ": {
1297
+ "lookingsmugand": [
1298
+ "looking",
1299
+ "smug",
1300
+ "and"
1301
+ ]
1302
+ },
1303
+ "VERB+ADP": {
1304
+ "Login": [
1305
+ "Log",
1306
+ "in"
1307
+ ],
1308
+ "gamingsince": [
1309
+ "gaming",
1310
+ "since"
1311
+ ],
1312
+ "goto": [
1313
+ "go",
1314
+ "to"
1315
+ ],
1316
+ "hummingsince": [
1317
+ "humming",
1318
+ "since"
1319
+ ],
1320
+ "investigatingsince": [
1321
+ "investigating",
1322
+ "since"
1323
+ ],
1324
+ "setup": [
1325
+ "set",
1326
+ "up"
1327
+ ]
1328
+ },
1329
+ "VERB+ADV": {
1330
+ "totalingsomewhere": [
1331
+ "totaling",
1332
+ "somewhere"
1333
+ ]
1334
+ },
1335
+ "VERB+DET": {
1336
+ "discussingsome": [
1337
+ "discussing",
1338
+ "some"
1339
+ ],
1340
+ "doingevery": [
1341
+ "doing",
1342
+ "every"
1343
+ ],
1344
+ "doingsome": [
1345
+ "doing",
1346
+ "some"
1347
+ ],
1348
+ "dumpingsome": [
1349
+ "dumping",
1350
+ "some"
1351
+ ],
1352
+ "experiencingsome": [
1353
+ "experiencing",
1354
+ "some"
1355
+ ],
1356
+ "meetingeach": [
1357
+ "meeting",
1358
+ "each"
1359
+ ],
1360
+ "readingsome": [
1361
+ "reading",
1362
+ "some"
1363
+ ],
1364
+ "regardingsome": [
1365
+ "regarding",
1366
+ "some"
1367
+ ],
1368
+ "replacingsome": [
1369
+ "replacing",
1370
+ "some"
1371
+ ]
1372
+ },
1373
+ "VERB+NOUN": {
1374
+ "doingscissors": [
1375
+ "doing",
1376
+ "scissors"
1377
+ ],
1378
+ "followingsuggestion": [
1379
+ "following",
1380
+ "suggestion"
1381
+ ],
1382
+ "formingeggs": [
1383
+ "forming",
1384
+ "eggs"
1385
+ ],
1386
+ "meaningshell": [
1387
+ "meaning",
1388
+ "shell"
1389
+ ],
1390
+ "playingsports": [
1391
+ "playing",
1392
+ "sports"
1393
+ ],
1394
+ "producingshrubs": [
1395
+ "producing",
1396
+ "shrubs"
1397
+ ],
1398
+ "providingservices": [
1399
+ "providing",
1400
+ "services"
1401
+ ],
1402
+ "quittingsmoking": [
1403
+ "quitting",
1404
+ "smoking"
1405
+ ]
1406
+ },
1407
+ "VERB+PART": {
1408
+ "Gotta": [
1409
+ "Got",
1410
+ "ta"
1411
+ ],
1412
+ "aren't": [
1413
+ "are",
1414
+ "n't"
1415
+ ],
1416
+ "doesn't": [
1417
+ "does",
1418
+ "n't"
1419
+ ],
1420
+ "don't": [
1421
+ "do",
1422
+ "n't"
1423
+ ],
1424
+ "gonna": [
1425
+ "gon",
1426
+ "na"
1427
+ ],
1428
+ "gotta": [
1429
+ "got",
1430
+ "ta"
1431
+ ],
1432
+ "wana": [
1433
+ "wan",
1434
+ "a"
1435
+ ],
1436
+ "wanna": [
1437
+ "wan",
1438
+ "na"
1439
+ ]
1440
+ },
1441
+ "VERB+PRON": {
1442
+ "Lets": [
1443
+ "Let",
1444
+ "s"
1445
+ ],
1446
+ "callyou": [
1447
+ "call",
1448
+ "you"
1449
+ ],
1450
+ "doingeverything": [
1451
+ "doing",
1452
+ "everything"
1453
+ ],
1454
+ "lets": [
1455
+ "let",
1456
+ "s"
1457
+ ]
1458
+ },
1459
+ "VERB+SCONJ": {
1460
+ "decidewhether": [
1461
+ "decide",
1462
+ "whether"
1463
+ ]
1464
+ },
1465
+ "X+PUNCT": {
1466
+ "etc.)": [
1467
+ "etc.",
1468
+ ")"
1469
+ ],
1470
+ "etc.,": [
1471
+ "etc.",
1472
+ ","
1473
+ ],
1474
+ "etc..": [
1475
+ "etc.",
1476
+ "."
1477
+ ]
1478
+ },
1479
+ "X+X": {
1480
+ ").doc": [
1481
+ ")",
1482
+ ".doc"
1483
+ ]
1484
+ },
1485
+ "X+X+PRON": {
1486
+ "http://i.imgur.com/T2zff.jpghttp://i.imgur.com/Xytex.jpgI": [
1487
+ "http://i.imgur.com/T2zff.jpg",
1488
+ "http://i.imgur.com/Xytex.jpg",
1489
+ "I"
1490
+ ]
1491
+ }
1492
+ }
1493
+ },
1494
+ "tokenizer_class": "RobertaTokenizerFast",
1495
+ "torch_dtype": "float32",
1496
+ "transformers_version": "4.11.3",
1497
+ "type_vocab_size": 1,
1498
+ "use_cache": true,
1499
+ "vocab_size": 50265
1500
+ }
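The `upos_multiword` table under `task_specific_params` maps a joined tag such as `AUX+PART` to surface forms and their component words. A minimal sketch of how such a table could be applied follows, assuming a token has already been labelled with a joined tag. The function `split_multiword` is hypothetical, the dictionary is a small excerpt copied from the config above, and how esupar itself consumes the table is not shown in this commit.

```python
from typing import Dict, List, Tuple

# Small excerpt of task_specific_params["upos_multiword"] from config.json.
UPOS_MULTIWORD: Dict[str, Dict[str, List[str]]] = {
    "AUX+PART": {"don't": ["do", "n't"], "cannot": ["can", "not"]},
    "VERB+PART": {"gonna": ["gon", "na"]},
}


def split_multiword(form: str, label: str) -> List[Tuple[str, str]]:
    """Split a token tagged with a joined label into (word, tag) pairs.

    Falls back to the unsplit token when the form is not listed
    under that label (an assumption about sensible behaviour).
    """
    tags = label.split("+")
    parts = UPOS_MULTIWORD.get(label, {}).get(form)
    if parts is None or len(parts) != len(tags):
        # Unknown form, or a plain single tag: keep the token whole.
        return [(form, label)]
    return list(zip(parts, tags))


print(split_multiword("don't", "AUX+PART"))
print(split_multiword("gonna", "VERB+PART"))
print(split_multiword("dog", "NOUN"))
```

Keying the lookup on both the tag and the surface form matters because the same form can appear under several tags in the table, for example `its` under both `PRON+AUX` and `PRON+VERB`.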
merges.txt ADDED
The diff for this file is too large to render. See raw diff
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ad92b564b4a62c21032ff9e63d6e0822f95d94fe8e028707dd480f7cce0bc15
+ size 496790863
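The three lines added for pytorch_model.bin are a Git LFS pointer, not the weights themselves: the pointer records the spec version, a SHA-256 object id, and the size in bytes. As a rough illustration, the sketch below parses the pointer text shown in this diff; `parse_lfs_pointer` is a hypothetical helper written for this note and is not part of any library.

```python
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse the "key value" lines of a Git LFS pointer file."""
    # Each non-empty line is "<key> <value>", split on the first space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:7ad92b564b4a62c21032ff9e63d6e0822f95d94fe8e028707dd480f7cce0bc15
size 496790863
"""
info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])
```

The parsed size and digest can be compared against a downloaded file to check that the transfer completed.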
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}
supar.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5166b493ed67370b1ee21ce283c31e3b145d34723e84fdb612a77e3665f0e53b
+ size 549334503
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "tokenizer_class": "RobertaTokenizerFast"}
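The tokenizer config sets `model_max_length` to 512 while config.json sets `max_position_embeddings` to 514. The two values are consistent if, as in the Hugging Face RoBERTa implementation, position ids start at `pad_token_id + 1`, which reserves the first two embedding rows. The arithmetic below is a small check under that assumption; the variable names simply mirror the config keys.

```python
# Values copied from config.json and tokenizer_config.json in this commit.
max_position_embeddings = 514
pad_token_id = 1
model_max_length = 512

# Assumption: real tokens get position ids starting at pad_token_id + 1,
# so that many leading rows of the position table are never used for tokens.
reserved_positions = pad_token_id + 1
usable_positions = max_position_embeddings - reserved_positions

print(usable_positions)
```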
vocab.json ADDED
The diff for this file is too large to render. See raw diff