KoichiYasuoka committed on
Commit
f3ef04d
1 Parent(s): b247fcb

initial release

Files changed (8)
  1. README.md +30 -0
  2. config.json +2076 -0
  3. maker.py +116 -0
  4. pytorch_model.bin +3 -0
  5. special_tokens_map.json +44 -0
  6. tokenizer.json +0 -0
  7. tokenizer_config.json +2065 -0
  8. ud.py +147 -0
README.md ADDED
@@ -0,0 +1,30 @@
+---
+language:
+- "th"
+tags:
+- "thai"
+- "pos"
+- "dependency-parsing"
+base_model: meta-llama/Llama-3.2-1B
+datasets:
+- "universal_dependencies"
+license: "apache-2.0"
+pipeline_tag: "token-classification"
+widget:
+- text: "หลายหัวดีกว่าหัวเดียว"
+---
+
+# Llama-3.2-1B-thai-ud-causal
+
+## Model Description
+
+This is a LLaMA model pre-trained for POS-tagging and dependency-parsing, derived from [Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) refined for [Thai Universal Dependency Treebank](https://github.com/nlp-chula/TUD).
+
+## How to Use
+
+```py
+from transformers import pipeline
+nlp=pipeline("universal-dependencies","KoichiYasuoka/Llama-3.2-1B-thai-ud-causal",trust_remote_code=True)
+print(nlp("หลายหัวดีกว่าหัวเดียว"))
+```
+
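The `id2label` values in config.json appear to pack several pieces of information into one string separated by `|`: a UPOS tag (sometimes with a `B-` or `I-` prefix), zero or more `Key=Value` features, and optionally a trailing dependency tag such as `l-nmod`, `r-obj`, or `root`. The sketch below splits a label into those parts. The `ParsedLabel` type and `parse_label` function are hypothetical names used only for illustration and are not part of this repository; the commit does not state what the `l-`/`r-` prefixes mean, so the code keeps the tag verbatim rather than interpreting it.

```python
from typing import NamedTuple, Optional


class ParsedLabel(NamedTuple):
    span: Optional[str]    # "B" or "I" when the label carries that prefix, else None
    upos: str              # e.g. "NOUN", "PROPN"
    feats: dict            # e.g. {"Foreign": "Yes"}
    deprel: Optional[str]  # e.g. "l-nmod", "r-obj", "root"; kept verbatim


def parse_label(label: str) -> ParsedLabel:
    """Split one id2label string into its apparent components (assumed layout)."""
    head, *rest = label.split("|")
    span = None
    if head[:2] in ("B-", "I-"):
        span, head = head[0], head[2:]
    feats, deprel = {}, None
    for part in rest:
        if "=" in part:
            # Key=Value items are treated as morphological features.
            key, value = part.split("=", 1)
            feats[key] = value
        else:
            # Any remaining item is treated as the dependency tag.
            deprel = part
    return ParsedLabel(span, head, feats, deprel)


if __name__ == "__main__":
    for example in ("NOUN|Foreign=Yes|r-nmod", "B-PROPN|Abbr=Yes|NameType=Com", "ADP|root"):
        print(example, "->", parse_label(example))
```

This only handles the string layout visible in the label list; how the custom pipelines in ud.py actually consume these labels is not shown in this part of the commit.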
config.json ADDED
@@ -0,0 +1,2076 @@
+{
+  "architectures": [
+    "LlamaForTokenClassification"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "custom_pipelines": {
+    "upos": {
+      "impl": "ud.BellmanFordTokenClassificationPipeline",
+      "pt": "AutoModelForTokenClassification"
+    },
+    "universal-dependencies": {
+      "impl": "ud.UniversalDependenciesCausalPipeline",
+      "pt": "AutoModelForTokenClassification"
+    }
+  },
+  "eos_token_id": 128001,
+  "head_dim": 64,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "id2label": {
+    "0": "ADP",
+    "1": "ADP|Foreign=Yes",
+    "2": "ADP|Foreign=Yes|l-case",
+    "3": "ADP|NounType=Class",
+    "4": "ADP|NounType=Class|l-case",
+    "5": "ADP|Prefix=Yes",
+    "6": "ADP|Prefix=Yes|l-case",
+    "7": "ADP|Prefix=Yes|l-mark",
+    "8": "ADP|l-acl",
+    "9": "ADP|l-advcl",
+    "10": "ADP|l-advmod",
+    "11": "ADP|l-case",
+    "12": "ADP|l-cc",
+    "13": "ADP|l-dep",
+    "14": "ADP|l-fixed",
+    "15": "ADP|l-flat",
+    "16": "ADP|l-mark",
+    "17": "ADP|l-nmod",
+    "18": "ADP|l-nsubj",
+    "19": "ADP|l-obl",
+    "20": "ADP|l-orphan",
+    "21": "ADP|r-acl",
+    "22": "ADP|r-advmod",
+    "23": "ADP|r-case",
+    "24": "ADP|r-compound",
+    "25": "ADP|r-conj",
+    "26": "ADP|r-fixed",
+    "27": "ADP|r-flat",
+    "28": "ADP|r-obl",
+    "29": "ADP|r-orphan",
+    "30": "ADP|root",
+    "31": "ADV",
+    "32": "ADV|Foreign=Yes",
+    "33": "ADV|Foreign=Yes|l-advmod",
+    "34": "ADV|Foreign=Yes|r-advmod",
+    "35": "ADV|NumType=Mult",
+    "36": "ADV|NumType=Mult|r-advmod",
+    "37": "ADV|PartType=Adv",
+    "38": "ADV|PartType=Adv|l-advmod",
+    "39": "ADV|PartType=Adv|l-mark",
+    "40": "ADV|PartType=Adv|r-advmod",
+    "41": "ADV|PartType=Enp",
+    "42": "ADV|PartType=Enp|l-advmod",
+    "43": "ADV|PartType=Enp|r-advmod",
+    "44": "ADV|PartType=Int",
+    "45": "ADV|PartType=Int|r-advmod",
+    "46": "ADV|PartType=Int|r-fixed",
+    "47": "ADV|Prefix=Yes",
+    "48": "ADV|Prefix=Yes|l-advmod",
+    "49": "ADV|Prefix=Yes|l-mark",
+    "50": "ADV|Prefix=Yes|r-advmod",
+    "51": "ADV|l-acl",
+    "52": "ADV|l-advcl",
+    "53": "ADV|l-advmod",
+    "54": "ADV|l-aux",
+    "55": "ADV|l-case",
+    "56": "ADV|l-compound",
+    "57": "ADV|l-dep",
+    "58": "ADV|l-det",
+    "59": "ADV|l-discourse",
+    "60": "ADV|l-fixed",
+    "61": "ADV|l-mark",
+    "62": "ADV|l-orphan",
+    "63": "ADV|r-acl",
+    "64": "ADV|r-advcl",
+    "65": "ADV|r-advmod",
+    "66": "ADV|r-aux",
+    "67": "ADV|r-compound",
+    "68": "ADV|r-conj",
+    "69": "ADV|r-det",
+    "70": "ADV|r-fixed",
+    "71": "ADV|r-flat",
+    "72": "ADV|r-mark",
+    "73": "ADV|r-nmod",
+    "74": "ADV|r-obj",
+    "75": "ADV|r-orphan",
+    "76": "ADV|r-xcomp",
+    "77": "ADV|root",
+    "78": "AUX",
+    "79": "AUX|Foreign=Yes",
+    "80": "AUX|Foreign=Yes|l-aux",
+    "81": "AUX|NounType=Class",
+    "82": "AUX|NounType=Class|r-appos",
+    "83": "AUX|Prefix=Yes",
+    "84": "AUX|Prefix=Yes|l-aux",
+    "85": "AUX|Prefix=Yes|r-aux",
+    "86": "AUX|VerbType=Cop",
+    "87": "AUX|VerbType=Cop|l-acl",
+    "88": "AUX|VerbType=Cop|l-advcl",
+    "89": "AUX|VerbType=Cop|l-aux",
+    "90": "AUX|VerbType=Cop|l-cop",
+    "91": "AUX|VerbType=Cop|r-acl",
+    "92": "AUX|VerbType=Cop|r-advcl",
+    "93": "AUX|VerbType=Cop|r-aux",
+    "94": "AUX|VerbType=Cop|r-conj",
+    "95": "AUX|VerbType=Cop|r-mark",
+    "96": "AUX|VerbType=Cop|root",
+    "97": "AUX|l-advmod",
+    "98": "AUX|l-aux",
+    "99": "AUX|l-cop",
+    "100": "AUX|l-mark",
+    "101": "AUX|r-acl",
+    "102": "AUX|r-advmod",
+    "103": "AUX|r-aux",
+    "104": "AUX|r-ccomp",
+    "105": "AUX|r-clf",
+    "106": "AUX|r-compound",
+    "107": "AUX|r-conj",
+    "108": "AUX|r-fixed",
+    "109": "AUX|root",
+    "110": "B-ADP",
+    "111": "B-ADP|Foreign=Yes",
+    "112": "B-ADP|NounType=Class",
+    "113": "B-ADP|Prefix=Yes",
+    "114": "B-ADV",
+    "115": "B-ADV|Foreign=Yes",
+    "116": "B-ADV|NumType=Mult",
+    "117": "B-ADV|PartType=Adv",
+    "118": "B-ADV|PartType=Enp",
+    "119": "B-ADV|PartType=Int",
+    "120": "B-ADV|Prefix=Yes",
+    "121": "B-AUX",
+    "122": "B-AUX|Foreign=Yes",
+    "123": "B-AUX|NounType=Class",
+    "124": "B-AUX|Prefix=Yes",
+    "125": "B-AUX|VerbType=Cop",
+    "126": "B-CCONJ",
+    "127": "B-CCONJ|Foreign=Yes",
+    "128": "B-CCONJ|PronType=Prs",
+    "129": "B-DET",
+    "130": "B-DET|NumType=Mult",
+    "131": "B-DET|PartType=Emp",
+    "132": "B-DET|PartType=Int",
+    "133": "B-NOUN",
+    "134": "B-NOUN|Abbr=Yes",
+    "135": "B-NOUN|Abbr=Yes|Foreign=Yes",
+    "136": "B-NOUN|Abbr=Yes|Prefix=Yes",
+    "137": "B-NOUN|Foreign=Yes",
+    "138": "B-NOUN|Foreign=Yes|NounType=Class",
+    "139": "B-NOUN|Foreign=Yes|Prefix=Yes",
+    "140": "B-NOUN|NameType=Com",
+    "141": "B-NOUN|NameType=Geo",
+    "142": "B-NOUN|NameType=Nat",
+    "143": "B-NOUN|NameType=Oth",
+    "144": "B-NOUN|NameType=Pro",
+    "145": "B-NOUN|NameType=Prs",
+    "146": "B-NOUN|NounType=Class",
+    "147": "B-NOUN|NounType=Class|Prefix=Yes",
+    "148": "B-NOUN|NumType=Mult",
+    "149": "B-NOUN|PartType=Enp",
+    "150": "B-NOUN|PartType=Int",
+    "151": "B-NOUN|PartType=Res",
+    "152": "B-NOUN|Prefix=Yes",
+    "153": "B-NUM",
+    "154": "B-NUM|Abbr=Yes",
+    "155": "B-NUM|Foreign=Yes",
+    "156": "B-NUM|NumType=Mult",
+    "157": "B-NUM|Prefix=Yes",
+    "158": "B-PART",
+    "159": "B-PART|NameType=Oth",
+    "160": "B-PART|NounType=Class|PartType=Emp",
+    "161": "B-PART|NounType=Class|PartType=Emp|Prefix=Yes",
+    "162": "B-PART|NounType=Class|Prefix=Yes",
+    "163": "B-PART|NumType=Mult|PartType=Emp",
+    "164": "B-PART|PartType=Adj",
+    "165": "B-PART|PartType=Adv",
+    "166": "B-PART|PartType=Emp",
+    "167": "B-PART|PartType=Emp|Prefix=Yes",
+    "168": "B-PART|PartType=Enp",
+    "169": "B-PART|PartType=Int",
+    "170": "B-PART|PartType=Neg",
+    "171": "B-PART|PartType=Res",
+    "172": "B-PART|Prefix=Yes",
+    "173": "B-PRON",
+    "174": "B-PRON|NounType=Class",
+    "175": "B-PRON|PronType=Prs",
+    "176": "B-PRON|PronType=Rcp",
+    "177": "B-PROPN",
+    "178": "B-PROPN|Abbr=Yes",
+    "179": "B-PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth",
+    "180": "B-PROPN|Abbr=Yes|NameType=Com",
+    "181": "B-PROPN|Foreign=Yes",
+    "182": "B-PROPN|Foreign=Yes|NameType=Com",
+    "183": "B-PROPN|Foreign=Yes|NameType=Geo",
+    "184": "B-PROPN|Foreign=Yes|NameType=Giv",
+    "185": "B-PROPN|Foreign=Yes|NameType=Oth",
+    "186": "B-PROPN|Foreign=Yes|NameType=Prs",
+    "187": "B-PROPN|Foreign=Yes|NameType=Sur",
+    "188": "B-PROPN|NameType=Com",
+    "189": "B-PROPN|NameType=Geo",
+    "190": "B-PROPN|NameType=Giv",
+    "191": "B-PROPN|NameType=Nat",
+    "192": "B-PROPN|NameType=Oth",
+    "193": "B-PROPN|NameType=Pro",
+    "194": "B-PROPN|NameType=Prs",
+    "195": "B-PROPN|NameType=Sur",
+    "196": "B-PROPN|NounType=Class",
+    "197": "B-PROPN|Prefix=Yes",
+    "198": "B-PUNCT",
+    "199": "B-PUNCT|NounType=Class",
+    "200": "B-SCONJ",
+    "201": "B-SCONJ|NumType=Mult",
+    "202": "B-SCONJ|Prefix=Yes",
+    "203": "B-SCONJ|VerbType=Cop",
+    "204": "B-SYM",
+    "205": "B-VERB",
+    "206": "B-VERB|Abbr=Yes",
+    "207": "B-VERB|Foreign=Yes",
+    "208": "B-VERB|NounType=Class",
+    "209": "B-VERB|PartType=Adj",
+    "210": "B-VERB|Prefix=Yes",
+    "211": "B-VERB|VerbType=Cop",
+    "212": "CCONJ",
+    "213": "CCONJ|Foreign=Yes",
+    "214": "CCONJ|Foreign=Yes|l-cc",
+    "215": "CCONJ|PronType=Prs",
+    "216": "CCONJ|PronType=Prs|l-cc",
+    "217": "CCONJ|l-advmod",
+    "218": "CCONJ|l-case",
+    "219": "CCONJ|l-cc",
+    "220": "CCONJ|l-conj",
+    "221": "CCONJ|l-discourse",
+    "222": "CCONJ|l-fixed",
+    "223": "CCONJ|l-flat",
+    "224": "CCONJ|l-mark",
+    "225": "CCONJ|l-nsubj",
+    "226": "CCONJ|l-obj",
+    "227": "CCONJ|l-orphan",
+    "228": "CCONJ|r-cc",
+    "229": "CCONJ|r-compound",
+    "230": "CCONJ|r-fixed",
+    "231": "CCONJ|r-mark",
+    "232": "DET",
+    "233": "DET|NumType=Mult",
+    "234": "DET|NumType=Mult|l-det",
+    "235": "DET|PartType=Emp",
+    "236": "DET|PartType=Emp|r-det",
+    "237": "DET|PartType=Int",
+    "238": "DET|PartType=Int|r-det",
+    "239": "DET|l-compound",
+    "240": "DET|l-det",
+    "241": "DET|l-discourse",
+    "242": "DET|l-nsubj",
+    "243": "DET|l-obl",
+    "244": "DET|l-orphan",
+    "245": "DET|r-advmod",
+    "246": "DET|r-compound",
+    "247": "DET|r-conj",
+    "248": "DET|r-dep",
+    "249": "DET|r-det",
+    "250": "DET|r-fixed",
+    "251": "DET|r-flat",
+    "252": "DET|r-list",
+    "253": "DET|r-nmod",
+    "254": "DET|r-nummod",
+    "255": "DET|r-obl",
+    "256": "DET|r-orphan",
+    "257": "I-ADP",
+    "258": "I-ADP|Foreign=Yes",
+    "259": "I-ADP|NounType=Class",
+    "260": "I-ADP|Prefix=Yes",
+    "261": "I-ADV",
+    "262": "I-ADV|Foreign=Yes",
+    "263": "I-ADV|NumType=Mult",
+    "264": "I-ADV|PartType=Adv",
+    "265": "I-ADV|PartType=Enp",
+    "266": "I-ADV|PartType=Int",
+    "267": "I-ADV|Prefix=Yes",
+    "268": "I-AUX",
+    "269": "I-AUX|Foreign=Yes",
+    "270": "I-AUX|NounType=Class",
+    "271": "I-AUX|Prefix=Yes",
+    "272": "I-AUX|VerbType=Cop",
+    "273": "I-CCONJ",
+    "274": "I-CCONJ|Foreign=Yes",
+    "275": "I-CCONJ|PronType=Prs",
+    "276": "I-DET",
+    "277": "I-DET|NumType=Mult",
+    "278": "I-DET|PartType=Emp",
+    "279": "I-DET|PartType=Int",
+    "280": "I-NOUN",
+    "281": "I-NOUN|Abbr=Yes",
+    "282": "I-NOUN|Abbr=Yes|Foreign=Yes",
+    "283": "I-NOUN|Abbr=Yes|Prefix=Yes",
+    "284": "I-NOUN|Foreign=Yes",
+    "285": "I-NOUN|Foreign=Yes|NounType=Class",
+    "286": "I-NOUN|Foreign=Yes|Prefix=Yes",
+    "287": "I-NOUN|NameType=Com",
+    "288": "I-NOUN|NameType=Geo",
+    "289": "I-NOUN|NameType=Nat",
+    "290": "I-NOUN|NameType=Oth",
+    "291": "I-NOUN|NameType=Pro",
+    "292": "I-NOUN|NameType=Prs",
+    "293": "I-NOUN|NounType=Class",
+    "294": "I-NOUN|NounType=Class|Prefix=Yes",
+    "295": "I-NOUN|NumType=Mult",
+    "296": "I-NOUN|PartType=Enp",
+    "297": "I-NOUN|PartType=Int",
+    "298": "I-NOUN|PartType=Res",
+    "299": "I-NOUN|Prefix=Yes",
+    "300": "I-NUM",
+    "301": "I-NUM|Abbr=Yes",
+    "302": "I-NUM|Foreign=Yes",
+    "303": "I-NUM|NumType=Mult",
+    "304": "I-NUM|Prefix=Yes",
+    "305": "I-PART",
+    "306": "I-PART|NameType=Oth",
+    "307": "I-PART|NounType=Class|PartType=Emp",
+    "308": "I-PART|NounType=Class|PartType=Emp|Prefix=Yes",
+    "309": "I-PART|NounType=Class|Prefix=Yes",
+    "310": "I-PART|NumType=Mult|PartType=Emp",
+    "311": "I-PART|PartType=Adj",
+    "312": "I-PART|PartType=Adv",
+    "313": "I-PART|PartType=Emp",
+    "314": "I-PART|PartType=Emp|Prefix=Yes",
+    "315": "I-PART|PartType=Enp",
+    "316": "I-PART|PartType=Int",
+    "317": "I-PART|PartType=Neg",
+    "318": "I-PART|PartType=Res",
+    "319": "I-PART|Prefix=Yes",
+    "320": "I-PRON",
+    "321": "I-PRON|NounType=Class",
+    "322": "I-PRON|PronType=Prs",
+    "323": "I-PRON|PronType=Rcp",
+    "324": "I-PROPN",
+    "325": "I-PROPN|Abbr=Yes",
+    "326": "I-PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth",
+    "327": "I-PROPN|Abbr=Yes|NameType=Com",
+    "328": "I-PROPN|Foreign=Yes",
+    "329": "I-PROPN|Foreign=Yes|NameType=Com",
+    "330": "I-PROPN|Foreign=Yes|NameType=Geo",
+    "331": "I-PROPN|Foreign=Yes|NameType=Giv",
+    "332": "I-PROPN|Foreign=Yes|NameType=Oth",
+    "333": "I-PROPN|Foreign=Yes|NameType=Prs",
+    "334": "I-PROPN|Foreign=Yes|NameType=Sur",
+    "335": "I-PROPN|NameType=Com",
+    "336": "I-PROPN|NameType=Geo",
+    "337": "I-PROPN|NameType=Giv",
+    "338": "I-PROPN|NameType=Nat",
+    "339": "I-PROPN|NameType=Oth",
+    "340": "I-PROPN|NameType=Pro",
+    "341": "I-PROPN|NameType=Prs",
+    "342": "I-PROPN|NameType=Sur",
+    "343": "I-PROPN|NounType=Class",
+    "344": "I-PROPN|Prefix=Yes",
+    "345": "I-PUNCT",
+    "346": "I-PUNCT|NounType=Class",
+    "347": "I-SCONJ",
+    "348": "I-SCONJ|NumType=Mult",
+    "349": "I-SCONJ|Prefix=Yes",
+    "350": "I-SCONJ|VerbType=Cop",
+    "351": "I-SYM",
+    "352": "I-VERB",
+    "353": "I-VERB|Abbr=Yes",
+    "354": "I-VERB|Foreign=Yes",
+    "355": "I-VERB|NounType=Class",
+    "356": "I-VERB|PartType=Adj",
+    "357": "I-VERB|Prefix=Yes",
+    "358": "I-VERB|VerbType=Cop",
+    "359": "NOUN",
+    "360": "NOUN|Abbr=Yes",
+    "361": "NOUN|Abbr=Yes|Foreign=Yes",
+    "362": "NOUN|Abbr=Yes|Foreign=Yes|r-nmod",
+    "363": "NOUN|Abbr=Yes|Prefix=Yes",
+    "364": "NOUN|Abbr=Yes|Prefix=Yes|l-flat",
+    "365": "NOUN|Abbr=Yes|l-flat",
+    "366": "NOUN|Abbr=Yes|l-nmod",
+    "367": "NOUN|Abbr=Yes|l-nsubj",
+    "368": "NOUN|Abbr=Yes|l-obl",
+    "369": "NOUN|Abbr=Yes|r-acl",
+    "370": "NOUN|Abbr=Yes|r-appos",
+    "371": "NOUN|Abbr=Yes|r-clf",
+    "372": "NOUN|Abbr=Yes|r-conj",
+    "373": "NOUN|Abbr=Yes|r-fixed",
+    "374": "NOUN|Abbr=Yes|r-flat",
+    "375": "NOUN|Abbr=Yes|r-nmod",
+    "376": "NOUN|Abbr=Yes|r-obj",
+    "377": "NOUN|Abbr=Yes|r-obl",
+    "378": "NOUN|Foreign=Yes",
+    "379": "NOUN|Foreign=Yes|NounType=Class",
+    "380": "NOUN|Foreign=Yes|NounType=Class|r-clf",
+    "381": "NOUN|Foreign=Yes|NounType=Class|r-obj",
+    "382": "NOUN|Foreign=Yes|Prefix=Yes",
+    "383": "NOUN|Foreign=Yes|Prefix=Yes|l-flat",
+    "384": "NOUN|Foreign=Yes|Prefix=Yes|r-appos",
+    "385": "NOUN|Foreign=Yes|l-dislocated",
+    "386": "NOUN|Foreign=Yes|l-flat",
+    "387": "NOUN|Foreign=Yes|l-nmod",
+    "388": "NOUN|Foreign=Yes|l-nsubj",
+    "389": "NOUN|Foreign=Yes|l-obl",
+    "390": "NOUN|Foreign=Yes|r-acl",
+    "391": "NOUN|Foreign=Yes|r-advcl",
+    "392": "NOUN|Foreign=Yes|r-advmod",
+    "393": "NOUN|Foreign=Yes|r-appos",
+    "394": "NOUN|Foreign=Yes|r-ccomp",
+    "395": "NOUN|Foreign=Yes|r-clf",
+    "396": "NOUN|Foreign=Yes|r-compound",
+    "397": "NOUN|Foreign=Yes|r-conj",
+    "398": "NOUN|Foreign=Yes|r-flat",
+    "399": "NOUN|Foreign=Yes|r-iobj",
+    "400": "NOUN|Foreign=Yes|r-list",
+    "401": "NOUN|Foreign=Yes|r-nmod",
+    "402": "NOUN|Foreign=Yes|r-obj",
+    "403": "NOUN|Foreign=Yes|r-obl",
+    "404": "NOUN|Foreign=Yes|r-xcomp",
+    "405": "NOUN|Foreign=Yes|root",
+    "406": "NOUN|NameType=Com",
+    "407": "NOUN|NameType=Com|r-nmod",
+    "408": "NOUN|NameType=Geo",
+    "409": "NOUN|NameType=Geo|l-nsubj",
+    "410": "NOUN|NameType=Geo|r-nmod",
+    "411": "NOUN|NameType=Geo|r-obj",
+    "412": "NOUN|NameType=Nat",
+    "413": "NOUN|NameType=Nat|r-nmod",
+    "414": "NOUN|NameType=Oth",
+    "415": "NOUN|NameType=Oth|l-nsubj",
+    "416": "NOUN|NameType=Oth|r-conj",
+    "417": "NOUN|NameType=Oth|r-flat",
+    "418": "NOUN|NameType=Oth|r-nmod",
+    "419": "NOUN|NameType=Pro",
+    "420": "NOUN|NameType=Pro|r-nmod",
+    "421": "NOUN|NameType=Prs",
+    "422": "NOUN|NameType=Prs|l-nsubj",
+    "423": "NOUN|NameType=Prs|r-nmod",
+    "424": "NOUN|NounType=Class",
+    "425": "NOUN|NounType=Class|Prefix=Yes",
+    "426": "NOUN|NounType=Class|Prefix=Yes|l-advcl",
+    "427": "NOUN|NounType=Class|Prefix=Yes|l-advmod",
+    "428": "NOUN|NounType=Class|Prefix=Yes|l-mark",
+    "429": "NOUN|NounType=Class|Prefix=Yes|l-nmod",
+    "430": "NOUN|NounType=Class|Prefix=Yes|l-nsubj",
+    "431": "NOUN|NounType=Class|Prefix=Yes|r-advcl",
+    "432": "NOUN|NounType=Class|Prefix=Yes|r-clf",
+    "433": "NOUN|NounType=Class|Prefix=Yes|r-nmod",
+    "434": "NOUN|NounType=Class|Prefix=Yes|r-obj",
+    "435": "NOUN|NounType=Class|l-advcl",
+    "436": "NOUN|NounType=Class|l-advmod",
+    "437": "NOUN|NounType=Class|l-clf",
+    "438": "NOUN|NounType=Class|l-dislocated",
+    "439": "NOUN|NounType=Class|l-nmod",
+    "440": "NOUN|NounType=Class|l-nsubj",
+    "441": "NOUN|NounType=Class|l-obj",
+    "442": "NOUN|NounType=Class|l-obl",
+    "443": "NOUN|NounType=Class|r-acl",
+    "444": "NOUN|NounType=Class|r-advcl",
+    "445": "NOUN|NounType=Class|r-advmod",
+    "446": "NOUN|NounType=Class|r-appos",
+    "447": "NOUN|NounType=Class|r-cc",
+    "448": "NOUN|NounType=Class|r-ccomp",
+    "449": "NOUN|NounType=Class|r-clf",
+    "450": "NOUN|NounType=Class|r-compound",
+    "451": "NOUN|NounType=Class|r-conj",
+    "452": "NOUN|NounType=Class|r-dislocated",
+    "453": "NOUN|NounType=Class|r-fixed",
+    "454": "NOUN|NounType=Class|r-flat",
+    "455": "NOUN|NounType=Class|r-iobj",
+    "456": "NOUN|NounType=Class|r-list",
+    "457": "NOUN|NounType=Class|r-nmod",
+    "458": "NOUN|NounType=Class|r-nummod",
+    "459": "NOUN|NounType=Class|r-obj",
+    "460": "NOUN|NounType=Class|r-obl",
+    "461": "NOUN|NounType=Class|r-orphan",
+    "462": "NOUN|NounType=Class|r-xcomp",
+    "463": "NOUN|NounType=Class|root",
+    "464": "NOUN|NumType=Mult",
+    "465": "NOUN|NumType=Mult|r-advcl",
+    "466": "NOUN|NumType=Mult|r-nmod",
+    "467": "NOUN|NumType=Mult|r-obj",
+    "468": "NOUN|PartType=Enp",
+    "469": "NOUN|PartType=Enp|r-obj",
+    "470": "NOUN|PartType=Enp|r-obl",
+    "471": "NOUN|PartType=Int",
+    "472": "NOUN|PartType=Int|r-obj",
+    "473": "NOUN|PartType=Res",
+    "474": "NOUN|PartType=Res|r-nmod",
+    "475": "NOUN|PartType=Res|r-obj",
+    "476": "NOUN|Prefix=Yes",
+    "477": "NOUN|Prefix=Yes|l-acl",
+    "478": "NOUN|Prefix=Yes|l-advcl",
+    "479": "NOUN|Prefix=Yes|l-clf",
+    "480": "NOUN|Prefix=Yes|l-csubj",
+    "481": "NOUN|Prefix=Yes|l-dislocated",
+    "482": "NOUN|Prefix=Yes|l-flat",
+    "483": "NOUN|Prefix=Yes|l-nmod",
+    "484": "NOUN|Prefix=Yes|l-nsubj",
+    "485": "NOUN|Prefix=Yes|l-obj",
+    "486": "NOUN|Prefix=Yes|l-obl",
+    "487": "NOUN|Prefix=Yes|r-acl",
+    "488": "NOUN|Prefix=Yes|r-advcl",
+    "489": "NOUN|Prefix=Yes|r-advmod",
+    "490": "NOUN|Prefix=Yes|r-appos",
+    "491": "NOUN|Prefix=Yes|r-case",
+    "492": "NOUN|Prefix=Yes|r-cc",
+    "493": "NOUN|Prefix=Yes|r-ccomp",
+    "494": "NOUN|Prefix=Yes|r-clf",
+    "495": "NOUN|Prefix=Yes|r-compound",
+    "496": "NOUN|Prefix=Yes|r-conj",
+    "497": "NOUN|Prefix=Yes|r-dislocated",
+    "498": "NOUN|Prefix=Yes|r-fixed",
+    "499": "NOUN|Prefix=Yes|r-flat",
+    "500": "NOUN|Prefix=Yes|r-iobj",
+    "501": "NOUN|Prefix=Yes|r-list",
+    "502": "NOUN|Prefix=Yes|r-nmod",
+    "503": "NOUN|Prefix=Yes|r-nummod",
+    "504": "NOUN|Prefix=Yes|r-obj",
+    "505": "NOUN|Prefix=Yes|r-obl",
+    "506": "NOUN|Prefix=Yes|r-orphan",
+    "507": "NOUN|Prefix=Yes|r-xcomp",
+    "508": "NOUN|Prefix=Yes|root",
+    "509": "NOUN|l-acl",
+    "510": "NOUN|l-advcl",
+    "511": "NOUN|l-advmod",
+    "512": "NOUN|l-case",
+    "513": "NOUN|l-ccomp",
+    "514": "NOUN|l-compound",
+    "515": "NOUN|l-csubj",
+    "516": "NOUN|l-discourse",
+    "517": "NOUN|l-dislocated",
+    "518": "NOUN|l-expl",
+    "519": "NOUN|l-flat",
+    "520": "NOUN|l-iobj",
+    "521": "NOUN|l-mark",
+    "522": "NOUN|l-nmod",
+    "523": "NOUN|l-nsubj",
+    "524": "NOUN|l-nummod",
+    "525": "NOUN|l-obj",
+    "526": "NOUN|l-obl",
+    "527": "NOUN|l-orphan",
+    "528": "NOUN|l-vocative",
+    "529": "NOUN|r-acl",
+    "530": "NOUN|r-advcl",
+    "531": "NOUN|r-advmod",
+    "532": "NOUN|r-appos",
+    "533": "NOUN|r-case",
+    "534": "NOUN|r-cc",
+    "535": "NOUN|r-ccomp",
+    "536": "NOUN|r-clf",
+    "537": "NOUN|r-compound",
+    "538": "NOUN|r-conj",
+    "539": "NOUN|r-cop",
+    "540": "NOUN|r-discourse",
+    "541": "NOUN|r-dislocated",
+    "542": "NOUN|r-fixed",
+    "543": "NOUN|r-flat",
+    "544": "NOUN|r-iobj",
+    "545": "NOUN|r-list",
+    "546": "NOUN|r-mark",
+    "547": "NOUN|r-nmod",
+    "548": "NOUN|r-nsubj",
+    "549": "NOUN|r-nummod",
+    "550": "NOUN|r-obj",
+    "551": "NOUN|r-obl",
+    "552": "NOUN|r-orphan",
+    "553": "NOUN|r-parataxis",
+    "554": "NOUN|r-xcomp",
+    "555": "NOUN|root",
+    "556": "NUM",
+    "557": "NUM|Abbr=Yes",
+    "558": "NUM|Abbr=Yes|r-flat",
+    "559": "NUM|Abbr=Yes|r-nummod",
+    "560": "NUM|Abbr=Yes|r-obj",
+    "561": "NUM|Foreign=Yes",
+    "562": "NUM|Foreign=Yes|r-clf",
+    "563": "NUM|NumType=Mult",
+    "564": "NUM|NumType=Mult|l-advmod",
+    "565": "NUM|NumType=Mult|l-nummod",
+    "566": "NUM|NumType=Mult|r-advmod",
+    "567": "NUM|Prefix=Yes",
+    "568": "NUM|Prefix=Yes|l-nummod",
+    "569": "NUM|l-advcl",
+    "570": "NUM|l-advmod",
+    "571": "NUM|l-case",
+    "572": "NUM|l-clf",
+    "573": "NUM|l-dep",
+    "574": "NUM|l-flat",
+    "575": "NUM|l-nmod",
+    "576": "NUM|l-nsubj",
+    "577": "NUM|l-nummod",
+    "578": "NUM|l-obl",
+    "579": "NUM|r-acl",
+    "580": "NUM|r-advmod",
+    "581": "NUM|r-appos",
+    "582": "NUM|r-ccomp",
+    "583": "NUM|r-compound",
+    "584": "NUM|r-conj",
+    "585": "NUM|r-det",
+    "586": "NUM|r-fixed",
+    "587": "NUM|r-flat",
+    "588": "NUM|r-iobj",
+    "589": "NUM|r-nmod",
+    "590": "NUM|r-nummod",
+    "591": "NUM|r-obj",
+    "592": "NUM|r-obl",
+    "593": "NUM|root",
+    "594": "PART",
+    "595": "PART|NameType=Oth",
+    "596": "PART|NameType=Oth|l-advmod",
+    "597": "PART|NounType=Class|PartType=Emp",
+    "598": "PART|NounType=Class|PartType=Emp|Prefix=Yes",
+    "599": "PART|NounType=Class|PartType=Emp|Prefix=Yes|l-mark",
+    "600": "PART|NounType=Class|PartType=Emp|l-mark",
+    "601": "PART|NounType=Class|Prefix=Yes",
+    "602": "PART|NounType=Class|Prefix=Yes|l-mark",
+    "603": "PART|NumType=Mult|PartType=Emp",
+    "604": "PART|NumType=Mult|PartType=Emp|l-mark",
+    "605": "PART|PartType=Adj",
+    "606": "PART|PartType=Adj|l-mark",
+    "607": "PART|PartType=Adj|l-orphan",
+    "608": "PART|PartType=Adj|r-acl",
+    "609": "PART|PartType=Adj|r-compound",
+    "610": "PART|PartType=Adj|r-nmod",
+    "611": "PART|PartType=Adv",
+    "612": "PART|PartType=Adv|l-advmod",
+    "613": "PART|PartType=Adv|l-mark",
+    "614": "PART|PartType=Adv|r-advmod",
+    "615": "PART|PartType=Emp",
+    "616": "PART|PartType=Emp|Prefix=Yes",
+    "617": "PART|PartType=Emp|Prefix=Yes|l-advmod",
+    "618": "PART|PartType=Emp|Prefix=Yes|l-aux",
+    "619": "PART|PartType=Emp|Prefix=Yes|l-mark",
+    "620": "PART|PartType=Emp|l-advmod",
+    "621": "PART|PartType=Emp|l-case",
+    "622": "PART|PartType=Emp|l-discourse",
+    "623": "PART|PartType=Emp|l-mark",
+    "624": "PART|PartType=Emp|r-acl",
+    "625": "PART|PartType=Emp|r-advmod",
+    "626": "PART|PartType=Emp|r-aux",
+    "627": "PART|PartType=Emp|r-compound",
+    "628": "PART|PartType=Emp|r-det",
+    "629": "PART|PartType=Emp|r-fixed",
+    "630": "PART|PartType=Emp|r-mark",
+    "631": "PART|PartType=Emp|r-nmod",
+    "632": "PART|PartType=Enp",
+    "633": "PART|PartType=Enp|l-discourse",
+    "634": "PART|PartType=Enp|r-acl",
+    "635": "PART|PartType=Enp|r-advmod",
+    "636": "PART|PartType=Enp|r-compound",
+    "637": "PART|PartType=Enp|r-dep",
+    "638": "PART|PartType=Enp|r-det",
+    "639": "PART|PartType=Enp|r-discourse",
+    "640": "PART|PartType=Enp|r-fixed",
+    "641": "PART|PartType=Enp|r-obl",
+    "642": "PART|PartType=Int",
+    "643": "PART|PartType=Int|l-advmod",
+    "644": "PART|PartType=Int|l-mark",
+    "645": "PART|PartType=Int|r-acl",
+    "646": "PART|PartType=Int|r-advmod",
+    "647": "PART|PartType=Int|r-dep",
+    "648": "PART|PartType=Int|r-discourse",
+    "649": "PART|PartType=Int|r-nmod",
+    "650": "PART|PartType=Int|r-obj",
+    "651": "PART|PartType=Int|r-obl",
+    "652": "PART|PartType=Neg",
+    "653": "PART|PartType=Neg|l-advcl",
+    "654": "PART|PartType=Neg|l-advmod",
+    "655": "PART|PartType=Neg|l-aux",
+    "656": "PART|PartType=Neg|l-mark",
+    "657": "PART|PartType=Neg|r-acl",
+    "658": "PART|PartType=Neg|r-advmod",
+    "659": "PART|PartType=Neg|r-fixed",
+    "660": "PART|PartType=Res",
+    "661": "PART|PartType=Res|r-advmod",
+    "662": "PART|PartType=Res|r-discourse",
+    "663": "PART|PartType=Res|r-fixed",
+    "664": "PART|Prefix=Yes",
+    "665": "PART|Prefix=Yes|l-advmod",
+    "666": "PART|Prefix=Yes|l-aux",
+    "667": "PART|Prefix=Yes|l-mark",
+    "668": "PART|Prefix=Yes|r-acl",
+    "669": "PART|Prefix=Yes|r-nmod",
+    "670": "PART|l-advmod",
+    "671": "PART|l-discourse",
+    "672": "PART|l-mark",
+    "673": "PART|l-nsubj",
+    "674": "PART|r-acl",
+    "675": "PART|r-advmod",
+    "676": "PART|r-discourse",
+    "677": "PART|r-fixed",
+    "678": "PART|r-mark",
+    "679": "PART|r-obj",
703
+ "680": "PRON",
704
+ "681": "PRON|NounType=Class",
705
+ "682": "PRON|NounType=Class|r-clf",
706
+ "683": "PRON|PronType=Prs",
707
+ "684": "PRON|PronType=Prs|l-advmod",
708
+ "685": "PRON|PronType=Prs|l-expl",
709
+ "686": "PRON|PronType=Prs|l-nsubj",
710
+ "687": "PRON|PronType=Prs|l-obj",
711
+ "688": "PRON|PronType=Prs|l-obl",
712
+ "689": "PRON|PronType=Prs|r-advcl",
713
+ "690": "PRON|PronType=Prs|r-advmod",
714
+ "691": "PRON|PronType=Prs|r-ccomp",
715
+ "692": "PRON|PronType=Prs|r-clf",
716
+ "693": "PRON|PronType=Prs|r-conj",
717
+ "694": "PRON|PronType=Prs|r-nmod",
718
+ "695": "PRON|PronType=Prs|r-nsubj",
719
+ "696": "PRON|PronType=Prs|r-obj",
720
+ "697": "PRON|PronType=Prs|r-obl",
721
+ "698": "PRON|PronType=Prs|root",
722
+ "699": "PRON|PronType=Rcp",
723
+ "700": "PRON|PronType=Rcp|r-advmod",
724
+ "701": "PRON|PronType=Rcp|r-iobj",
725
+ "702": "PRON|PronType=Rcp|r-nmod",
726
+ "703": "PRON|PronType=Rcp|r-obj",
727
+ "704": "PRON|PronType=Rcp|r-obl",
728
+ "705": "PRON|l-advcl",
729
+ "706": "PRON|l-advmod",
730
+ "707": "PRON|l-compound",
731
+ "708": "PRON|l-csubj",
732
+ "709": "PRON|l-dislocated",
733
+ "710": "PRON|l-expl",
734
+ "711": "PRON|l-iobj",
735
+ "712": "PRON|l-mark",
736
+ "713": "PRON|l-nsubj",
737
+ "714": "PRON|l-obj",
738
+ "715": "PRON|l-obl",
739
+ "716": "PRON|r-acl",
740
+ "717": "PRON|r-advmod",
741
+ "718": "PRON|r-appos",
742
+ "719": "PRON|r-ccomp",
743
+ "720": "PRON|r-compound",
744
+ "721": "PRON|r-conj",
745
+ "722": "PRON|r-det",
746
+ "723": "PRON|r-discourse",
747
+ "724": "PRON|r-fixed",
748
+ "725": "PRON|r-flat",
749
+ "726": "PRON|r-iobj",
750
+ "727": "PRON|r-nmod",
751
+ "728": "PRON|r-nsubj",
752
+ "729": "PRON|r-obj",
753
+ "730": "PRON|r-obl",
754
+ "731": "PROPN",
+ "732": "PROPN|Abbr=Yes",
+ "733": "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth",
+ "734": "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|r-obj",
+ "735": "PROPN|Abbr=Yes|NameType=Com",
+ "736": "PROPN|Abbr=Yes|NameType=Com|r-advmod",
+ "737": "PROPN|Abbr=Yes|NameType=Com|r-nmod",
+ "738": "PROPN|Abbr=Yes|l-nmod",
+ "739": "PROPN|Abbr=Yes|l-nsubj",
+ "740": "PROPN|Abbr=Yes|r-nmod",
+ "741": "PROPN|Foreign=Yes",
+ "742": "PROPN|Foreign=Yes|NameType=Com",
+ "743": "PROPN|Foreign=Yes|NameType=Com|l-nsubj",
+ "744": "PROPN|Foreign=Yes|NameType=Com|r-list",
+ "745": "PROPN|Foreign=Yes|NameType=Com|r-nmod",
+ "746": "PROPN|Foreign=Yes|NameType=Com|r-obl",
+ "747": "PROPN|Foreign=Yes|NameType=Geo",
+ "748": "PROPN|Foreign=Yes|NameType=Geo|r-obj",
+ "749": "PROPN|Foreign=Yes|NameType=Geo|r-obl",
+ "750": "PROPN|Foreign=Yes|NameType=Giv",
+ "751": "PROPN|Foreign=Yes|NameType=Giv|l-nsubj",
+ "752": "PROPN|Foreign=Yes|NameType=Oth",
+ "753": "PROPN|Foreign=Yes|NameType=Oth|r-conj",
+ "754": "PROPN|Foreign=Yes|NameType=Oth|r-flat",
+ "755": "PROPN|Foreign=Yes|NameType=Oth|r-nmod",
+ "756": "PROPN|Foreign=Yes|NameType=Prs",
+ "757": "PROPN|Foreign=Yes|NameType=Prs|l-flat",
+ "758": "PROPN|Foreign=Yes|NameType=Prs|l-nsubj",
+ "759": "PROPN|Foreign=Yes|NameType=Prs|r-conj",
+ "760": "PROPN|Foreign=Yes|NameType=Prs|r-flat",
+ "761": "PROPN|Foreign=Yes|NameType=Prs|r-nmod",
+ "762": "PROPN|Foreign=Yes|NameType=Prs|r-obj",
+ "763": "PROPN|Foreign=Yes|NameType=Prs|r-obl",
+ "764": "PROPN|Foreign=Yes|NameType=Sur",
+ "765": "PROPN|Foreign=Yes|NameType=Sur|r-flat",
+ "766": "PROPN|Foreign=Yes|l-flat",
+ "767": "PROPN|Foreign=Yes|l-nmod",
+ "768": "PROPN|Foreign=Yes|l-nsubj",
+ "769": "PROPN|Foreign=Yes|l-obl",
+ "770": "PROPN|Foreign=Yes|r-appos",
+ "771": "PROPN|Foreign=Yes|r-ccomp",
+ "772": "PROPN|Foreign=Yes|r-compound",
+ "773": "PROPN|Foreign=Yes|r-conj",
+ "774": "PROPN|Foreign=Yes|r-flat",
+ "775": "PROPN|Foreign=Yes|r-iobj",
+ "776": "PROPN|Foreign=Yes|r-list",
+ "777": "PROPN|Foreign=Yes|r-nmod",
+ "778": "PROPN|Foreign=Yes|r-nsubj",
+ "779": "PROPN|Foreign=Yes|r-obj",
+ "780": "PROPN|Foreign=Yes|r-obl",
+ "781": "PROPN|Foreign=Yes|root",
+ "782": "PROPN|NameType=Com",
+ "783": "PROPN|NameType=Com|l-nsubj",
+ "784": "PROPN|NameType=Com|l-obl",
+ "785": "PROPN|NameType=Com|r-appos",
+ "786": "PROPN|NameType=Com|r-conj",
+ "787": "PROPN|NameType=Com|r-flat",
+ "788": "PROPN|NameType=Com|r-list",
+ "789": "PROPN|NameType=Com|r-nmod",
+ "790": "PROPN|NameType=Com|r-nsubj",
+ "791": "PROPN|NameType=Com|r-obj",
+ "792": "PROPN|NameType=Com|r-obl",
+ "793": "PROPN|NameType=Geo",
+ "794": "PROPN|NameType=Geo|l-nsubj",
+ "795": "PROPN|NameType=Geo|l-obl",
+ "796": "PROPN|NameType=Geo|r-compound",
+ "797": "PROPN|NameType=Geo|r-conj",
+ "798": "PROPN|NameType=Geo|r-flat",
+ "799": "PROPN|NameType=Geo|r-list",
+ "800": "PROPN|NameType=Geo|r-nmod",
+ "801": "PROPN|NameType=Geo|r-nsubj",
+ "802": "PROPN|NameType=Geo|r-nummod",
+ "803": "PROPN|NameType=Geo|r-obj",
+ "804": "PROPN|NameType=Geo|r-obl",
+ "805": "PROPN|NameType=Geo|root",
+ "806": "PROPN|NameType=Giv",
+ "807": "PROPN|NameType=Giv|l-dislocated",
+ "808": "PROPN|NameType=Giv|l-nsubj",
+ "809": "PROPN|NameType=Giv|l-obl",
+ "810": "PROPN|NameType=Giv|r-acl",
+ "811": "PROPN|NameType=Giv|r-appos",
+ "812": "PROPN|NameType=Giv|r-ccomp",
+ "813": "PROPN|NameType=Giv|r-conj",
+ "814": "PROPN|NameType=Giv|r-flat",
+ "815": "PROPN|NameType=Giv|r-list",
+ "816": "PROPN|NameType=Giv|r-nmod",
+ "817": "PROPN|NameType=Giv|r-nsubj",
+ "818": "PROPN|NameType=Giv|r-obj",
+ "819": "PROPN|NameType=Giv|r-obl",
+ "820": "PROPN|NameType=Giv|root",
+ "821": "PROPN|NameType=Nat",
+ "822": "PROPN|NameType=Nat|l-csubj",
+ "823": "PROPN|NameType=Nat|l-nsubj",
+ "824": "PROPN|NameType=Nat|l-obl",
+ "825": "PROPN|NameType=Nat|r-acl",
+ "826": "PROPN|NameType=Nat|r-appos",
+ "827": "PROPN|NameType=Nat|r-compound",
+ "828": "PROPN|NameType=Nat|r-conj",
+ "829": "PROPN|NameType=Nat|r-flat",
+ "830": "PROPN|NameType=Nat|r-list",
+ "831": "PROPN|NameType=Nat|r-nmod",
+ "832": "PROPN|NameType=Nat|r-nummod",
+ "833": "PROPN|NameType=Nat|r-obj",
+ "834": "PROPN|NameType=Nat|r-obl",
+ "835": "PROPN|NameType=Oth",
+ "836": "PROPN|NameType=Oth|l-dislocated",
+ "837": "PROPN|NameType=Oth|l-nsubj",
+ "838": "PROPN|NameType=Oth|r-acl",
+ "839": "PROPN|NameType=Oth|r-appos",
+ "840": "PROPN|NameType=Oth|r-compound",
+ "841": "PROPN|NameType=Oth|r-conj",
+ "842": "PROPN|NameType=Oth|r-flat",
+ "843": "PROPN|NameType=Oth|r-nmod",
+ "844": "PROPN|NameType=Oth|r-obj",
+ "845": "PROPN|NameType=Oth|r-obl",
+ "846": "PROPN|NameType=Oth|root",
+ "847": "PROPN|NameType=Pro",
+ "848": "PROPN|NameType=Pro|l-nsubj",
+ "849": "PROPN|NameType=Pro|l-obl",
+ "850": "PROPN|NameType=Pro|r-advcl",
+ "851": "PROPN|NameType=Pro|r-flat",
+ "852": "PROPN|NameType=Pro|r-nmod",
+ "853": "PROPN|NameType=Pro|r-obj",
+ "854": "PROPN|NameType=Prs",
+ "855": "PROPN|NameType=Prs|l-dislocated",
+ "856": "PROPN|NameType=Prs|l-nsubj",
+ "857": "PROPN|NameType=Prs|l-obl",
+ "858": "PROPN|NameType=Prs|l-vocative",
+ "859": "PROPN|NameType=Prs|r-conj",
+ "860": "PROPN|NameType=Prs|r-discourse",
+ "861": "PROPN|NameType=Prs|r-flat",
+ "862": "PROPN|NameType=Prs|r-list",
+ "863": "PROPN|NameType=Prs|r-nmod",
+ "864": "PROPN|NameType=Prs|r-obj",
+ "865": "PROPN|NameType=Prs|r-obl",
+ "866": "PROPN|NameType=Prs|r-vocative",
+ "867": "PROPN|NameType=Sur",
+ "868": "PROPN|NameType=Sur|l-nsubj",
+ "869": "PROPN|NameType=Sur|r-flat",
+ "870": "PROPN|NameType=Sur|r-nmod",
+ "871": "PROPN|NounType=Class",
+ "872": "PROPN|NounType=Class|r-clf",
+ "873": "PROPN|Prefix=Yes",
+ "874": "PROPN|Prefix=Yes|l-nsubj",
+ "875": "PROPN|Prefix=Yes|r-nmod",
+ "876": "PROPN|l-advmod",
+ "877": "PROPN|l-nsubj",
+ "878": "PROPN|l-obl",
+ "879": "PROPN|r-acl",
+ "880": "PROPN|r-advmod",
+ "881": "PROPN|r-appos",
+ "882": "PROPN|r-clf",
+ "883": "PROPN|r-compound",
+ "884": "PROPN|r-conj",
+ "885": "PROPN|r-fixed",
+ "886": "PROPN|r-flat",
+ "887": "PROPN|r-iobj",
+ "888": "PROPN|r-list",
+ "889": "PROPN|r-nmod",
+ "890": "PROPN|r-obj",
+ "891": "PROPN|r-obl",
+ "892": "PROPN|root",
+ "893": "PUNCT",
+ "894": "PUNCT|NounType=Class",
+ "895": "PUNCT|NounType=Class|r-punct",
+ "896": "PUNCT|l-advmod",
+ "897": "PUNCT|l-dep",
+ "898": "PUNCT|l-punct",
+ "899": "PUNCT|r-dep",
+ "900": "PUNCT|r-punct",
+ "901": "SCONJ",
+ "902": "SCONJ|NumType=Mult",
+ "903": "SCONJ|NumType=Mult|l-mark",
+ "904": "SCONJ|Prefix=Yes",
+ "905": "SCONJ|Prefix=Yes|l-cc",
+ "906": "SCONJ|Prefix=Yes|l-mark",
+ "907": "SCONJ|VerbType=Cop",
+ "908": "SCONJ|VerbType=Cop|l-mark",
+ "909": "SCONJ|l-advmod",
+ "910": "SCONJ|l-case",
+ "911": "SCONJ|l-cc",
+ "912": "SCONJ|l-discourse",
+ "913": "SCONJ|l-mark",
+ "914": "SCONJ|l-nsubj",
+ "915": "SCONJ|l-orphan",
+ "916": "SCONJ|r-advcl",
+ "917": "SCONJ|r-compound",
+ "918": "SCONJ|r-fixed",
+ "919": "SCONJ|r-flat",
+ "920": "SCONJ|r-mark",
+ "921": "SCONJ|r-orphan",
+ "922": "SCONJ|root",
+ "923": "SYM",
+ "924": "SYM|l-dep",
+ "925": "SYM|r-clf",
+ "926": "SYM|r-nmod",
+ "927": "SYM|r-obj",
+ "928": "SYM|r-obl",
+ "929": "SYM|r-xcomp",
+ "930": "VERB",
+ "931": "VERB|Abbr=Yes",
+ "932": "VERB|Abbr=Yes|r-acl",
+ "933": "VERB|Foreign=Yes",
+ "934": "VERB|Foreign=Yes|l-nsubj",
+ "935": "VERB|Foreign=Yes|r-acl",
+ "936": "VERB|Foreign=Yes|r-advcl",
+ "937": "VERB|Foreign=Yes|r-ccomp",
+ "938": "VERB|Foreign=Yes|r-compound",
+ "939": "VERB|Foreign=Yes|r-conj",
+ "940": "VERB|Foreign=Yes|r-flat",
+ "941": "VERB|Foreign=Yes|r-nmod",
+ "942": "VERB|Foreign=Yes|r-xcomp",
+ "943": "VERB|Foreign=Yes|root",
+ "944": "VERB|NounType=Class",
+ "945": "VERB|NounType=Class|r-acl",
+ "946": "VERB|NounType=Class|r-compound",
+ "947": "VERB|PartType=Adj",
+ "948": "VERB|PartType=Adj|r-acl",
+ "949": "VERB|Prefix=Yes",
+ "950": "VERB|Prefix=Yes|l-acl",
+ "951": "VERB|Prefix=Yes|l-nsubj",
+ "952": "VERB|Prefix=Yes|r-acl",
+ "953": "VERB|Prefix=Yes|r-advcl",
+ "954": "VERB|Prefix=Yes|r-ccomp",
+ "955": "VERB|Prefix=Yes|r-compound",
+ "956": "VERB|Prefix=Yes|r-conj",
+ "957": "VERB|Prefix=Yes|r-parataxis",
+ "958": "VERB|Prefix=Yes|root",
+ "959": "VERB|VerbType=Cop",
+ "960": "VERB|VerbType=Cop|l-advmod",
+ "961": "VERB|VerbType=Cop|l-cop",
+ "962": "VERB|VerbType=Cop|r-acl",
+ "963": "VERB|VerbType=Cop|r-advcl",
+ "964": "VERB|VerbType=Cop|r-ccomp",
+ "965": "VERB|VerbType=Cop|r-compound",
+ "966": "VERB|VerbType=Cop|r-parataxis",
+ "967": "VERB|VerbType=Cop|root",
+ "968": "VERB|l-acl",
+ "969": "VERB|l-advcl",
+ "970": "VERB|l-advmod",
+ "971": "VERB|l-aux",
+ "972": "VERB|l-case",
+ "973": "VERB|l-cc",
+ "974": "VERB|l-ccomp",
+ "975": "VERB|l-compound",
+ "976": "VERB|l-conj",
+ "977": "VERB|l-cop",
+ "978": "VERB|l-csubj",
+ "979": "VERB|l-discourse",
+ "980": "VERB|l-dislocated",
+ "981": "VERB|l-mark",
+ "982": "VERB|l-nsubj",
+ "983": "VERB|l-obl",
+ "984": "VERB|l-orphan",
+ "985": "VERB|l-xcomp",
+ "986": "VERB|r-acl",
+ "987": "VERB|r-advcl",
+ "988": "VERB|r-advmod",
+ "989": "VERB|r-appos",
+ "990": "VERB|r-aux",
+ "991": "VERB|r-case",
+ "992": "VERB|r-cc",
+ "993": "VERB|r-ccomp",
+ "994": "VERB|r-clf",
+ "995": "VERB|r-compound",
+ "996": "VERB|r-conj",
+ "997": "VERB|r-dep",
+ "998": "VERB|r-det",
+ "999": "VERB|r-discourse",
+ "1000": "VERB|r-fixed",
+ "1001": "VERB|r-flat",
+ "1002": "VERB|r-list",
+ "1003": "VERB|r-mark",
+ "1004": "VERB|r-nmod",
+ "1005": "VERB|r-nsubj",
+ "1006": "VERB|r-obj",
+ "1007": "VERB|r-obl",
+ "1008": "VERB|r-orphan",
+ "1009": "VERB|r-parataxis",
+ "1010": "VERB|r-punct",
+ "1011": "VERB|r-xcomp",
+ "1012": "VERB|root"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 8192,
+ "label2id": {
+ "ADP": 0,
+ "ADP|Foreign=Yes": 1,
+ "ADP|Foreign=Yes|l-case": 2,
+ "ADP|NounType=Class": 3,
+ "ADP|NounType=Class|l-case": 4,
+ "ADP|Prefix=Yes": 5,
+ "ADP|Prefix=Yes|l-case": 6,
+ "ADP|Prefix=Yes|l-mark": 7,
+ "ADP|l-acl": 8,
+ "ADP|l-advcl": 9,
+ "ADP|l-advmod": 10,
+ "ADP|l-case": 11,
+ "ADP|l-cc": 12,
+ "ADP|l-dep": 13,
+ "ADP|l-fixed": 14,
+ "ADP|l-flat": 15,
+ "ADP|l-mark": 16,
+ "ADP|l-nmod": 17,
+ "ADP|l-nsubj": 18,
+ "ADP|l-obl": 19,
+ "ADP|l-orphan": 20,
+ "ADP|r-acl": 21,
+ "ADP|r-advmod": 22,
+ "ADP|r-case": 23,
+ "ADP|r-compound": 24,
+ "ADP|r-conj": 25,
+ "ADP|r-fixed": 26,
+ "ADP|r-flat": 27,
+ "ADP|r-obl": 28,
+ "ADP|r-orphan": 29,
+ "ADP|root": 30,
+ "ADV": 31,
+ "ADV|Foreign=Yes": 32,
+ "ADV|Foreign=Yes|l-advmod": 33,
+ "ADV|Foreign=Yes|r-advmod": 34,
+ "ADV|NumType=Mult": 35,
+ "ADV|NumType=Mult|r-advmod": 36,
+ "ADV|PartType=Adv": 37,
+ "ADV|PartType=Adv|l-advmod": 38,
+ "ADV|PartType=Adv|l-mark": 39,
+ "ADV|PartType=Adv|r-advmod": 40,
+ "ADV|PartType=Enp": 41,
+ "ADV|PartType=Enp|l-advmod": 42,
+ "ADV|PartType=Enp|r-advmod": 43,
+ "ADV|PartType=Int": 44,
+ "ADV|PartType=Int|r-advmod": 45,
+ "ADV|PartType=Int|r-fixed": 46,
+ "ADV|Prefix=Yes": 47,
+ "ADV|Prefix=Yes|l-advmod": 48,
+ "ADV|Prefix=Yes|l-mark": 49,
+ "ADV|Prefix=Yes|r-advmod": 50,
+ "ADV|l-acl": 51,
+ "ADV|l-advcl": 52,
+ "ADV|l-advmod": 53,
+ "ADV|l-aux": 54,
+ "ADV|l-case": 55,
+ "ADV|l-compound": 56,
+ "ADV|l-dep": 57,
+ "ADV|l-det": 58,
+ "ADV|l-discourse": 59,
+ "ADV|l-fixed": 60,
+ "ADV|l-mark": 61,
+ "ADV|l-orphan": 62,
+ "ADV|r-acl": 63,
+ "ADV|r-advcl": 64,
+ "ADV|r-advmod": 65,
+ "ADV|r-aux": 66,
+ "ADV|r-compound": 67,
+ "ADV|r-conj": 68,
+ "ADV|r-det": 69,
+ "ADV|r-fixed": 70,
+ "ADV|r-flat": 71,
+ "ADV|r-mark": 72,
+ "ADV|r-nmod": 73,
+ "ADV|r-obj": 74,
+ "ADV|r-orphan": 75,
+ "ADV|r-xcomp": 76,
+ "ADV|root": 77,
+ "AUX": 78,
+ "AUX|Foreign=Yes": 79,
+ "AUX|Foreign=Yes|l-aux": 80,
+ "AUX|NounType=Class": 81,
+ "AUX|NounType=Class|r-appos": 82,
+ "AUX|Prefix=Yes": 83,
+ "AUX|Prefix=Yes|l-aux": 84,
+ "AUX|Prefix=Yes|r-aux": 85,
+ "AUX|VerbType=Cop": 86,
+ "AUX|VerbType=Cop|l-acl": 87,
+ "AUX|VerbType=Cop|l-advcl": 88,
+ "AUX|VerbType=Cop|l-aux": 89,
+ "AUX|VerbType=Cop|l-cop": 90,
+ "AUX|VerbType=Cop|r-acl": 91,
+ "AUX|VerbType=Cop|r-advcl": 92,
+ "AUX|VerbType=Cop|r-aux": 93,
+ "AUX|VerbType=Cop|r-conj": 94,
+ "AUX|VerbType=Cop|r-mark": 95,
+ "AUX|VerbType=Cop|root": 96,
+ "AUX|l-advmod": 97,
+ "AUX|l-aux": 98,
+ "AUX|l-cop": 99,
+ "AUX|l-mark": 100,
+ "AUX|r-acl": 101,
+ "AUX|r-advmod": 102,
+ "AUX|r-aux": 103,
+ "AUX|r-ccomp": 104,
+ "AUX|r-clf": 105,
+ "AUX|r-compound": 106,
+ "AUX|r-conj": 107,
+ "AUX|r-fixed": 108,
+ "AUX|root": 109,
+ "B-ADP": 110,
+ "B-ADP|Foreign=Yes": 111,
+ "B-ADP|NounType=Class": 112,
+ "B-ADP|Prefix=Yes": 113,
+ "B-ADV": 114,
+ "B-ADV|Foreign=Yes": 115,
+ "B-ADV|NumType=Mult": 116,
+ "B-ADV|PartType=Adv": 117,
+ "B-ADV|PartType=Enp": 118,
+ "B-ADV|PartType=Int": 119,
+ "B-ADV|Prefix=Yes": 120,
+ "B-AUX": 121,
+ "B-AUX|Foreign=Yes": 122,
+ "B-AUX|NounType=Class": 123,
+ "B-AUX|Prefix=Yes": 124,
+ "B-AUX|VerbType=Cop": 125,
+ "B-CCONJ": 126,
+ "B-CCONJ|Foreign=Yes": 127,
+ "B-CCONJ|PronType=Prs": 128,
+ "B-DET": 129,
+ "B-DET|NumType=Mult": 130,
+ "B-DET|PartType=Emp": 131,
+ "B-DET|PartType=Int": 132,
+ "B-NOUN": 133,
+ "B-NOUN|Abbr=Yes": 134,
+ "B-NOUN|Abbr=Yes|Foreign=Yes": 135,
+ "B-NOUN|Abbr=Yes|Prefix=Yes": 136,
+ "B-NOUN|Foreign=Yes": 137,
+ "B-NOUN|Foreign=Yes|NounType=Class": 138,
+ "B-NOUN|Foreign=Yes|Prefix=Yes": 139,
+ "B-NOUN|NameType=Com": 140,
+ "B-NOUN|NameType=Geo": 141,
+ "B-NOUN|NameType=Nat": 142,
+ "B-NOUN|NameType=Oth": 143,
+ "B-NOUN|NameType=Pro": 144,
+ "B-NOUN|NameType=Prs": 145,
+ "B-NOUN|NounType=Class": 146,
+ "B-NOUN|NounType=Class|Prefix=Yes": 147,
+ "B-NOUN|NumType=Mult": 148,
+ "B-NOUN|PartType=Enp": 149,
+ "B-NOUN|PartType=Int": 150,
+ "B-NOUN|PartType=Res": 151,
+ "B-NOUN|Prefix=Yes": 152,
+ "B-NUM": 153,
+ "B-NUM|Abbr=Yes": 154,
+ "B-NUM|Foreign=Yes": 155,
+ "B-NUM|NumType=Mult": 156,
+ "B-NUM|Prefix=Yes": 157,
+ "B-PART": 158,
+ "B-PART|NameType=Oth": 159,
+ "B-PART|NounType=Class|PartType=Emp": 160,
+ "B-PART|NounType=Class|PartType=Emp|Prefix=Yes": 161,
+ "B-PART|NounType=Class|Prefix=Yes": 162,
+ "B-PART|NumType=Mult|PartType=Emp": 163,
+ "B-PART|PartType=Adj": 164,
+ "B-PART|PartType=Adv": 165,
+ "B-PART|PartType=Emp": 166,
+ "B-PART|PartType=Emp|Prefix=Yes": 167,
+ "B-PART|PartType=Enp": 168,
+ "B-PART|PartType=Int": 169,
+ "B-PART|PartType=Neg": 170,
+ "B-PART|PartType=Res": 171,
+ "B-PART|Prefix=Yes": 172,
+ "B-PRON": 173,
+ "B-PRON|NounType=Class": 174,
+ "B-PRON|PronType=Prs": 175,
+ "B-PRON|PronType=Rcp": 176,
+ "B-PROPN": 177,
+ "B-PROPN|Abbr=Yes": 178,
+ "B-PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth": 179,
+ "B-PROPN|Abbr=Yes|NameType=Com": 180,
+ "B-PROPN|Foreign=Yes": 181,
+ "B-PROPN|Foreign=Yes|NameType=Com": 182,
+ "B-PROPN|Foreign=Yes|NameType=Geo": 183,
+ "B-PROPN|Foreign=Yes|NameType=Giv": 184,
+ "B-PROPN|Foreign=Yes|NameType=Oth": 185,
+ "B-PROPN|Foreign=Yes|NameType=Prs": 186,
+ "B-PROPN|Foreign=Yes|NameType=Sur": 187,
+ "B-PROPN|NameType=Com": 188,
+ "B-PROPN|NameType=Geo": 189,
+ "B-PROPN|NameType=Giv": 190,
+ "B-PROPN|NameType=Nat": 191,
+ "B-PROPN|NameType=Oth": 192,
+ "B-PROPN|NameType=Pro": 193,
+ "B-PROPN|NameType=Prs": 194,
+ "B-PROPN|NameType=Sur": 195,
+ "B-PROPN|NounType=Class": 196,
+ "B-PROPN|Prefix=Yes": 197,
+ "B-PUNCT": 198,
+ "B-PUNCT|NounType=Class": 199,
+ "B-SCONJ": 200,
+ "B-SCONJ|NumType=Mult": 201,
+ "B-SCONJ|Prefix=Yes": 202,
+ "B-SCONJ|VerbType=Cop": 203,
+ "B-SYM": 204,
+ "B-VERB": 205,
+ "B-VERB|Abbr=Yes": 206,
+ "B-VERB|Foreign=Yes": 207,
+ "B-VERB|NounType=Class": 208,
+ "B-VERB|PartType=Adj": 209,
+ "B-VERB|Prefix=Yes": 210,
+ "B-VERB|VerbType=Cop": 211,
+ "CCONJ": 212,
+ "CCONJ|Foreign=Yes": 213,
+ "CCONJ|Foreign=Yes|l-cc": 214,
+ "CCONJ|PronType=Prs": 215,
+ "CCONJ|PronType=Prs|l-cc": 216,
+ "CCONJ|l-advmod": 217,
+ "CCONJ|l-case": 218,
+ "CCONJ|l-cc": 219,
+ "CCONJ|l-conj": 220,
+ "CCONJ|l-discourse": 221,
+ "CCONJ|l-fixed": 222,
+ "CCONJ|l-flat": 223,
+ "CCONJ|l-mark": 224,
+ "CCONJ|l-nsubj": 225,
+ "CCONJ|l-obj": 226,
+ "CCONJ|l-orphan": 227,
+ "CCONJ|r-cc": 228,
+ "CCONJ|r-compound": 229,
+ "CCONJ|r-fixed": 230,
+ "CCONJ|r-mark": 231,
+ "DET": 232,
+ "DET|NumType=Mult": 233,
+ "DET|NumType=Mult|l-det": 234,
+ "DET|PartType=Emp": 235,
+ "DET|PartType=Emp|r-det": 236,
+ "DET|PartType=Int": 237,
+ "DET|PartType=Int|r-det": 238,
+ "DET|l-compound": 239,
+ "DET|l-det": 240,
+ "DET|l-discourse": 241,
+ "DET|l-nsubj": 242,
+ "DET|l-obl": 243,
+ "DET|l-orphan": 244,
+ "DET|r-advmod": 245,
+ "DET|r-compound": 246,
+ "DET|r-conj": 247,
+ "DET|r-dep": 248,
+ "DET|r-det": 249,
+ "DET|r-fixed": 250,
+ "DET|r-flat": 251,
+ "DET|r-list": 252,
+ "DET|r-nmod": 253,
+ "DET|r-nummod": 254,
+ "DET|r-obl": 255,
+ "DET|r-orphan": 256,
+ "I-ADP": 257,
+ "I-ADP|Foreign=Yes": 258,
+ "I-ADP|NounType=Class": 259,
+ "I-ADP|Prefix=Yes": 260,
+ "I-ADV": 261,
+ "I-ADV|Foreign=Yes": 262,
+ "I-ADV|NumType=Mult": 263,
+ "I-ADV|PartType=Adv": 264,
+ "I-ADV|PartType=Enp": 265,
+ "I-ADV|PartType=Int": 266,
+ "I-ADV|Prefix=Yes": 267,
+ "I-AUX": 268,
+ "I-AUX|Foreign=Yes": 269,
+ "I-AUX|NounType=Class": 270,
+ "I-AUX|Prefix=Yes": 271,
+ "I-AUX|VerbType=Cop": 272,
+ "I-CCONJ": 273,
+ "I-CCONJ|Foreign=Yes": 274,
+ "I-CCONJ|PronType=Prs": 275,
+ "I-DET": 276,
+ "I-DET|NumType=Mult": 277,
+ "I-DET|PartType=Emp": 278,
+ "I-DET|PartType=Int": 279,
+ "I-NOUN": 280,
+ "I-NOUN|Abbr=Yes": 281,
+ "I-NOUN|Abbr=Yes|Foreign=Yes": 282,
+ "I-NOUN|Abbr=Yes|Prefix=Yes": 283,
+ "I-NOUN|Foreign=Yes": 284,
+ "I-NOUN|Foreign=Yes|NounType=Class": 285,
+ "I-NOUN|Foreign=Yes|Prefix=Yes": 286,
+ "I-NOUN|NameType=Com": 287,
+ "I-NOUN|NameType=Geo": 288,
+ "I-NOUN|NameType=Nat": 289,
+ "I-NOUN|NameType=Oth": 290,
+ "I-NOUN|NameType=Pro": 291,
+ "I-NOUN|NameType=Prs": 292,
+ "I-NOUN|NounType=Class": 293,
+ "I-NOUN|NounType=Class|Prefix=Yes": 294,
+ "I-NOUN|NumType=Mult": 295,
+ "I-NOUN|PartType=Enp": 296,
+ "I-NOUN|PartType=Int": 297,
+ "I-NOUN|PartType=Res": 298,
+ "I-NOUN|Prefix=Yes": 299,
+ "I-NUM": 300,
+ "I-NUM|Abbr=Yes": 301,
+ "I-NUM|Foreign=Yes": 302,
+ "I-NUM|NumType=Mult": 303,
+ "I-NUM|Prefix=Yes": 304,
+ "I-PART": 305,
+ "I-PART|NameType=Oth": 306,
+ "I-PART|NounType=Class|PartType=Emp": 307,
+ "I-PART|NounType=Class|PartType=Emp|Prefix=Yes": 308,
+ "I-PART|NounType=Class|Prefix=Yes": 309,
+ "I-PART|NumType=Mult|PartType=Emp": 310,
+ "I-PART|PartType=Adj": 311,
+ "I-PART|PartType=Adv": 312,
+ "I-PART|PartType=Emp": 313,
+ "I-PART|PartType=Emp|Prefix=Yes": 314,
+ "I-PART|PartType=Enp": 315,
+ "I-PART|PartType=Int": 316,
+ "I-PART|PartType=Neg": 317,
+ "I-PART|PartType=Res": 318,
+ "I-PART|Prefix=Yes": 319,
+ "I-PRON": 320,
+ "I-PRON|NounType=Class": 321,
+ "I-PRON|PronType=Prs": 322,
+ "I-PRON|PronType=Rcp": 323,
+ "I-PROPN": 324,
+ "I-PROPN|Abbr=Yes": 325,
+ "I-PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth": 326,
+ "I-PROPN|Abbr=Yes|NameType=Com": 327,
+ "I-PROPN|Foreign=Yes": 328,
+ "I-PROPN|Foreign=Yes|NameType=Com": 329,
+ "I-PROPN|Foreign=Yes|NameType=Geo": 330,
+ "I-PROPN|Foreign=Yes|NameType=Giv": 331,
+ "I-PROPN|Foreign=Yes|NameType=Oth": 332,
+ "I-PROPN|Foreign=Yes|NameType=Prs": 333,
+ "I-PROPN|Foreign=Yes|NameType=Sur": 334,
+ "I-PROPN|NameType=Com": 335,
+ "I-PROPN|NameType=Geo": 336,
+ "I-PROPN|NameType=Giv": 337,
+ "I-PROPN|NameType=Nat": 338,
+ "I-PROPN|NameType=Oth": 339,
+ "I-PROPN|NameType=Pro": 340,
+ "I-PROPN|NameType=Prs": 341,
+ "I-PROPN|NameType=Sur": 342,
+ "I-PROPN|NounType=Class": 343,
+ "I-PROPN|Prefix=Yes": 344,
+ "I-PUNCT": 345,
+ "I-PUNCT|NounType=Class": 346,
+ "I-SCONJ": 347,
+ "I-SCONJ|NumType=Mult": 348,
+ "I-SCONJ|Prefix=Yes": 349,
+ "I-SCONJ|VerbType=Cop": 350,
+ "I-SYM": 351,
+ "I-VERB": 352,
+ "I-VERB|Abbr=Yes": 353,
+ "I-VERB|Foreign=Yes": 354,
+ "I-VERB|NounType=Class": 355,
+ "I-VERB|PartType=Adj": 356,
+ "I-VERB|Prefix=Yes": 357,
+ "I-VERB|VerbType=Cop": 358,
+ "NOUN": 359,
+ "NOUN|Abbr=Yes": 360,
+ "NOUN|Abbr=Yes|Foreign=Yes": 361,
+ "NOUN|Abbr=Yes|Foreign=Yes|r-nmod": 362,
+ "NOUN|Abbr=Yes|Prefix=Yes": 363,
+ "NOUN|Abbr=Yes|Prefix=Yes|l-flat": 364,
+ "NOUN|Abbr=Yes|l-flat": 365,
+ "NOUN|Abbr=Yes|l-nmod": 366,
+ "NOUN|Abbr=Yes|l-nsubj": 367,
+ "NOUN|Abbr=Yes|l-obl": 368,
+ "NOUN|Abbr=Yes|r-acl": 369,
+ "NOUN|Abbr=Yes|r-appos": 370,
+ "NOUN|Abbr=Yes|r-clf": 371,
+ "NOUN|Abbr=Yes|r-conj": 372,
+ "NOUN|Abbr=Yes|r-fixed": 373,
+ "NOUN|Abbr=Yes|r-flat": 374,
+ "NOUN|Abbr=Yes|r-nmod": 375,
+ "NOUN|Abbr=Yes|r-obj": 376,
+ "NOUN|Abbr=Yes|r-obl": 377,
+ "NOUN|Foreign=Yes": 378,
+ "NOUN|Foreign=Yes|NounType=Class": 379,
+ "NOUN|Foreign=Yes|NounType=Class|r-clf": 380,
+ "NOUN|Foreign=Yes|NounType=Class|r-obj": 381,
+ "NOUN|Foreign=Yes|Prefix=Yes": 382,
+ "NOUN|Foreign=Yes|Prefix=Yes|l-flat": 383,
+ "NOUN|Foreign=Yes|Prefix=Yes|r-appos": 384,
+ "NOUN|Foreign=Yes|l-dislocated": 385,
+ "NOUN|Foreign=Yes|l-flat": 386,
+ "NOUN|Foreign=Yes|l-nmod": 387,
+ "NOUN|Foreign=Yes|l-nsubj": 388,
+ "NOUN|Foreign=Yes|l-obl": 389,
+ "NOUN|Foreign=Yes|r-acl": 390,
+ "NOUN|Foreign=Yes|r-advcl": 391,
+ "NOUN|Foreign=Yes|r-advmod": 392,
+ "NOUN|Foreign=Yes|r-appos": 393,
+ "NOUN|Foreign=Yes|r-ccomp": 394,
+ "NOUN|Foreign=Yes|r-clf": 395,
+ "NOUN|Foreign=Yes|r-compound": 396,
+ "NOUN|Foreign=Yes|r-conj": 397,
+ "NOUN|Foreign=Yes|r-flat": 398,
+ "NOUN|Foreign=Yes|r-iobj": 399,
+ "NOUN|Foreign=Yes|r-list": 400,
+ "NOUN|Foreign=Yes|r-nmod": 401,
+ "NOUN|Foreign=Yes|r-obj": 402,
+ "NOUN|Foreign=Yes|r-obl": 403,
+ "NOUN|Foreign=Yes|r-xcomp": 404,
+ "NOUN|Foreign=Yes|root": 405,
+ "NOUN|NameType=Com": 406,
+ "NOUN|NameType=Com|r-nmod": 407,
+ "NOUN|NameType=Geo": 408,
+ "NOUN|NameType=Geo|l-nsubj": 409,
+ "NOUN|NameType=Geo|r-nmod": 410,
+ "NOUN|NameType=Geo|r-obj": 411,
+ "NOUN|NameType=Nat": 412,
+ "NOUN|NameType=Nat|r-nmod": 413,
+ "NOUN|NameType=Oth": 414,
+ "NOUN|NameType=Oth|l-nsubj": 415,
+ "NOUN|NameType=Oth|r-conj": 416,
+ "NOUN|NameType=Oth|r-flat": 417,
+ "NOUN|NameType=Oth|r-nmod": 418,
+ "NOUN|NameType=Pro": 419,
+ "NOUN|NameType=Pro|r-nmod": 420,
+ "NOUN|NameType=Prs": 421,
+ "NOUN|NameType=Prs|l-nsubj": 422,
+ "NOUN|NameType=Prs|r-nmod": 423,
+ "NOUN|NounType=Class": 424,
+ "NOUN|NounType=Class|Prefix=Yes": 425,
+ "NOUN|NounType=Class|Prefix=Yes|l-advcl": 426,
+ "NOUN|NounType=Class|Prefix=Yes|l-advmod": 427,
+ "NOUN|NounType=Class|Prefix=Yes|l-mark": 428,
+ "NOUN|NounType=Class|Prefix=Yes|l-nmod": 429,
+ "NOUN|NounType=Class|Prefix=Yes|l-nsubj": 430,
+ "NOUN|NounType=Class|Prefix=Yes|r-advcl": 431,
+ "NOUN|NounType=Class|Prefix=Yes|r-clf": 432,
+ "NOUN|NounType=Class|Prefix=Yes|r-nmod": 433,
+ "NOUN|NounType=Class|Prefix=Yes|r-obj": 434,
+ "NOUN|NounType=Class|l-advcl": 435,
+ "NOUN|NounType=Class|l-advmod": 436,
+ "NOUN|NounType=Class|l-clf": 437,
+ "NOUN|NounType=Class|l-dislocated": 438,
+ "NOUN|NounType=Class|l-nmod": 439,
+ "NOUN|NounType=Class|l-nsubj": 440,
+ "NOUN|NounType=Class|l-obj": 441,
+ "NOUN|NounType=Class|l-obl": 442,
+ "NOUN|NounType=Class|r-acl": 443,
+ "NOUN|NounType=Class|r-advcl": 444,
+ "NOUN|NounType=Class|r-advmod": 445,
+ "NOUN|NounType=Class|r-appos": 446,
+ "NOUN|NounType=Class|r-cc": 447,
+ "NOUN|NounType=Class|r-ccomp": 448,
+ "NOUN|NounType=Class|r-clf": 449,
+ "NOUN|NounType=Class|r-compound": 450,
+ "NOUN|NounType=Class|r-conj": 451,
+ "NOUN|NounType=Class|r-dislocated": 452,
+ "NOUN|NounType=Class|r-fixed": 453,
+ "NOUN|NounType=Class|r-flat": 454,
+ "NOUN|NounType=Class|r-iobj": 455,
+ "NOUN|NounType=Class|r-list": 456,
+ "NOUN|NounType=Class|r-nmod": 457,
+ "NOUN|NounType=Class|r-nummod": 458,
+ "NOUN|NounType=Class|r-obj": 459,
+ "NOUN|NounType=Class|r-obl": 460,
+ "NOUN|NounType=Class|r-orphan": 461,
+ "NOUN|NounType=Class|r-xcomp": 462,
+ "NOUN|NounType=Class|root": 463,
+ "NOUN|NumType=Mult": 464,
+ "NOUN|NumType=Mult|r-advcl": 465,
+ "NOUN|NumType=Mult|r-nmod": 466,
+ "NOUN|NumType=Mult|r-obj": 467,
+ "NOUN|PartType=Enp": 468,
+ "NOUN|PartType=Enp|r-obj": 469,
+ "NOUN|PartType=Enp|r-obl": 470,
+ "NOUN|PartType=Int": 471,
+ "NOUN|PartType=Int|r-obj": 472,
+ "NOUN|PartType=Res": 473,
+ "NOUN|PartType=Res|r-nmod": 474,
+ "NOUN|PartType=Res|r-obj": 475,
+ "NOUN|Prefix=Yes": 476,
+ "NOUN|Prefix=Yes|l-acl": 477,
+ "NOUN|Prefix=Yes|l-advcl": 478,
+ "NOUN|Prefix=Yes|l-clf": 479,
+ "NOUN|Prefix=Yes|l-csubj": 480,
+ "NOUN|Prefix=Yes|l-dislocated": 481,
+ "NOUN|Prefix=Yes|l-flat": 482,
+ "NOUN|Prefix=Yes|l-nmod": 483,
+ "NOUN|Prefix=Yes|l-nsubj": 484,
+ "NOUN|Prefix=Yes|l-obj": 485,
+ "NOUN|Prefix=Yes|l-obl": 486,
+ "NOUN|Prefix=Yes|r-acl": 487,
+ "NOUN|Prefix=Yes|r-advcl": 488,
+ "NOUN|Prefix=Yes|r-advmod": 489,
+ "NOUN|Prefix=Yes|r-appos": 490,
+ "NOUN|Prefix=Yes|r-case": 491,
+ "NOUN|Prefix=Yes|r-cc": 492,
+ "NOUN|Prefix=Yes|r-ccomp": 493,
+ "NOUN|Prefix=Yes|r-clf": 494,
+ "NOUN|Prefix=Yes|r-compound": 495,
+ "NOUN|Prefix=Yes|r-conj": 496,
+ "NOUN|Prefix=Yes|r-dislocated": 497,
+ "NOUN|Prefix=Yes|r-fixed": 498,
+ "NOUN|Prefix=Yes|r-flat": 499,
+ "NOUN|Prefix=Yes|r-iobj": 500,
+ "NOUN|Prefix=Yes|r-list": 501,
+ "NOUN|Prefix=Yes|r-nmod": 502,
+ "NOUN|Prefix=Yes|r-nummod": 503,
+ "NOUN|Prefix=Yes|r-obj": 504,
+ "NOUN|Prefix=Yes|r-obl": 505,
+ "NOUN|Prefix=Yes|r-orphan": 506,
+ "NOUN|Prefix=Yes|r-xcomp": 507,
+ "NOUN|Prefix=Yes|root": 508,
+ "NOUN|l-acl": 509,
+ "NOUN|l-advcl": 510,
+ "NOUN|l-advmod": 511,
+ "NOUN|l-case": 512,
+ "NOUN|l-ccomp": 513,
+ "NOUN|l-compound": 514,
+ "NOUN|l-csubj": 515,
+ "NOUN|l-discourse": 516,
+ "NOUN|l-dislocated": 517,
+ "NOUN|l-expl": 518,
+ "NOUN|l-flat": 519,
+ "NOUN|l-iobj": 520,
+ "NOUN|l-mark": 521,
+ "NOUN|l-nmod": 522,
+ "NOUN|l-nsubj": 523,
+ "NOUN|l-nummod": 524,
+ "NOUN|l-obj": 525,
+ "NOUN|l-obl": 526,
+ "NOUN|l-orphan": 527,
+ "NOUN|l-vocative": 528,
+ "NOUN|r-acl": 529,
+ "NOUN|r-advcl": 530,
+ "NOUN|r-advmod": 531,
+ "NOUN|r-appos": 532,
+ "NOUN|r-case": 533,
+ "NOUN|r-cc": 534,
+ "NOUN|r-ccomp": 535,
+ "NOUN|r-clf": 536,
+ "NOUN|r-compound": 537,
+ "NOUN|r-conj": 538,
+ "NOUN|r-cop": 539,
+ "NOUN|r-discourse": 540,
+ "NOUN|r-dislocated": 541,
+ "NOUN|r-fixed": 542,
+ "NOUN|r-flat": 543,
+ "NOUN|r-iobj": 544,
+ "NOUN|r-list": 545,
+ "NOUN|r-mark": 546,
+ "NOUN|r-nmod": 547,
+ "NOUN|r-nsubj": 548,
+ "NOUN|r-nummod": 549,
+ "NOUN|r-obj": 550,
+ "NOUN|r-obl": 551,
+ "NOUN|r-orphan": 552,
+ "NOUN|r-parataxis": 553,
+ "NOUN|r-xcomp": 554,
+ "NOUN|root": 555,
+ "NUM": 556,
+ "NUM|Abbr=Yes": 557,
+ "NUM|Abbr=Yes|r-flat": 558,
+ "NUM|Abbr=Yes|r-nummod": 559,
+ "NUM|Abbr=Yes|r-obj": 560,
+ "NUM|Foreign=Yes": 561,
+ "NUM|Foreign=Yes|r-clf": 562,
+ "NUM|NumType=Mult": 563,
+ "NUM|NumType=Mult|l-advmod": 564,
+ "NUM|NumType=Mult|l-nummod": 565,
+ "NUM|NumType=Mult|r-advmod": 566,
+ "NUM|Prefix=Yes": 567,
+ "NUM|Prefix=Yes|l-nummod": 568,
+ "NUM|l-advcl": 569,
+ "NUM|l-advmod": 570,
+ "NUM|l-case": 571,
+ "NUM|l-clf": 572,
+ "NUM|l-dep": 573,
+ "NUM|l-flat": 574,
+ "NUM|l-nmod": 575,
+ "NUM|l-nsubj": 576,
+ "NUM|l-nummod": 577,
+ "NUM|l-obl": 578,
+ "NUM|r-acl": 579,
+ "NUM|r-advmod": 580,
+ "NUM|r-appos": 581,
+ "NUM|r-ccomp": 582,
+ "NUM|r-compound": 583,
+ "NUM|r-conj": 584,
+ "NUM|r-det": 585,
+ "NUM|r-fixed": 586,
+ "NUM|r-flat": 587,
+ "NUM|r-iobj": 588,
+ "NUM|r-nmod": 589,
+ "NUM|r-nummod": 590,
+ "NUM|r-obj": 591,
+ "NUM|r-obl": 592,
+ "NUM|root": 593,
+ "PART": 594,
+ "PART|NameType=Oth": 595,
+ "PART|NameType=Oth|l-advmod": 596,
+ "PART|NounType=Class|PartType=Emp": 597,
+ "PART|NounType=Class|PartType=Emp|Prefix=Yes": 598,
+ "PART|NounType=Class|PartType=Emp|Prefix=Yes|l-mark": 599,
+ "PART|NounType=Class|PartType=Emp|l-mark": 600,
+ "PART|NounType=Class|Prefix=Yes": 601,
+ "PART|NounType=Class|Prefix=Yes|l-mark": 602,
+ "PART|NumType=Mult|PartType=Emp": 603,
+ "PART|NumType=Mult|PartType=Emp|l-mark": 604,
+ "PART|PartType=Adj": 605,
+ "PART|PartType=Adj|l-mark": 606,
+ "PART|PartType=Adj|l-orphan": 607,
+ "PART|PartType=Adj|r-acl": 608,
+ "PART|PartType=Adj|r-compound": 609,
+ "PART|PartType=Adj|r-nmod": 610,
+ "PART|PartType=Adv": 611,
+ "PART|PartType=Adv|l-advmod": 612,
+ "PART|PartType=Adv|l-mark": 613,
+ "PART|PartType=Adv|r-advmod": 614,
+ "PART|PartType=Emp": 615,
+ "PART|PartType=Emp|Prefix=Yes": 616,
+ "PART|PartType=Emp|Prefix=Yes|l-advmod": 617,
+ "PART|PartType=Emp|Prefix=Yes|l-aux": 618,
+ "PART|PartType=Emp|Prefix=Yes|l-mark": 619,
+ "PART|PartType=Emp|l-advmod": 620,
+ "PART|PartType=Emp|l-case": 621,
+ "PART|PartType=Emp|l-discourse": 622,
+ "PART|PartType=Emp|l-mark": 623,
+ "PART|PartType=Emp|r-acl": 624,
+ "PART|PartType=Emp|r-advmod": 625,
+ "PART|PartType=Emp|r-aux": 626,
+ "PART|PartType=Emp|r-compound": 627,
+ "PART|PartType=Emp|r-det": 628,
+ "PART|PartType=Emp|r-fixed": 629,
+ "PART|PartType=Emp|r-mark": 630,
+ "PART|PartType=Emp|r-nmod": 631,
+ "PART|PartType=Enp": 632,
+ "PART|PartType=Enp|l-discourse": 633,
+ "PART|PartType=Enp|r-acl": 634,
+ "PART|PartType=Enp|r-advmod": 635,
+ "PART|PartType=Enp|r-compound": 636,
+ "PART|PartType=Enp|r-dep": 637,
+ "PART|PartType=Enp|r-det": 638,
+ "PART|PartType=Enp|r-discourse": 639,
+ "PART|PartType=Enp|r-fixed": 640,
+ "PART|PartType=Enp|r-obl": 641,
+ "PART|PartType=Int": 642,
+ "PART|PartType=Int|l-advmod": 643,
+ "PART|PartType=Int|l-mark": 644,
+ "PART|PartType=Int|r-acl": 645,
+ "PART|PartType=Int|r-advmod": 646,
+ "PART|PartType=Int|r-dep": 647,
+ "PART|PartType=Int|r-discourse": 648,
+ "PART|PartType=Int|r-nmod": 649,
+ "PART|PartType=Int|r-obj": 650,
+ "PART|PartType=Int|r-obl": 651,
+ "PART|PartType=Neg": 652,
+ "PART|PartType=Neg|l-advcl": 653,
+ "PART|PartType=Neg|l-advmod": 654,
+ "PART|PartType=Neg|l-aux": 655,
+ "PART|PartType=Neg|l-mark": 656,
+ "PART|PartType=Neg|r-acl": 657,
+ "PART|PartType=Neg|r-advmod": 658,
+ "PART|PartType=Neg|r-fixed": 659,
+ "PART|PartType=Res": 660,
+ "PART|PartType=Res|r-advmod": 661,
+ "PART|PartType=Res|r-discourse": 662,
+ "PART|PartType=Res|r-fixed": 663,
+ "PART|Prefix=Yes": 664,
1705
+ "PART|Prefix=Yes|l-advmod": 665,
1706
+ "PART|Prefix=Yes|l-aux": 666,
1707
+ "PART|Prefix=Yes|l-mark": 667,
1708
+ "PART|Prefix=Yes|r-acl": 668,
1709
+ "PART|Prefix=Yes|r-nmod": 669,
1710
+ "PART|l-advmod": 670,
1711
+ "PART|l-discourse": 671,
1712
+ "PART|l-mark": 672,
1713
+ "PART|l-nsubj": 673,
1714
+ "PART|r-acl": 674,
1715
+ "PART|r-advmod": 675,
1716
+ "PART|r-discourse": 676,
1717
+ "PART|r-fixed": 677,
1718
+ "PART|r-mark": 678,
1719
+ "PART|r-obj": 679,
1720
+ "PRON": 680,
1721
+ "PRON|NounType=Class": 681,
1722
+ "PRON|NounType=Class|r-clf": 682,
1723
+ "PRON|PronType=Prs": 683,
1724
+ "PRON|PronType=Prs|l-advmod": 684,
1725
+ "PRON|PronType=Prs|l-expl": 685,
1726
+ "PRON|PronType=Prs|l-nsubj": 686,
1727
+ "PRON|PronType=Prs|l-obj": 687,
1728
+ "PRON|PronType=Prs|l-obl": 688,
1729
+ "PRON|PronType=Prs|r-advcl": 689,
1730
+ "PRON|PronType=Prs|r-advmod": 690,
1731
+ "PRON|PronType=Prs|r-ccomp": 691,
1732
+ "PRON|PronType=Prs|r-clf": 692,
1733
+ "PRON|PronType=Prs|r-conj": 693,
1734
+ "PRON|PronType=Prs|r-nmod": 694,
1735
+ "PRON|PronType=Prs|r-nsubj": 695,
1736
+ "PRON|PronType=Prs|r-obj": 696,
1737
+ "PRON|PronType=Prs|r-obl": 697,
1738
+ "PRON|PronType=Prs|root": 698,
1739
+ "PRON|PronType=Rcp": 699,
1740
+ "PRON|PronType=Rcp|r-advmod": 700,
1741
+ "PRON|PronType=Rcp|r-iobj": 701,
1742
+ "PRON|PronType=Rcp|r-nmod": 702,
1743
+ "PRON|PronType=Rcp|r-obj": 703,
1744
+ "PRON|PronType=Rcp|r-obl": 704,
1745
+ "PRON|l-advcl": 705,
1746
+ "PRON|l-advmod": 706,
1747
+ "PRON|l-compound": 707,
1748
+ "PRON|l-csubj": 708,
1749
+ "PRON|l-dislocated": 709,
1750
+ "PRON|l-expl": 710,
1751
+ "PRON|l-iobj": 711,
1752
+ "PRON|l-mark": 712,
1753
+ "PRON|l-nsubj": 713,
1754
+ "PRON|l-obj": 714,
1755
+ "PRON|l-obl": 715,
1756
+ "PRON|r-acl": 716,
1757
+ "PRON|r-advmod": 717,
1758
+ "PRON|r-appos": 718,
1759
+ "PRON|r-ccomp": 719,
1760
+ "PRON|r-compound": 720,
1761
+ "PRON|r-conj": 721,
1762
+ "PRON|r-det": 722,
1763
+ "PRON|r-discourse": 723,
1764
+ "PRON|r-fixed": 724,
1765
+ "PRON|r-flat": 725,
1766
+ "PRON|r-iobj": 726,
1767
+ "PRON|r-nmod": 727,
1768
+ "PRON|r-nsubj": 728,
1769
+ "PRON|r-obj": 729,
1770
+ "PRON|r-obl": 730,
1771
+ "PROPN": 731,
1772
+ "PROPN|Abbr=Yes": 732,
1773
+ "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth": 733,
1774
+ "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|r-obj": 734,
1775
+ "PROPN|Abbr=Yes|NameType=Com": 735,
1776
+ "PROPN|Abbr=Yes|NameType=Com|r-advmod": 736,
1777
+ "PROPN|Abbr=Yes|NameType=Com|r-nmod": 737,
1778
+ "PROPN|Abbr=Yes|l-nmod": 738,
1779
+ "PROPN|Abbr=Yes|l-nsubj": 739,
1780
+ "PROPN|Abbr=Yes|r-nmod": 740,
1781
+ "PROPN|Foreign=Yes": 741,
1782
+ "PROPN|Foreign=Yes|NameType=Com": 742,
1783
+ "PROPN|Foreign=Yes|NameType=Com|l-nsubj": 743,
1784
+ "PROPN|Foreign=Yes|NameType=Com|r-list": 744,
1785
+ "PROPN|Foreign=Yes|NameType=Com|r-nmod": 745,
1786
+ "PROPN|Foreign=Yes|NameType=Com|r-obl": 746,
1787
+ "PROPN|Foreign=Yes|NameType=Geo": 747,
1788
+ "PROPN|Foreign=Yes|NameType=Geo|r-obj": 748,
1789
+ "PROPN|Foreign=Yes|NameType=Geo|r-obl": 749,
1790
+ "PROPN|Foreign=Yes|NameType=Giv": 750,
1791
+ "PROPN|Foreign=Yes|NameType=Giv|l-nsubj": 751,
1792
+ "PROPN|Foreign=Yes|NameType=Oth": 752,
1793
+ "PROPN|Foreign=Yes|NameType=Oth|r-conj": 753,
1794
+ "PROPN|Foreign=Yes|NameType=Oth|r-flat": 754,
1795
+ "PROPN|Foreign=Yes|NameType=Oth|r-nmod": 755,
1796
+ "PROPN|Foreign=Yes|NameType=Prs": 756,
1797
+ "PROPN|Foreign=Yes|NameType=Prs|l-flat": 757,
1798
+ "PROPN|Foreign=Yes|NameType=Prs|l-nsubj": 758,
1799
+ "PROPN|Foreign=Yes|NameType=Prs|r-conj": 759,
1800
+ "PROPN|Foreign=Yes|NameType=Prs|r-flat": 760,
1801
+ "PROPN|Foreign=Yes|NameType=Prs|r-nmod": 761,
1802
+ "PROPN|Foreign=Yes|NameType=Prs|r-obj": 762,
1803
+ "PROPN|Foreign=Yes|NameType=Prs|r-obl": 763,
1804
+ "PROPN|Foreign=Yes|NameType=Sur": 764,
1805
+ "PROPN|Foreign=Yes|NameType=Sur|r-flat": 765,
1806
+ "PROPN|Foreign=Yes|l-flat": 766,
1807
+ "PROPN|Foreign=Yes|l-nmod": 767,
1808
+ "PROPN|Foreign=Yes|l-nsubj": 768,
1809
+ "PROPN|Foreign=Yes|l-obl": 769,
1810
+ "PROPN|Foreign=Yes|r-appos": 770,
1811
+ "PROPN|Foreign=Yes|r-ccomp": 771,
1812
+ "PROPN|Foreign=Yes|r-compound": 772,
1813
+ "PROPN|Foreign=Yes|r-conj": 773,
1814
+ "PROPN|Foreign=Yes|r-flat": 774,
1815
+ "PROPN|Foreign=Yes|r-iobj": 775,
1816
+ "PROPN|Foreign=Yes|r-list": 776,
1817
+ "PROPN|Foreign=Yes|r-nmod": 777,
1818
+ "PROPN|Foreign=Yes|r-nsubj": 778,
1819
+ "PROPN|Foreign=Yes|r-obj": 779,
1820
+ "PROPN|Foreign=Yes|r-obl": 780,
1821
+ "PROPN|Foreign=Yes|root": 781,
1822
+ "PROPN|NameType=Com": 782,
1823
+ "PROPN|NameType=Com|l-nsubj": 783,
1824
+ "PROPN|NameType=Com|l-obl": 784,
1825
+ "PROPN|NameType=Com|r-appos": 785,
1826
+ "PROPN|NameType=Com|r-conj": 786,
1827
+ "PROPN|NameType=Com|r-flat": 787,
1828
+ "PROPN|NameType=Com|r-list": 788,
1829
+ "PROPN|NameType=Com|r-nmod": 789,
1830
+ "PROPN|NameType=Com|r-nsubj": 790,
1831
+ "PROPN|NameType=Com|r-obj": 791,
1832
+ "PROPN|NameType=Com|r-obl": 792,
1833
+ "PROPN|NameType=Geo": 793,
1834
+ "PROPN|NameType=Geo|l-nsubj": 794,
1835
+ "PROPN|NameType=Geo|l-obl": 795,
1836
+ "PROPN|NameType=Geo|r-compound": 796,
1837
+ "PROPN|NameType=Geo|r-conj": 797,
1838
+ "PROPN|NameType=Geo|r-flat": 798,
1839
+ "PROPN|NameType=Geo|r-list": 799,
1840
+ "PROPN|NameType=Geo|r-nmod": 800,
1841
+ "PROPN|NameType=Geo|r-nsubj": 801,
1842
+ "PROPN|NameType=Geo|r-nummod": 802,
1843
+ "PROPN|NameType=Geo|r-obj": 803,
1844
+ "PROPN|NameType=Geo|r-obl": 804,
1845
+ "PROPN|NameType=Geo|root": 805,
1846
+ "PROPN|NameType=Giv": 806,
1847
+ "PROPN|NameType=Giv|l-dislocated": 807,
1848
+ "PROPN|NameType=Giv|l-nsubj": 808,
1849
+ "PROPN|NameType=Giv|l-obl": 809,
1850
+ "PROPN|NameType=Giv|r-acl": 810,
1851
+ "PROPN|NameType=Giv|r-appos": 811,
1852
+ "PROPN|NameType=Giv|r-ccomp": 812,
1853
+ "PROPN|NameType=Giv|r-conj": 813,
1854
+ "PROPN|NameType=Giv|r-flat": 814,
1855
+ "PROPN|NameType=Giv|r-list": 815,
1856
+ "PROPN|NameType=Giv|r-nmod": 816,
1857
+ "PROPN|NameType=Giv|r-nsubj": 817,
1858
+ "PROPN|NameType=Giv|r-obj": 818,
1859
+ "PROPN|NameType=Giv|r-obl": 819,
1860
+ "PROPN|NameType=Giv|root": 820,
1861
+ "PROPN|NameType=Nat": 821,
1862
+ "PROPN|NameType=Nat|l-csubj": 822,
1863
+ "PROPN|NameType=Nat|l-nsubj": 823,
1864
+ "PROPN|NameType=Nat|l-obl": 824,
1865
+ "PROPN|NameType=Nat|r-acl": 825,
1866
+ "PROPN|NameType=Nat|r-appos": 826,
1867
+ "PROPN|NameType=Nat|r-compound": 827,
1868
+ "PROPN|NameType=Nat|r-conj": 828,
1869
+ "PROPN|NameType=Nat|r-flat": 829,
1870
+ "PROPN|NameType=Nat|r-list": 830,
1871
+ "PROPN|NameType=Nat|r-nmod": 831,
1872
+ "PROPN|NameType=Nat|r-nummod": 832,
1873
+ "PROPN|NameType=Nat|r-obj": 833,
1874
+ "PROPN|NameType=Nat|r-obl": 834,
1875
+ "PROPN|NameType=Oth": 835,
1876
+ "PROPN|NameType=Oth|l-dislocated": 836,
1877
+ "PROPN|NameType=Oth|l-nsubj": 837,
1878
+ "PROPN|NameType=Oth|r-acl": 838,
1879
+ "PROPN|NameType=Oth|r-appos": 839,
1880
+ "PROPN|NameType=Oth|r-compound": 840,
1881
+ "PROPN|NameType=Oth|r-conj": 841,
1882
+ "PROPN|NameType=Oth|r-flat": 842,
1883
+ "PROPN|NameType=Oth|r-nmod": 843,
1884
+ "PROPN|NameType=Oth|r-obj": 844,
1885
+ "PROPN|NameType=Oth|r-obl": 845,
1886
+ "PROPN|NameType=Oth|root": 846,
1887
+ "PROPN|NameType=Pro": 847,
1888
+ "PROPN|NameType=Pro|l-nsubj": 848,
1889
+ "PROPN|NameType=Pro|l-obl": 849,
1890
+ "PROPN|NameType=Pro|r-advcl": 850,
1891
+ "PROPN|NameType=Pro|r-flat": 851,
1892
+ "PROPN|NameType=Pro|r-nmod": 852,
1893
+ "PROPN|NameType=Pro|r-obj": 853,
1894
+ "PROPN|NameType=Prs": 854,
1895
+ "PROPN|NameType=Prs|l-dislocated": 855,
1896
+ "PROPN|NameType=Prs|l-nsubj": 856,
1897
+ "PROPN|NameType=Prs|l-obl": 857,
1898
+ "PROPN|NameType=Prs|l-vocative": 858,
1899
+ "PROPN|NameType=Prs|r-conj": 859,
1900
+ "PROPN|NameType=Prs|r-discourse": 860,
1901
+ "PROPN|NameType=Prs|r-flat": 861,
1902
+ "PROPN|NameType=Prs|r-list": 862,
1903
+ "PROPN|NameType=Prs|r-nmod": 863,
1904
+ "PROPN|NameType=Prs|r-obj": 864,
1905
+ "PROPN|NameType=Prs|r-obl": 865,
1906
+ "PROPN|NameType=Prs|r-vocative": 866,
1907
+ "PROPN|NameType=Sur": 867,
1908
+ "PROPN|NameType=Sur|l-nsubj": 868,
1909
+ "PROPN|NameType=Sur|r-flat": 869,
1910
+ "PROPN|NameType=Sur|r-nmod": 870,
1911
+ "PROPN|NounType=Class": 871,
1912
+ "PROPN|NounType=Class|r-clf": 872,
1913
+ "PROPN|Prefix=Yes": 873,
1914
+ "PROPN|Prefix=Yes|l-nsubj": 874,
1915
+ "PROPN|Prefix=Yes|r-nmod": 875,
1916
+ "PROPN|l-advmod": 876,
1917
+ "PROPN|l-nsubj": 877,
1918
+ "PROPN|l-obl": 878,
1919
+ "PROPN|r-acl": 879,
1920
+ "PROPN|r-advmod": 880,
1921
+ "PROPN|r-appos": 881,
1922
+ "PROPN|r-clf": 882,
1923
+ "PROPN|r-compound": 883,
1924
+ "PROPN|r-conj": 884,
1925
+ "PROPN|r-fixed": 885,
1926
+ "PROPN|r-flat": 886,
1927
+ "PROPN|r-iobj": 887,
1928
+ "PROPN|r-list": 888,
1929
+ "PROPN|r-nmod": 889,
1930
+ "PROPN|r-obj": 890,
1931
+ "PROPN|r-obl": 891,
1932
+ "PROPN|root": 892,
1933
+ "PUNCT": 893,
1934
+ "PUNCT|NounType=Class": 894,
1935
+ "PUNCT|NounType=Class|r-punct": 895,
1936
+ "PUNCT|l-advmod": 896,
1937
+ "PUNCT|l-dep": 897,
1938
+ "PUNCT|l-punct": 898,
1939
+ "PUNCT|r-dep": 899,
1940
+ "PUNCT|r-punct": 900,
1941
+ "SCONJ": 901,
1942
+ "SCONJ|NumType=Mult": 902,
1943
+ "SCONJ|NumType=Mult|l-mark": 903,
1944
+ "SCONJ|Prefix=Yes": 904,
1945
+ "SCONJ|Prefix=Yes|l-cc": 905,
1946
+ "SCONJ|Prefix=Yes|l-mark": 906,
1947
+ "SCONJ|VerbType=Cop": 907,
1948
+ "SCONJ|VerbType=Cop|l-mark": 908,
1949
+ "SCONJ|l-advmod": 909,
1950
+ "SCONJ|l-case": 910,
1951
+ "SCONJ|l-cc": 911,
1952
+ "SCONJ|l-discourse": 912,
1953
+ "SCONJ|l-mark": 913,
1954
+ "SCONJ|l-nsubj": 914,
1955
+ "SCONJ|l-orphan": 915,
1956
+ "SCONJ|r-advcl": 916,
1957
+ "SCONJ|r-compound": 917,
1958
+ "SCONJ|r-fixed": 918,
1959
+ "SCONJ|r-flat": 919,
1960
+ "SCONJ|r-mark": 920,
1961
+ "SCONJ|r-orphan": 921,
1962
+ "SCONJ|root": 922,
1963
+ "SYM": 923,
1964
+ "SYM|l-dep": 924,
1965
+ "SYM|r-clf": 925,
1966
+ "SYM|r-nmod": 926,
1967
+ "SYM|r-obj": 927,
1968
+ "SYM|r-obl": 928,
1969
+ "SYM|r-xcomp": 929,
1970
+ "VERB": 930,
1971
+ "VERB|Abbr=Yes": 931,
1972
+ "VERB|Abbr=Yes|r-acl": 932,
1973
+ "VERB|Foreign=Yes": 933,
1974
+ "VERB|Foreign=Yes|l-nsubj": 934,
1975
+ "VERB|Foreign=Yes|r-acl": 935,
1976
+ "VERB|Foreign=Yes|r-advcl": 936,
1977
+ "VERB|Foreign=Yes|r-ccomp": 937,
1978
+ "VERB|Foreign=Yes|r-compound": 938,
1979
+ "VERB|Foreign=Yes|r-conj": 939,
1980
+ "VERB|Foreign=Yes|r-flat": 940,
1981
+ "VERB|Foreign=Yes|r-nmod": 941,
1982
+ "VERB|Foreign=Yes|r-xcomp": 942,
1983
+ "VERB|Foreign=Yes|root": 943,
1984
+ "VERB|NounType=Class": 944,
1985
+ "VERB|NounType=Class|r-acl": 945,
1986
+ "VERB|NounType=Class|r-compound": 946,
1987
+ "VERB|PartType=Adj": 947,
1988
+ "VERB|PartType=Adj|r-acl": 948,
1989
+ "VERB|Prefix=Yes": 949,
1990
+ "VERB|Prefix=Yes|l-acl": 950,
1991
+ "VERB|Prefix=Yes|l-nsubj": 951,
1992
+ "VERB|Prefix=Yes|r-acl": 952,
1993
+ "VERB|Prefix=Yes|r-advcl": 953,
1994
+ "VERB|Prefix=Yes|r-ccomp": 954,
1995
+ "VERB|Prefix=Yes|r-compound": 955,
1996
+ "VERB|Prefix=Yes|r-conj": 956,
1997
+ "VERB|Prefix=Yes|r-parataxis": 957,
1998
+ "VERB|Prefix=Yes|root": 958,
1999
+ "VERB|VerbType=Cop": 959,
2000
+ "VERB|VerbType=Cop|l-advmod": 960,
2001
+ "VERB|VerbType=Cop|l-cop": 961,
2002
+ "VERB|VerbType=Cop|r-acl": 962,
2003
+ "VERB|VerbType=Cop|r-advcl": 963,
2004
+ "VERB|VerbType=Cop|r-ccomp": 964,
2005
+ "VERB|VerbType=Cop|r-compound": 965,
2006
+ "VERB|VerbType=Cop|r-parataxis": 966,
2007
+ "VERB|VerbType=Cop|root": 967,
2008
+ "VERB|l-acl": 968,
2009
+ "VERB|l-advcl": 969,
2010
+ "VERB|l-advmod": 970,
2011
+ "VERB|l-aux": 971,
2012
+ "VERB|l-case": 972,
2013
+ "VERB|l-cc": 973,
2014
+ "VERB|l-ccomp": 974,
2015
+ "VERB|l-compound": 975,
2016
+ "VERB|l-conj": 976,
2017
+ "VERB|l-cop": 977,
2018
+ "VERB|l-csubj": 978,
2019
+ "VERB|l-discourse": 979,
2020
+ "VERB|l-dislocated": 980,
2021
+ "VERB|l-mark": 981,
2022
+ "VERB|l-nsubj": 982,
2023
+ "VERB|l-obl": 983,
2024
+ "VERB|l-orphan": 984,
2025
+ "VERB|l-xcomp": 985,
2026
+ "VERB|r-acl": 986,
2027
+ "VERB|r-advcl": 987,
2028
+ "VERB|r-advmod": 988,
2029
+ "VERB|r-appos": 989,
2030
+ "VERB|r-aux": 990,
2031
+ "VERB|r-case": 991,
2032
+ "VERB|r-cc": 992,
2033
+ "VERB|r-ccomp": 993,
2034
+ "VERB|r-clf": 994,
2035
+ "VERB|r-compound": 995,
2036
+ "VERB|r-conj": 996,
2037
+ "VERB|r-dep": 997,
2038
+ "VERB|r-det": 998,
2039
+ "VERB|r-discourse": 999,
2040
+ "VERB|r-fixed": 1000,
2041
+ "VERB|r-flat": 1001,
2042
+ "VERB|r-list": 1002,
2043
+ "VERB|r-mark": 1003,
2044
+ "VERB|r-nmod": 1004,
2045
+ "VERB|r-nsubj": 1005,
2046
+ "VERB|r-obj": 1006,
2047
+ "VERB|r-obl": 1007,
2048
+ "VERB|r-orphan": 1008,
2049
+ "VERB|r-parataxis": 1009,
2050
+ "VERB|r-punct": 1010,
2051
+ "VERB|r-xcomp": 1011,
2052
+ "VERB|root": 1012
2053
+ },
2054
+ "max_position_embeddings": 131072,
2055
+ "mlp_bias": false,
2056
+ "model_type": "llama",
2057
+ "num_attention_heads": 32,
2058
+ "num_hidden_layers": 16,
2059
+ "num_key_value_heads": 8,
2060
+ "pretraining_tp": 1,
2061
+ "rms_norm_eps": 1e-05,
2062
+ "rope_scaling": {
2063
+ "factor": 32.0,
2064
+ "high_freq_factor": 4.0,
2065
+ "low_freq_factor": 1.0,
2066
+ "original_max_position_embeddings": 8192,
2067
+ "rope_type": "llama3"
2068
+ },
2069
+ "rope_theta": 500000.0,
2070
+ "tie_word_embeddings": true,
2071
+ "tokenizer_class": "PreTrainedTokenizerFast",
2072
+ "torch_dtype": "float32",
2073
+ "transformers_version": "4.44.2",
2074
+ "use_cache": true,
2075
+ "vocab_size": 128256
2076
+ }
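The `label2id` keys above pack several pieces of information into one string. Judging from how `maker.py` builds them, a key is a UPOS tag, optionally followed by `Key=Value` features, optionally followed by `root`, `l-<deprel>` (the token precedes its head) or `r-<deprel>` (the token follows its head). A minimal sketch of a decoder for that layout follows; `ParsedLabel` and `parse_label` are hypothetical names that are not part of this commit, and the `B-`/`I-` handling assumes the sub-token prefixes that `maker.py` adds to plain UPOS labels.

```python
from typing import Dict, NamedTuple, Optional


class ParsedLabel(NamedTuple):
    prefix: Optional[str]     # "B" or "I" for sub-token pieces, otherwise None
    upos: str                 # e.g. "PART"
    feats: Dict[str, str]     # e.g. {"PartType": "Neg"}
    direction: Optional[str]  # "l", "r", "root", or None when no relation is encoded
    deprel: Optional[str]     # e.g. "advmod"


def parse_label(label: str) -> ParsedLabel:
    """Split a label such as 'PART|PartType=Neg|l-advmod' into its parts."""
    parts = label.split("|")
    upos, prefix = parts[0], None
    if upos[:2] in ("B-", "I-"):
        prefix, upos = upos[0], upos[2:]
    direction = deprel = None
    if len(parts) > 1:
        last = parts[-1]
        if last == "root":
            direction, deprel = "root", "root"
            parts = parts[:-1]
        elif last[:2] in ("l-", "r-"):
            direction, deprel = last[0], last[2:]
            parts = parts[:-1]
    feats = {}
    for item in parts[1:]:
        key, _, value = item.partition("=")
        feats[key] = value
    return ParsedLabel(prefix, upos, feats, direction, deprel)


print(parse_label("PART|PartType=Neg|l-advmod"))
```

Keeping the direction separate from the relation name makes it easier to turn predicted label ids back into head/dependent arcs, though the actual decoding in `ud.py` may work differently.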
maker.py ADDED
@@ -0,0 +1,116 @@
+ #! /usr/bin/python3
+ src="meta-llama/Llama-3.2-1B"
+ tgt="KoichiYasuoka/Llama-3.2-1B-thai-ud-causal"
+ url="https://github.com/KoichiYasuoka/spaCy-Thai"
+ import os
+ d=os.path.basename(url)
+ os.system("test -d "+d+" || git clone --depth=1 "+url)
+ os.system("for F in train dev test ; do cp "+d+"/UD_Thai-Corpora/th_tud-ud-$F.conllu $F.conllu ; done")
+ class UDCausalDataset(object):
+   def __init__(self,conllu,tokenizer,oldtokenizer=None,embeddings=None):
+     self.conllu=open(conllu,"r",encoding="utf-8")
+     self.tokenizer=tokenizer
+     self.oldtokenizer=oldtokenizer if oldtokenizer else tokenizer
+     self.embeddings=embeddings
+     self.max_tokens=3
+     self.seeks=[(0,0)]
+     label=set(["SYM"])
+     dep=set()
+     s=self.conllu.readline()
+     while s!="":
+       if s=="\n":
+         self.seeks.append((self.conllu.tell(),0))
+       else:
+         w=s.split("\t")
+         if len(w)==10:
+           if w[0].isdecimal():
+             p=w[3] if w[5]=="_" else w[3]+"|"+w[5]
+             label.add(p)
+             dep.add(p+("|" if w[6]=="0" else "|l-" if int(w[0])<int(w[6]) else "|r-")+w[7])
+             self.seeks.append((self.seeks[-1][0],int(w[0])))
+             self.max_tokens=max(self.max_tokens,int(w[0])*2+1)
+       s=self.conllu.readline()
+     lid={}
+     for i,l in enumerate(sorted(label)):
+       lid[l],lid["B-"+l],lid["I-"+l]=i*3,i*3+1,i*3+2
+     for i,d in enumerate(sorted(dep),len(lid)):
+       lid[d]=i
+     self.label2id=lid
+   def __call__(*args):
+     lid={l:i for i,l in enumerate(sorted(set(sum([list(t.label2id) for t in args],[]))))}
+     for t in args:
+       t.label2id=lid
+     return lid
+   def __del__(self):
+     self.conllu.close()
+   __len__=lambda self:len(self.seeks)-1
+   def __getitem__(self,i):
+     s,t=self.seeks[i]
+     self.conllu.seek(s)
+     form,upos,deps,w=[],[],[],[""]
+     while w[0]!="\n":
+       w=self.conllu.readline().split("\t")
+       if len(w)==10:
+         form.append(w[1])
+         if w[0].isdecimal():
+           upos.append(w[3] if w[5]=="_" else w[3]+"|"+w[5])
+           deps.append((int(w[6]),w[7]))
+     if t==0:
+       v=self.tokenizer(form,add_special_tokens=False)
+       i,u=[self.tokenizer.cls_token_id],["SYM"]
+       for j,(x,y) in enumerate(zip(v["input_ids"],upos)):
+         if x!=[]:
+           i+=x
+           u+=[y] if len(x)==1 else ["B-"+y]+["I-"+y]*(len(x)-1)
+       emb=self.embeddings
+       pad=self.tokenizer.pad_token_id
+     else:
+       import torch
+       v=self.oldtokenizer(form,add_special_tokens=False)
+       m=[]
+       for x in v["input_ids"]:
+         if x==[]:
+           m.append(self.embeddings[self.tokenizer.unk_token_id,:])
+         else:
+           m.append(self.embeddings[x,:].sum(axis=0))
+       m.append(self.embeddings[self.tokenizer.sep_token_id,:])
+       m.append(self.embeddings[self.tokenizer.pad_token_id,:])
+       m.append(self.embeddings[self.tokenizer.cls_token_id,:])
+       emb=torch.stack(m)
+       i,u=list(range(-1,len(upos)+1)),["SYM"]+upos+["SYM"]
+       i.append(t-1)
+       k,d=deps[t-1]
+       u.append(upos[t-1]+"|"+d if k==0 else upos[t-1])
+       for j in range(t,len(upos)):
+         i.append(j)
+         a,b=deps[j]
+         u.append(upos[j]+"|r-"+b if a==t else upos[t-1]+"|l-"+d if j+1==k else upos[j])
+       pad=-1
+     j=self.max_tokens-len(i)
+     if j>0:
+       ids=i+[pad]*j
+       upos=u+["SYM"]*j
+     else:
+       ids=i[0:self.max_tokens]
+       upos=u[0:self.max_tokens]
+     return {"inputs_embeds":emb[ids,:],"labels":[self.label2id[p] for p in upos]}
+ from transformers import AutoTokenizer,AutoConfig,AutoModelForTokenClassification,DefaultDataCollator,TrainingArguments,Trainer
+ from tokenizers.pre_tokenizers import Sequence,Split,Whitespace
+ from tokenizers import Regex
+ from copy import deepcopy
+ otk=AutoTokenizer.from_pretrained(src,cls_token="<|begin_of_text|>",sep_token="<|end_of_text|>",pad_token="<|finetune_right_pad_id|>",unk_token="<|python_tag|>")
+ ntk=deepcopy(otk)
+ ntk.backend_tokenizer.pre_tokenizer=Sequence([Whitespace(),Split(Regex("[\u0e40-\u0e44]?[\u0e01-\u0e2e][\u0e30-\u0e3a\u0e45\u0e47-\u0e4e]*|."),"isolated"),otk.backend_tokenizer.pre_tokenizer])
+ trainDS=UDCausalDataset("train.conllu",ntk,otk)
+ devDS=UDCausalDataset("dev.conllu",ntk,otk)
+ testDS=UDCausalDataset("test.conllu",ntk,otk)
+ lid=trainDS(devDS,testDS)
+ cfg=AutoConfig.from_pretrained(src,num_labels=len(lid),label2id=lid,id2label={i:l for l,i in lid.items()},ignore_mismatched_sizes=True)
+ mdl=AutoModelForTokenClassification.from_pretrained(src,config=cfg,ignore_mismatched_sizes=True)
+ trainDS.embeddings=mdl.get_input_embeddings().weight
+ trainDS.max_tokens=min(trainDS.max_tokens,cfg.max_position_embeddings)
+ arg=TrainingArguments(num_train_epochs=3,per_device_train_batch_size=16,dataloader_pin_memory=False,output_dir=tgt,overwrite_output_dir=True,save_total_limit=2,learning_rate=5e-05,warmup_ratio=0.1,save_safetensors=False)
+ trn=Trainer(args=arg,data_collator=DefaultDataCollator(),model=mdl,train_dataset=trainDS)
+ trn.train()
+ trn.save_model(tgt)
+ otk.save_pretrained(tgt)
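The `Split(Regex(...),"isolated")` step in `maker.py` appears to cut Thai text into small orthographic clusters (an optional leading vowel, one consonant, then any following vowel signs and tone marks) before the original Llama pre-tokenizer runs. The sketch below reproduces only that pattern with Python's standard `re` module to show what the clusters look like. It is an approximation: the `tokenizers` library uses its own regex engine, and `thai_clusters` is a hypothetical helper name, not something defined in this commit.

```python
import re
from typing import List

# Same character classes as the pattern passed to tokenizers.Regex in maker.py:
#   [\u0e40-\u0e44]?                  optional leading vowel
#   [\u0e01-\u0e2e]                   one consonant
#   [\u0e30-\u0e3a\u0e45\u0e47-\u0e4e]*  following vowel signs, tone marks, etc.
#   |.                                 otherwise, any single character
CLUSTER = re.compile("[\u0e40-\u0e44]?[\u0e01-\u0e2e][\u0e30-\u0e3a\u0e45\u0e47-\u0e4e]*|.")


def thai_clusters(text: str) -> List[str]:
    """Return the successive pattern matches, i.e. the pieces an 'isolated' split would keep."""
    return CLUSTER.findall(text)


# The README's sample sentence, split into clusters.
print(thai_clusters("หลายหัวดีกว่าหัวเดียว"))
```

Because the pattern falls back to `.`, every non-newline character ends up in exactly one piece, so joining the pieces gives back the input; in `maker.py` the preceding `Whitespace()` step has already removed whitespace by the time this split runs.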
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b13a66ff2f2ba35acbe033aae9dd8d94a8a934b09435678299a2c94c6bb93b40
+ size 4951610394
special_tokens_map.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "bos_token": {
+     "content": "<|begin_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<|begin_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|end_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|finetune_right_pad_id|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "<|end_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<|python_tag|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
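The map above reuses existing Llama 3.2 special tokens for the BERT-style roles that `maker.py` relies on (`cls_token`, `sep_token`, `pad_token`, `unk_token`), so several roles point at the same token. A small sketch that groups the roles by token content makes the sharing explicit; the `special_tokens` dictionary is a hand-copied summary of the file above, and `roles_by_token` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

# Role -> token content, copied by hand from special_tokens_map.json above.
special_tokens = {
    "bos_token": "<|begin_of_text|>",
    "cls_token": "<|begin_of_text|>",
    "eos_token": "<|end_of_text|>",
    "pad_token": "<|finetune_right_pad_id|>",
    "sep_token": "<|end_of_text|>",
    "unk_token": "<|python_tag|>",
}


def roles_by_token(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a role -> token mapping into token -> sorted list of roles."""
    grouped = defaultdict(list)
    for role, content in mapping.items():
        grouped[content].append(role)
    return {content: sorted(roles) for content, roles in grouped.items()}


for content, roles in roles_by_token(special_tokens).items():
    print(content, roles)
```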
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,2065 @@
+ {
+   "added_tokens_decoder": {
+     "128000": {
+       "content": "<|begin_of_text|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128001": {
+       "content": "<|end_of_text|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128002": {
+       "content": "<|reserved_special_token_0|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128003": {
+       "content": "<|reserved_special_token_1|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128004": {
+       "content": "<|finetune_right_pad_id|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128005": {
+       "content": "<|reserved_special_token_2|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128006": {
+       "content": "<|start_header_id|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128007": {
+       "content": "<|end_header_id|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128008": {
+       "content": "<|eom_id|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128009": {
+       "content": "<|eot_id|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128010": {
+       "content": "<|python_tag|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128011": {
+       "content": "<|reserved_special_token_3|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128012": {
+       "content": "<|reserved_special_token_4|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128013": {
+       "content": "<|reserved_special_token_5|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128014": {
+       "content": "<|reserved_special_token_6|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128015": {
+       "content": "<|reserved_special_token_7|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128016": {
+       "content": "<|reserved_special_token_8|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128017": {
+       "content": "<|reserved_special_token_9|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128018": {
+       "content": "<|reserved_special_token_10|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128019": {
+       "content": "<|reserved_special_token_11|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128020": {
+       "content": "<|reserved_special_token_12|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128021": {
+       "content": "<|reserved_special_token_13|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128022": {
+       "content": "<|reserved_special_token_14|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128023": {
+       "content": "<|reserved_special_token_15|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128024": {
+       "content": "<|reserved_special_token_16|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128025": {
+       "content": "<|reserved_special_token_17|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128026": {
+       "content": "<|reserved_special_token_18|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128027": {
+       "content": "<|reserved_special_token_19|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128028": {
+       "content": "<|reserved_special_token_20|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128029": {
+       "content": "<|reserved_special_token_21|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128030": {
+       "content": "<|reserved_special_token_22|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128031": {
+       "content": "<|reserved_special_token_23|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128032": {
+       "content": "<|reserved_special_token_24|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128033": {
+       "content": "<|reserved_special_token_25|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128034": {
+       "content": "<|reserved_special_token_26|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128035": {
+       "content": "<|reserved_special_token_27|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128036": {
+       "content": "<|reserved_special_token_28|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128037": {
+       "content": "<|reserved_special_token_29|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128038": {
+       "content": "<|reserved_special_token_30|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128039": {
+       "content": "<|reserved_special_token_31|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128040": {
+       "content": "<|reserved_special_token_32|>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "128041": {
332
+ "content": "<|reserved_special_token_33|>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "128042": {
340
+ "content": "<|reserved_special_token_34|>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "128043": {
348
+ "content": "<|reserved_special_token_35|>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "128044": {
356
+ "content": "<|reserved_special_token_36|>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "128045": {
364
+ "content": "<|reserved_special_token_37|>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "128046": {
372
+ "content": "<|reserved_special_token_38|>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "128047": {
380
+ "content": "<|reserved_special_token_39|>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128048": {
388
+ "content": "<|reserved_special_token_40|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128049": {
396
+ "content": "<|reserved_special_token_41|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128050": {
404
+ "content": "<|reserved_special_token_42|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128051": {
412
+ "content": "<|reserved_special_token_43|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128052": {
420
+ "content": "<|reserved_special_token_44|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128053": {
428
+ "content": "<|reserved_special_token_45|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128054": {
436
+ "content": "<|reserved_special_token_46|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128055": {
444
+ "content": "<|reserved_special_token_47|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128056": {
452
+ "content": "<|reserved_special_token_48|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128057": {
460
+ "content": "<|reserved_special_token_49|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128058": {
468
+ "content": "<|reserved_special_token_50|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128059": {
476
+ "content": "<|reserved_special_token_51|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128060": {
484
+ "content": "<|reserved_special_token_52|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128061": {
492
+ "content": "<|reserved_special_token_53|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128062": {
500
+ "content": "<|reserved_special_token_54|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128063": {
508
+ "content": "<|reserved_special_token_55|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128064": {
516
+ "content": "<|reserved_special_token_56|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128065": {
524
+ "content": "<|reserved_special_token_57|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128066": {
532
+ "content": "<|reserved_special_token_58|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128067": {
540
+ "content": "<|reserved_special_token_59|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128068": {
548
+ "content": "<|reserved_special_token_60|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128069": {
556
+ "content": "<|reserved_special_token_61|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128070": {
564
+ "content": "<|reserved_special_token_62|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128071": {
572
+ "content": "<|reserved_special_token_63|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128072": {
580
+ "content": "<|reserved_special_token_64|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128073": {
588
+ "content": "<|reserved_special_token_65|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128074": {
596
+ "content": "<|reserved_special_token_66|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128075": {
604
+ "content": "<|reserved_special_token_67|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128076": {
612
+ "content": "<|reserved_special_token_68|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "128077": {
620
+ "content": "<|reserved_special_token_69|>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "128078": {
628
+ "content": "<|reserved_special_token_70|>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "128079": {
636
+ "content": "<|reserved_special_token_71|>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "128080": {
644
+ "content": "<|reserved_special_token_72|>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "128081": {
652
+ "content": "<|reserved_special_token_73|>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "128082": {
660
+ "content": "<|reserved_special_token_74|>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "128083": {
668
+ "content": "<|reserved_special_token_75|>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "128084": {
676
+ "content": "<|reserved_special_token_76|>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "128085": {
684
+ "content": "<|reserved_special_token_77|>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "128086": {
692
+ "content": "<|reserved_special_token_78|>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "128087": {
700
+ "content": "<|reserved_special_token_79|>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "128088": {
708
+ "content": "<|reserved_special_token_80|>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "128089": {
716
+ "content": "<|reserved_special_token_81|>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "128090": {
724
+ "content": "<|reserved_special_token_82|>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "128091": {
732
+ "content": "<|reserved_special_token_83|>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "128092": {
740
+ "content": "<|reserved_special_token_84|>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "128093": {
748
+ "content": "<|reserved_special_token_85|>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "128094": {
756
+ "content": "<|reserved_special_token_86|>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "128095": {
764
+ "content": "<|reserved_special_token_87|>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "128096": {
772
+ "content": "<|reserved_special_token_88|>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "128097": {
780
+ "content": "<|reserved_special_token_89|>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "128098": {
788
+ "content": "<|reserved_special_token_90|>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "128099": {
796
+ "content": "<|reserved_special_token_91|>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "128100": {
804
+ "content": "<|reserved_special_token_92|>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "128101": {
812
+ "content": "<|reserved_special_token_93|>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "128102": {
820
+ "content": "<|reserved_special_token_94|>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "128103": {
828
+ "content": "<|reserved_special_token_95|>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "128104": {
836
+ "content": "<|reserved_special_token_96|>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "128105": {
844
+ "content": "<|reserved_special_token_97|>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "128106": {
852
+ "content": "<|reserved_special_token_98|>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "128107": {
860
+ "content": "<|reserved_special_token_99|>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "128108": {
868
+ "content": "<|reserved_special_token_100|>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "128109": {
876
+ "content": "<|reserved_special_token_101|>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "128110": {
884
+ "content": "<|reserved_special_token_102|>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "128111": {
892
+ "content": "<|reserved_special_token_103|>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "128112": {
900
+ "content": "<|reserved_special_token_104|>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "128113": {
908
+ "content": "<|reserved_special_token_105|>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "128114": {
916
+ "content": "<|reserved_special_token_106|>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "128115": {
924
+ "content": "<|reserved_special_token_107|>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "128116": {
932
+ "content": "<|reserved_special_token_108|>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "128117": {
940
+ "content": "<|reserved_special_token_109|>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "128118": {
948
+ "content": "<|reserved_special_token_110|>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "128119": {
956
+ "content": "<|reserved_special_token_111|>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "128120": {
964
+ "content": "<|reserved_special_token_112|>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "128121": {
972
+ "content": "<|reserved_special_token_113|>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "128122": {
980
+ "content": "<|reserved_special_token_114|>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "128123": {
988
+ "content": "<|reserved_special_token_115|>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "128124": {
996
+ "content": "<|reserved_special_token_116|>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "128125": {
1004
+ "content": "<|reserved_special_token_117|>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "128126": {
1012
+ "content": "<|reserved_special_token_118|>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "128127": {
1020
+ "content": "<|reserved_special_token_119|>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ },
1027
+ "128128": {
1028
+ "content": "<|reserved_special_token_120|>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": true
1034
+ },
1035
+ "128129": {
1036
+ "content": "<|reserved_special_token_121|>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": true
1042
+ },
1043
+ "128130": {
1044
+ "content": "<|reserved_special_token_122|>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": true
1050
+ },
1051
+ "128131": {
1052
+ "content": "<|reserved_special_token_123|>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": true
1058
+ },
1059
+ "128132": {
1060
+ "content": "<|reserved_special_token_124|>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": true
1066
+ },
1067
+ "128133": {
1068
+ "content": "<|reserved_special_token_125|>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": true
1074
+ },
1075
+ "128134": {
1076
+ "content": "<|reserved_special_token_126|>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": true
1082
+ },
1083
+ "128135": {
1084
+ "content": "<|reserved_special_token_127|>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": true
1090
+ },
1091
+ "128136": {
1092
+ "content": "<|reserved_special_token_128|>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": true
1098
+ },
1099
+ "128137": {
1100
+ "content": "<|reserved_special_token_129|>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": true
1106
+ },
1107
+ "128138": {
1108
+ "content": "<|reserved_special_token_130|>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": true
1114
+ },
1115
+ "128139": {
1116
+ "content": "<|reserved_special_token_131|>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": true
1122
+ },
1123
+ "128140": {
1124
+ "content": "<|reserved_special_token_132|>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": true
1130
+ },
1131
+ "128141": {
1132
+ "content": "<|reserved_special_token_133|>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": true
1138
+ },
1139
+ "128142": {
1140
+ "content": "<|reserved_special_token_134|>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": true
1146
+ },
1147
+ "128143": {
1148
+ "content": "<|reserved_special_token_135|>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": true
1154
+ },
1155
+ "128144": {
1156
+ "content": "<|reserved_special_token_136|>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": true
1162
+ },
1163
+ "128145": {
1164
+ "content": "<|reserved_special_token_137|>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": true
1170
+ },
1171
+ "128146": {
1172
+ "content": "<|reserved_special_token_138|>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": true
1178
+ },
1179
+ "128147": {
1180
+ "content": "<|reserved_special_token_139|>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": true
1186
+ },
1187
+ "128148": {
1188
+ "content": "<|reserved_special_token_140|>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": true
1194
+ },
1195
+ "128149": {
1196
+ "content": "<|reserved_special_token_141|>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": true
1202
+ },
1203
+ "128150": {
1204
+ "content": "<|reserved_special_token_142|>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": true
1210
+ },
1211
+ "128151": {
1212
+ "content": "<|reserved_special_token_143|>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": true
1218
+ },
1219
+ "128152": {
1220
+ "content": "<|reserved_special_token_144|>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": true
1226
+ },
1227
+ "128153": {
1228
+ "content": "<|reserved_special_token_145|>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": true
1234
+ },
1235
+ "128154": {
1236
+ "content": "<|reserved_special_token_146|>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": true
1242
+ },
1243
+ "128155": {
1244
+ "content": "<|reserved_special_token_147|>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": true
1250
+ },
1251
+ "128156": {
1252
+ "content": "<|reserved_special_token_148|>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": true
1258
+ },
1259
+ "128157": {
1260
+ "content": "<|reserved_special_token_149|>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": true
1266
+ },
1267
+ "128158": {
1268
+ "content": "<|reserved_special_token_150|>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": true
1274
+ },
1275
+ "128159": {
1276
+ "content": "<|reserved_special_token_151|>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": true
1282
+ },
1283
+ "128160": {
1284
+ "content": "<|reserved_special_token_152|>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": true
1290
+ },
1291
+ "128161": {
1292
+ "content": "<|reserved_special_token_153|>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": true
1298
+ },
1299
+ "128162": {
1300
+ "content": "<|reserved_special_token_154|>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": true
1306
+ },
1307
+ "128163": {
1308
+ "content": "<|reserved_special_token_155|>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": true
1314
+ },
1315
+ "128164": {
1316
+ "content": "<|reserved_special_token_156|>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": true
1322
+ },
1323
+ "128165": {
1324
+ "content": "<|reserved_special_token_157|>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": true
1330
+ },
1331
+ "128166": {
1332
+ "content": "<|reserved_special_token_158|>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": true
1338
+ },
1339
+ "128167": {
1340
+ "content": "<|reserved_special_token_159|>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": true
1346
+ },
1347
+ "128168": {
1348
+ "content": "<|reserved_special_token_160|>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": true
1354
+ },
1355
+ "128169": {
1356
+ "content": "<|reserved_special_token_161|>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": true
1362
+ },
1363
+ "128170": {
1364
+ "content": "<|reserved_special_token_162|>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": true
1370
+ },
1371
+ "128171": {
1372
+ "content": "<|reserved_special_token_163|>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": true
1378
+ },
1379
+ "128172": {
1380
+ "content": "<|reserved_special_token_164|>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": true
1386
+ },
1387
+ "128173": {
1388
+ "content": "<|reserved_special_token_165|>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": true
1394
+ },
1395
+ "128174": {
1396
+ "content": "<|reserved_special_token_166|>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": true
1402
+ },
1403
+ "128175": {
1404
+ "content": "<|reserved_special_token_167|>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": true
1410
+ },
1411
+ "128176": {
1412
+ "content": "<|reserved_special_token_168|>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": true
1418
+ },
1419
+ "128177": {
1420
+ "content": "<|reserved_special_token_169|>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": true
+     },
+     "128178": {
+       "content": "<|reserved_special_token_170|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128179": {
+       "content": "<|reserved_special_token_171|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128180": {
+       "content": "<|reserved_special_token_172|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128181": {
+       "content": "<|reserved_special_token_173|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128182": {
+       "content": "<|reserved_special_token_174|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128183": {
+       "content": "<|reserved_special_token_175|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128184": {
+       "content": "<|reserved_special_token_176|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128185": {
+       "content": "<|reserved_special_token_177|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128186": {
+       "content": "<|reserved_special_token_178|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128187": {
+       "content": "<|reserved_special_token_179|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128188": {
+       "content": "<|reserved_special_token_180|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128189": {
+       "content": "<|reserved_special_token_181|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128190": {
+       "content": "<|reserved_special_token_182|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128191": {
+       "content": "<|reserved_special_token_183|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128192": {
+       "content": "<|reserved_special_token_184|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128193": {
+       "content": "<|reserved_special_token_185|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128194": {
+       "content": "<|reserved_special_token_186|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128195": {
+       "content": "<|reserved_special_token_187|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128196": {
+       "content": "<|reserved_special_token_188|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128197": {
+       "content": "<|reserved_special_token_189|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128198": {
+       "content": "<|reserved_special_token_190|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128199": {
+       "content": "<|reserved_special_token_191|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128200": {
+       "content": "<|reserved_special_token_192|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128201": {
+       "content": "<|reserved_special_token_193|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128202": {
+       "content": "<|reserved_special_token_194|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128203": {
+       "content": "<|reserved_special_token_195|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128204": {
+       "content": "<|reserved_special_token_196|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128205": {
+       "content": "<|reserved_special_token_197|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128206": {
+       "content": "<|reserved_special_token_198|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128207": {
+       "content": "<|reserved_special_token_199|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128208": {
+       "content": "<|reserved_special_token_200|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128209": {
+       "content": "<|reserved_special_token_201|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128210": {
+       "content": "<|reserved_special_token_202|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128211": {
+       "content": "<|reserved_special_token_203|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128212": {
+       "content": "<|reserved_special_token_204|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128213": {
+       "content": "<|reserved_special_token_205|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128214": {
+       "content": "<|reserved_special_token_206|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128215": {
+       "content": "<|reserved_special_token_207|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128216": {
+       "content": "<|reserved_special_token_208|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128217": {
+       "content": "<|reserved_special_token_209|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128218": {
+       "content": "<|reserved_special_token_210|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128219": {
+       "content": "<|reserved_special_token_211|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128220": {
+       "content": "<|reserved_special_token_212|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128221": {
+       "content": "<|reserved_special_token_213|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128222": {
+       "content": "<|reserved_special_token_214|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128223": {
+       "content": "<|reserved_special_token_215|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128224": {
+       "content": "<|reserved_special_token_216|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128225": {
+       "content": "<|reserved_special_token_217|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128226": {
+       "content": "<|reserved_special_token_218|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128227": {
+       "content": "<|reserved_special_token_219|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128228": {
+       "content": "<|reserved_special_token_220|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128229": {
+       "content": "<|reserved_special_token_221|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128230": {
+       "content": "<|reserved_special_token_222|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128231": {
+       "content": "<|reserved_special_token_223|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128232": {
+       "content": "<|reserved_special_token_224|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128233": {
+       "content": "<|reserved_special_token_225|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128234": {
+       "content": "<|reserved_special_token_226|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128235": {
+       "content": "<|reserved_special_token_227|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128236": {
+       "content": "<|reserved_special_token_228|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128237": {
+       "content": "<|reserved_special_token_229|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128238": {
+       "content": "<|reserved_special_token_230|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128239": {
+       "content": "<|reserved_special_token_231|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128240": {
+       "content": "<|reserved_special_token_232|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128241": {
+       "content": "<|reserved_special_token_233|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128242": {
+       "content": "<|reserved_special_token_234|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128243": {
+       "content": "<|reserved_special_token_235|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128244": {
+       "content": "<|reserved_special_token_236|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128245": {
+       "content": "<|reserved_special_token_237|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128246": {
+       "content": "<|reserved_special_token_238|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128247": {
+       "content": "<|reserved_special_token_239|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128248": {
+       "content": "<|reserved_special_token_240|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128249": {
+       "content": "<|reserved_special_token_241|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128250": {
+       "content": "<|reserved_special_token_242|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128251": {
+       "content": "<|reserved_special_token_243|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128252": {
+       "content": "<|reserved_special_token_244|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128253": {
+       "content": "<|reserved_special_token_245|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128254": {
+       "content": "<|reserved_special_token_246|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128255": {
+       "content": "<|reserved_special_token_247|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|begin_of_text|>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<|begin_of_text|>",
+   "eos_token": "<|end_of_text|>",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 131072,
+   "pad_token": "<|finetune_right_pad_id|>",
+   "sep_token": "<|end_of_text|>",
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "<|python_tag|>"
+ }
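The top-level settings that close `tokenizer_config.json` reuse two tokens for several roles. A minimal sketch, using only the standard `json` module and a hand-copied subset of the keys above, makes that aliasing explicit. The names `CONFIG_TAIL` and `roles_by_token` are illustrative and not part of this commit.

```python
import json

# Subset of the top-level keys, copied by hand from tokenizer_config.json above.
CONFIG_TAIL = json.loads("""
{
  "bos_token": "<|begin_of_text|>",
  "cls_token": "<|begin_of_text|>",
  "eos_token": "<|end_of_text|>",
  "pad_token": "<|finetune_right_pad_id|>",
  "sep_token": "<|end_of_text|>",
  "unk_token": "<|python_tag|>"
}
""")


def roles_by_token(config):
    """Group role names (bos_token, cls_token, ...) by the token string they map to."""
    groups = {}
    for role in sorted(config):
        groups.setdefault(config[role], []).append(role)
    return groups


if __name__ == "__main__":
    for token, roles in roles_by_token(CONFIG_TAIL).items():
        print(token, "<-", ", ".join(roles))
```

The grouping shows that `cls_token` shares a string with `bos_token` and `sep_token` shares one with `eos_token`. These are the roles that `ud.py` later looks up through `cls_token_id`, `sep_token_id`, `pad_token_id` and `unk_token_id`.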
ud.py ADDED
@@ -0,0 +1,147 @@
+ import numpy
+ from transformers import TokenClassificationPipeline
+
+ class BellmanFordTokenClassificationPipeline(TokenClassificationPipeline):
+   def __init__(self,**kwargs):
+     from copy import deepcopy
+     from tokenizers.pre_tokenizers import Sequence,Split,Whitespace
+     from tokenizers import Regex
+     super().__init__(**kwargs)
+     self.oldtokenizer=deepcopy(self.tokenizer)
+     self.tokenizer.backend_tokenizer.pre_tokenizer=Sequence([Whitespace(),Split(Regex("[\u0e40-\u0e44]?[\u0e01-\u0e2e][\u0e30-\u0e3a\u0e45\u0e47-\u0e4e]*|."),"isolated"),self.oldtokenizer.backend_tokenizer.pre_tokenizer])
+     x=self.model.config.label2id
+     y=[k for k in x if k.startswith("B-") or not (k.startswith("I-") or k.endswith("|root") or k.find("|l-")>0 or k.find("|r-")>0)]
+     self.transition=numpy.full((len(x),len(x)),numpy.nan)
+     for k,v in x.items():
+       for j in ["I-"+k[2:]] if k.startswith("B-") else [k]+y if k.startswith("I-") else y:
+         self.transition[v,x[j]]=0
+   def check_model_type(self,supported_models):
+     pass
+   def postprocess(self,model_outputs,**kwargs):
+     if "logits" not in model_outputs:
+       return self.postprocess(model_outputs[0],**kwargs)
+     m=model_outputs["logits"][0].numpy()
+     e=numpy.exp(m-numpy.max(m,axis=-1,keepdims=True))
+     z=e/e.sum(axis=-1,keepdims=True)
+     for i in range(m.shape[0]-1,0,-1):
+       m[i-1]+=numpy.nanmax(m[i]+self.transition,axis=1)
+     k=[numpy.nanargmax(m[0]+self.transition[0])]
+     for i in range(1,m.shape[0]):
+       k.append(numpy.nanargmax(m[i]+self.transition[k[-1]]))
+     w=[{"entity":self.model.config.id2label[j],"start":s,"end":e,"score":z[i,j]} for i,((s,e),j) in enumerate(zip(model_outputs["offset_mapping"][0].tolist(),k)) if s<e]
+     if "aggregation_strategy" in kwargs and kwargs["aggregation_strategy"]!="none":
+       for i,t in reversed(list(enumerate(w))):
+         p=t.pop("entity")
+         if p.startswith("I-"):
+           w[i-1]["score"]=min(w[i-1]["score"],t["score"])
+           w[i-1]["end"]=w.pop(i)["end"]
+         elif p.startswith("B-"):
+           t["entity_group"]=p[2:]
+         else:
+           t["entity_group"]=p
+     for t in w:
+       t["text"]=model_outputs["sentence"][t["start"]:t["end"]]
+     return w
+
+ class UniversalDependenciesCausalPipeline(BellmanFordTokenClassificationPipeline):
+   def __init__(self,**kwargs):
+     kwargs["aggregation_strategy"]="simple"
+     super().__init__(**kwargs)
+     x=self.model.config.label2id
+     self.root=numpy.full((len(x)),numpy.nan)
+     self.left_arc=numpy.full((len(x)),numpy.nan)
+     self.right_arc=numpy.full((len(x)),numpy.nan)
+     for k,v in x.items():
+       if k.endswith("|root"):
+         self.root[v]=0
+       elif k.find("|l-")>0:
+         self.left_arc[v]=0
+       elif k.find("|r-")>0:
+         self.right_arc[v]=0
+   def postprocess(self,model_outputs,**kwargs):
+     import torch
+     if "logits" not in model_outputs:
+       return self.postprocess(model_outputs[0],**kwargs)
+     m=model_outputs["logits"][0].numpy()
+     for i in range(m.shape[0]-1,0,-1):
+       m[i-1]+=numpy.nanmax(m[i]+self.transition,axis=1)
+     k=[numpy.nanargmax(m[0]+self.transition[0])]
+     for i in range(1,m.shape[0]):
+       k.append(numpy.nanargmax(m[i]+self.transition[k[-1]]))
+     w=[{"entity":self.model.config.id2label[j],"start":s,"end":e} for i,((s,e),j) in enumerate(zip(model_outputs["offset_mapping"][0].tolist(),k)) if s<e]
+     for i,t in reversed(list(enumerate(w))):
+       p=t.pop("entity")
+       if p.startswith("I-"):
+         w[i-1]["end"]=max(w.pop(i)["end"],w[i-1]["end"])
+       elif i>0 and w[i-1]["end"]>w[i]["start"]:
+         w[i-1]["end"]=max(w.pop(i)["end"],w[i-1]["end"])
+       elif p.startswith("B-"):
+         t["entity_group"]=p[2:]
+       else:
+         t["entity_group"]=p
+     d=[model_outputs["sentence"][t["start"]:t["end"]] for t in w]
+     for i in range(len(d)-1,-1,-1):
+       if d[i].startswith(" "):
+         j=len(d[i])-len(d[i].lstrip())
+         d[i]=d[i].lstrip()
+         w[i]["start"]+=j
+       if d[i].endswith(" "):
+         j=len(d[i])-len(d[i].rstrip())
+         d[i]=d[i].rstrip()
+         w[i]["end"]-=j
+       if d[i].strip()=="":
+         d.pop(i)
+         w.pop(i)
+     v=self.oldtokenizer(d,add_special_tokens=False)
+     e=self.model.get_input_embeddings().weight
+     m=[]
+     for x in v["input_ids"]:
+       if x==[]:
+         x=[self.tokenizer.unk_token_id]
+       m.append(e[x,:].sum(axis=0))
+     m.append(e[self.tokenizer.sep_token_id,:])
+     m.append(e[self.tokenizer.pad_token_id,:])
+     m.append(e[self.tokenizer.cls_token_id,:])
+     m=torch.stack(m).to(self.device)
+     k=list(range(-1,len(d)+1))
+     e=[]
+     with torch.no_grad():
+       for i in range(len(d)):
+         e.append(self.model(inputs_embeds=torch.unsqueeze(m[k+list(range(i,len(d)))+[-2]*i,:],0)).logits[0,-len(d):,:])
+     e=torch.stack(e).cpu().numpy()
+     for i in range(len(d)):
+       for j in range(i):
+         e[-j-1,-i-1],e[-i-1,-j-1]=e[-i-1,i-j]+self.left_arc,e[-i-1,i-j]+self.right_arc
+       e[-i-1,-i-1]=e[-i-1,0]+self.root
+     m,p=numpy.nanmax(e,axis=2),numpy.nanargmax(e,axis=2)
+     h=self.chu_liu_edmonds(m)
+     z=[i for i,j in enumerate(h) if i==j]
+     if len(z)>1:
+       k,h=z[numpy.nanargmax(m[z,z])],numpy.nanmin(m)-numpy.nanmax(m)
+       m[:,z]+=[[0 if j in z and (i!=j or i==k) else h for i in z] for j in range(m.shape[0])]
+       h=self.chu_liu_edmonds(m)
+     q=[self.model.config.id2label[p[j,i]].split("|") for i,j in enumerate(h)]
+     t=model_outputs["sentence"].replace("\n"," ")
+     u="# text = "+t+"\n"
+     for i,j in enumerate(d):
+       u+="\t".join([str(i+1),j,"_",q[i][0],"_","_" if len(q[i])<3 else "|".join(q[i][1:-1]),str(0 if h[i]==i else h[i]+1),"root" if q[i][-1]=="root" else q[i][-1][2:],"_","_" if i+1<len(d) and w[i]["end"]<w[i+1]["start"] else "SpaceAfter=No"])+"\n"
+     return u+"\n"
+   def chu_liu_edmonds(self,matrix):
+     h=numpy.nanargmax(matrix,axis=0)
+     x=[-1 if i==j else j for i,j in enumerate(h)]
+     for b in [lambda x,i,j:-1 if i not in x else x[i],lambda x,i,j:-1 if j<0 else x[j]]:
+       y=[]
+       while x!=y:
+         y=list(x)
+         for i,j in enumerate(x):
+           x[i]=b(x,i,j)
+       if max(x)<0:
+         return h
+     y,x=[i for i,j in enumerate(x) if j==max(x)],[i for i,j in enumerate(x) if j<max(x)]
+     z=matrix-numpy.nanmax(matrix,axis=0)
+     m=numpy.block([[z[x,:][:,x],numpy.nanmax(z[x,:][:,y],axis=1).reshape(len(x),1)],[numpy.nanmax(z[y,:][:,x],axis=0),numpy.nanmax(z[y,y])]])
+     k=[j if i==len(x) else x[j] if j<len(x) else y[numpy.nanargmax(z[y,x[i]])] for i,j in enumerate(self.chu_liu_edmonds(m))]
+     h=[j if i in y else k[x.index(i)] for i,j in enumerate(h)]
+     i=y[numpy.nanargmax(z[x[k[-1]],y] if k[-1]<len(x) else z[y,y])]
+     h[i]=x[k[-1]] if k[-1]<len(x) else i
+     return h
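Both `postprocess` methods above decode labels with a NaN-masked transition matrix: a backward pass accumulates the best reachable score, then a forward pass picks labels that respect the `B-`/`I-` constraints. The sketch below reproduces those two loops on a toy three-label set. The names `build_transition`, `constrained_decode`, `LABEL2ID` and `TOY_LOGITS` are assumptions made for illustration. The `|root`, `|l-` and `|r-` exclusions of the original are omitted because the toy labels contain none.

```python
import numpy


def build_transition(label2id):
    """Return a matrix with 0 for allowed label-to-label moves and NaN otherwise."""
    # Labels that may start a new word: B-* tags and plain tags (no arc labels in this toy set).
    starters = [k for k in label2id if not k.startswith("I-")]
    transition = numpy.full((len(label2id), len(label2id)), numpy.nan)
    for label, index in label2id.items():
        if label.startswith("B-"):
            followers = ["I-" + label[2:]]      # B-X must be continued by I-X
        elif label.startswith("I-"):
            followers = [label] + starters      # I-X may continue or end the word
        else:
            followers = starters                # a plain tag is followed by a new word
        for follower in followers:
            transition[index, label2id[follower]] = 0
    return transition


def constrained_decode(logits, transition):
    """Pick one label per position, maximising the summed score over allowed paths."""
    m = numpy.array(logits, dtype=float)  # copy, so the caller's data is untouched
    # Backward pass: add to each score the best score reachable in the remainder.
    for i in range(m.shape[0] - 1, 0, -1):
        m[i - 1] += numpy.nanmax(m[i] + transition, axis=1)
    # Forward pass: row 0 doubles as the start state, following ud.py.
    path = [int(numpy.nanargmax(m[0] + transition[0]))]
    for i in range(1, m.shape[0]):
        path.append(int(numpy.nanargmax(m[i] + transition[path[-1]])))
    return path


# Toy data (an assumption, not taken from the model).
LABEL2ID = {"VERB": 0, "B-NOUN": 1, "I-NOUN": 2}
TOY_LOGITS = [[0.0, 1.0, 0.5], [2.0, 0.0, 1.0], [1.0, 0.0, 3.0]]

if __name__ == "__main__":
    print("greedy:     ", numpy.argmax(TOY_LOGITS, axis=1).tolist())
    print("constrained:", constrained_decode(TOY_LOGITS, build_transition(LABEL2ID)))
```

On this toy input, per-position argmax yields `B-NOUN, VERB, I-NOUN`, which breaks the rule that `B-NOUN` must be followed by `I-NOUN`. The constrained pass yields `B-NOUN, I-NOUN, I-NOUN` instead. Using NaN rather than a large negative number lets `nanmax` and `nanargmax` skip forbidden moves without affecting the scores of allowed ones.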