browndw committed on
Commit
4aadfff
1 Parent(s): c972d78

Update spaCy pipeline

Files changed (11)
  1. README.md +23 -13
  2. config.cfg +21 -4
  3. en_docusco_spacy-any-py3-none-any.whl +2 -2
  4. meta.json +434 -154
  5. ner/model +1 -1
  6. ner/moves +1 -1
  7. tagger/cfg +280 -0
  8. tagger/model +3 -0
  9. tok2vec/model +1 -1
  10. tokenizer +0 -0
  11. vocab/strings.json +2 -2
README.md CHANGED
@@ -13,21 +13,28 @@ model-index:
13
  metrics:
14
  - name: NER Precision
15
  type: precision
16
- value: 0.7585141203
17
  - name: NER Recall
18
  type: recall
19
- value: 0.7640098105
20
  - name: NER F Score
21
  type: f_score
22
- value: 0.7612520468
23
  ---
24
  | Feature | Description |
25
  | --- | --- |
26
  | **Name** | `en_docusco_spacy` |
27
- | **Version** | `0.1` |
28
- | **spaCy** | `>=3.2.3,<3.3.0` |
29
- | **Default Pipeline** | `tok2vec`, `ner` |
30
- | **Components** | `tok2vec`, `ner` |
31
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
32
  | **Sources** | n/a |
33
  | **License** | n/a |
@@ -37,11 +44,12 @@ model-index:
37
 
38
  <details>
39
 
40
- <summary>View label scheme (37 labels for 1 components)</summary>
41
 
42
  | Component | Labels |
43
  | --- | --- |
44
  | **`ner`** | `AcademicTerms`, `AcademicWritingMoves`, `Character`, `Citation`, `CitationAuthority`, `CitationHedged`, `ConfidenceHedged`, `ConfidenceHigh`, `ConfidenceLow`, `Contingent`, `Description`, `Facilitate`, `FirstPerson`, `ForceStressed`, `Future`, `InformationChange`, `InformationChangeNegative`, `InformationChangePositive`, `InformationExposition`, `InformationPlace`, `InformationReportVerbs`, `InformationStates`, `InformationTopics`, `Inquiry`, `Interactive`, `MetadiscourseCohesive`, `MetadiscourseInteractive`, `Narrative`, `Negative`, `Positive`, `PublicTerms`, `Reasoning`, `Responsibility`, `Strategic`, `SyntacticComplexity`, `Uncertainty`, `Updates` |
45
 
46
  </details>
47
 
@@ -49,8 +57,10 @@ model-index:
49
 
50
  | Type | Score |
51
  | --- | --- |
52
- | `ENTS_F` | 76.13 |
53
- | `ENTS_P` | 75.85 |
54
- | `ENTS_R` | 76.40 |
55
- | `TOK2VEC_LOSS` | 6547966.67 |
56
- | `NER_LOSS` | 5460719.26 |
 
13
  metrics:
14
  - name: NER Precision
15
  type: precision
16
+ value: 0.7498327416
17
  - name: NER Recall
18
  type: recall
19
+ value: 0.7475641104
20
  - name: NER F Score
21
  type: f_score
22
+ value: 0.7486967075
23
+ - task:
24
+ name: TAG
25
+ type: token-classification
26
+ metrics:
27
+ - name: TAG (XPOS) Accuracy
28
+ type: accuracy
29
+ value: 0.9249740665
30
  ---
31
  | Feature | Description |
32
  | --- | --- |
33
  | **Name** | `en_docusco_spacy` |
34
+ | **Version** | `0.2` |
35
+ | **spaCy** | `>=3.3.0,<3.4.0` |
36
+ | **Default Pipeline** | `tok2vec`, `ner`, `tagger` |
37
+ | **Components** | `tok2vec`, `ner`, `tagger` |
38
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
39
  | **Sources** | n/a |
40
  | **License** | n/a |
 
44
 
45
  <details>
46
 
47
+ <summary>View label scheme (311 labels for 2 components)</summary>
48
 
49
  | Component | Labels |
50
  | --- | --- |
51
  | **`ner`** | `AcademicTerms`, `AcademicWritingMoves`, `Character`, `Citation`, `CitationAuthority`, `CitationHedged`, `ConfidenceHedged`, `ConfidenceHigh`, `ConfidenceLow`, `Contingent`, `Description`, `Facilitate`, `FirstPerson`, `ForceStressed`, `Future`, `InformationChange`, `InformationChangeNegative`, `InformationChangePositive`, `InformationExposition`, `InformationPlace`, `InformationReportVerbs`, `InformationStates`, `InformationTopics`, `Inquiry`, `Interactive`, `MetadiscourseCohesive`, `MetadiscourseInteractive`, `Narrative`, `Negative`, `Positive`, `PublicTerms`, `Reasoning`, `Responsibility`, `Strategic`, `SyntacticComplexity`, `Uncertainty`, `Updates` |
52
+ | **`tagger`** | `APPGE`, `AT`, `AT1`, `BCL21`, `BCL22`, `CC`, `CCB`, `CS`, `CS21`, `CS22`, `CS31`, `CS32`, `CS33`, `CS41`, `CS42`, `CS43`, `CS44`, `CSA`, `CSN`, `CST`, `CSW`, `CSW31`, `CSW32`, `CSW33`, `DA`, `DA1`, `DA2`, `DAR`, `DAT`, `DB`, `DB2`, `DD`, `DD1`, `DD2`, `DDQ`, `DDQGE`, `DDQGE31`, `DDQGE32`, `DDQGE33`, `DDQV`, `DDQV31`, `DDQV32`, `DDQV33`, `EX`, `FO`, `FU`, `FW`, `GE`, `IF`, `II`, `II21`, `II22`, `II31`, `II32`, `II33`, `II41`, `II42`, `II43`, `II44`, `IO`, `IW`, `JJ`, `JJ21`, `JJ22`, `JJ31`, `JJ32`, `JJ33`, `JJ41`, `JJ42`, `JJ43`, `JJ44`, `JJR`, `JJT`, `JK`, `MC`, `MC1`, `MC2`, `MC221`, `MC222`, `MCMC`, `MD`, `MF`, `ND1`, `NN`, `NN1`, `NN121`, `NN122`, `NN131`, `NN132`, `NN133`, `NN141`, `NN142`, `NN143`, `NN144`, `NN2`, `NN21`, `NN22`, `NN221`, `NN222`, `NN231`, `NN232`, `NN233`, `NN31`, `NN32`, `NN33`, `NNA`, `NNB`, `NNL1`, `NNL2`, `NNO`, `NNO2`, `NNT1`, `NNT131`, `NNT132`, `NNT133`, `NNT2`, `NNU`, `NNU1`, `NNU2`, `NNU21`, `NNU22`, `NP`, `NP1`, `NP2`, `NPD1`, `NPD2`, `NPM1`, `NPM2`, `PN`, `PN1`, `PN121`, `PN122`, `PN21`, `PN22`, `PNQO`, `PNQS`, `PNQS31`, `PNQS32`, `PNQS33`, `PNQV`, `PNQV31`, `PNQV32`, `PNQV33`, `PNX1`, `PPGE`, `PPH1`, `PPHO1`, `PPHO2`, `PPHS1`, `PPHS2`, `PPIO1`, `PPIO2`, `PPIS1`, `PPIS2`, `PPX1`, `PPX121`, `PPX122`, `PPX2`, `PPX221`, `PPX222`, `PPY`, `RA`, `RA21`, `RA22`, `REX`, `REX21`, `REX22`, `REX41`, `REX42`, `REX43`, `REX44`, `RG`, `RG21`, `RG22`, `RG31`, `RG32`, `RG33`, `RG41`, `RG42`, `RG43`, `RG44`, `RGQ`, `RGQV`, `RGQV31`, `RGQV32`, `RGQV33`, `RGR`, `RGT`, `RL`, `RL21`, `RL22`, `RL31`, `RL32`, `RL33`, `RP`, `RPK`, `RR`, `RR21`, `RR22`, `RR31`, `RR32`, `RR33`, `RR41`, `RR42`, `RR43`, `RR44`, `RR51`, `RR52`, `RR53`, `RR54`, `RR55`, `RRQ`, `RRQV`, `RRQV31`, `RRQV32`, `RRQV33`, `RRR`, `RRT`, `RT`, `RT21`, `RT22`, `RT31`, `RT32`, `RT33`, `RT41`, `RT42`, `RT43`, `RT44`, `TO`, `UH`, `UH21`, `UH22`, `UH31`, `UH32`, `UH33`, `VB0`, `VBDR`, `VBDZ`, `VBG`, `VBI`, `VBM`, `VBN`, `VBR`, `VBZ`, `VD0`, `VDD`, `VDG`, `VDI`, `VDN`, `VDZ`, `VH0`, `VHD`, `VHG`, `VHI`, `VHN`, `VHZ`, `VM`, `VM21`, `VM22`, `VMK`, `VV0`, `VVD`, `VVG`, `VVGK`, `VVI`, `VVN`, `VVNK`, `VVZ`, `XX`, `Y`, `ZZ1`, `ZZ2`, `ZZ221`, `ZZ222` |
53
 
54
  </details>
55
 
 
57
 
58
  | Type | Score |
59
  | --- | --- |
60
+ | `ENTS_F` | 74.87 |
61
+ | `ENTS_P` | 74.98 |
62
+ | `ENTS_R` | 74.76 |
63
+ | `TAG_ACC` | 92.50 |
64
+ | `TOK2VEC_LOSS` | 9919092.94 |
65
+ | `NER_LOSS` | 6441655.97 |
66
+ | `TAGGER_LOSS` | 2474292.88 |
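
The updated README reports the same NER metrics twice: as fractions in the model-index block and as percentages in the results table. The sketch below is only a consistency check on the numbers in this commit, not part of the pipeline itself: it recomputes the F score as the harmonic mean of precision and recall and scales values by 100, which is an assumption inferred from the figures shown here. The helper names `f_score` and `as_table_score` are hypothetical.

```python
# Values copied from the updated model-index block and meta.json in this commit.
precision = 0.7498327416
recall = 0.7475641104
tag_acc = 0.9249740665
tok2vec_loss = 99190.9293975094


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def as_table_score(value: float) -> str:
    """Assumed table formatting: value times 100, two decimals."""
    return f"{value * 100:.2f}"


f1 = f_score(precision, recall)

print(f"recomputed F score: {f1:.6f}")
print("ENTS_P:", as_table_score(precision))
print("TAG_ACC:", as_table_score(tag_acc))
print("TOK2VEC_LOSS:", as_table_score(tok2vec_loss))
```

The recomputed value agrees with the reported `f_score` of 0.7486967075 to within rounding, and the scaled values line up with the `ENTS_P`, `TAG_ACC`, and `TOK2VEC_LOSS` rows of the table.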
config.cfg CHANGED
@@ -1,6 +1,6 @@
1
  [paths]
2
- train = "/content/drive/MyDrive/DS Bert/SpacyTrain/spacy_train.spacy"
3
- dev = "/content/drive/MyDrive/DS Bert/SpacyTrain/spacy_dev.spacy"
4
  vectors = null
5
  init_tok2vec = null
6
 
@@ -10,7 +10,7 @@ seed = 0
10
 
11
  [nlp]
12
  lang = "en"
13
- pipeline = ["tok2vec","ner"]
14
  batch_size = 1000
15
  disabled = []
16
  before_creation = null
@@ -41,6 +41,22 @@ nO = null
41
  width = ${components.tok2vec.model.encode.width}
42
  upstream = "*"
43
 
44
  [components.tok2vec]
45
  factory = "tok2vec"
46
 
@@ -123,10 +139,11 @@ eps = 0.00000001
123
  learn_rate = 0.001
124
 
125
  [training.score_weights]
126
- ents_f = 1.0
127
  ents_p = 0.0
128
  ents_r = 0.0
129
  ents_per_type = null
130
 
131
  [pretraining]
132
 
 
1
  [paths]
2
+ train = "spacy_train.spacy"
3
+ dev = "spacy_dev.spacy"
4
  vectors = null
5
  init_tok2vec = null
6
 
 
10
 
11
  [nlp]
12
  lang = "en"
13
+ pipeline = ["tok2vec","ner","tagger"]
14
  batch_size = 1000
15
  disabled = []
16
  before_creation = null
 
41
  width = ${components.tok2vec.model.encode.width}
42
  upstream = "*"
43
 
44
+ [components.tagger]
45
+ factory = "tagger"
46
+ neg_prefix = "!"
47
+ overwrite = false
48
+ scorer = {"@scorers":"spacy.tagger_scorer.v1"}
49
+
50
+ [components.tagger.model]
51
+ @architectures = "spacy.Tagger.v2"
52
+ nO = null
53
+ normalize = false
54
+
55
+ [components.tagger.model.tok2vec]
56
+ @architectures = "spacy.Tok2VecListener.v1"
57
+ width = ${components.tok2vec.model.encode.width}
58
+ upstream = "*"
59
+
60
  [components.tok2vec]
61
  factory = "tok2vec"
62
 
 
139
  learn_rate = 0.001
140
 
141
  [training.score_weights]
142
+ ents_f = 0.5
143
  ents_p = 0.0
144
  ents_r = 0.0
145
  ents_per_type = null
146
+ tag_acc = 0.5
147
 
148
  [pretraining]
149
 
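
The `[training.score_weights]` change splits the weighting evenly between `ents_f` and `tag_acc`, where the previous config put all of the weight on `ents_f`. As a rough illustration, assuming the combined score is a plain weighted sum of the named metrics and that `null` weights are skipped, the scores reported in this commit combine as follows. The function `weighted_score` is a hypothetical helper, not a spaCy API.

```python
from typing import Dict, Optional

# Weights from the old and new [training.score_weights] sections.
old_weights: Dict[str, Optional[float]] = {
    "ents_f": 1.0, "ents_p": 0.0, "ents_r": 0.0, "ents_per_type": None,
}
new_weights: Dict[str, Optional[float]] = {
    "ents_f": 0.5, "ents_p": 0.0, "ents_r": 0.0, "ents_per_type": None,
    "tag_acc": 0.5,
}

# Scores from the old and new meta.json "performance" blocks.
old_scores = {"ents_f": 0.7612520468, "ents_p": 0.7585141203, "ents_r": 0.7640098105}
new_scores = {
    "ents_f": 0.7486967075, "ents_p": 0.7498327416,
    "ents_r": 0.7475641104, "tag_acc": 0.9249740665,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, Optional[float]]) -> float:
    """Weighted sum over metrics; weights set to None (null) are ignored."""
    return sum(
        scores.get(name, 0.0) * weight
        for name, weight in weights.items()
        if weight is not None
    )


old_total = weighted_score(old_scores, old_weights)
new_total = weighted_score(new_scores, new_weights)
print(f"old combined score: {old_total:.4f}")
print(f"new combined score: {new_total:.4f}")
```

Because the two versions weight different metrics, the combined scores are not directly comparable; the per-metric values are the fairer basis for comparing 0.1 with 0.2.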
en_docusco_spacy-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:48169adca13989ca04412e7b966c5a86d4a7953208b70ece0ecfe341a290de60
3
- size 6400761
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:443d5b1a4188a40e0c1de56e0c9551a97b5b05b013fe995a842ba1840a4dafb0
3
+ size 6415723
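
The wheel and the model files are stored as Git LFS pointers, so the diff only shows a new `oid` and `size` rather than the binary content. A minimal sketch of reading such a pointer and checking a downloaded file against it is given below; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the check assumes the `sha256` algorithm shown in the pointers above.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        pointer["algo"] == "sha256"
        and len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


old_whl = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:48169adca13989ca04412e7b966c5a86d4a7953208b70ece0ecfe341a290de60
size 6400761
""")
new_whl = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:443d5b1a4188a40e0c1de56e0c9551a97b5b05b013fe995a842ba1840a4dafb0
size 6415723
""")

size_change = new_whl["size"] - old_whl["size"]
print("wheel size change in bytes:", size_change)
```

With the real wheel on disk, `matches_pointer(open(path, "rb").read(), new_whl)` would confirm that the download corresponds to this commit.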
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"en",
3
  "name":"docusco_spacy",
4
- "version":"0.1",
5
  "description":"",
6
  "author":"",
7
  "email":"",
8
  "url":"",
9
  "license":"",
10
- "spacy_version":">=3.2.3,<3.3.0",
11
- "spacy_git_version":"0fc3dee77",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
@@ -57,212 +57,492 @@
57
  "SyntacticComplexity",
58
  "Uncertainty",
59
  "Updates"
60
  ]
61
  },
62
  "pipeline":[
63
  "tok2vec",
64
- "ner"
65
  ],
66
  "components":[
67
  "tok2vec",
68
- "ner"
69
  ],
70
  "disabled":[
71
 
72
  ],
73
  "performance":{
74
- "ents_f":0.7612520468,
75
- "ents_p":0.7585141203,
76
- "ents_r":0.7640098105,
77
  "ents_per_type":{
78
- "InformationPlace":{
79
- "p":0.8420921046,
80
- "r":0.836695151,
81
- "f":0.8393849528
82
- },
83
- "AcademicTerms":{
84
- "p":0.6929318188,
85
- "r":0.7823984381,
86
- "f":0.7349524219
87
- },
88
- "InformationExposition":{
89
- "p":0.8159063958,
90
- "r":0.828189449,
91
- "f":0.822002039
92
  },
93
- "Citation":{
94
- "p":0.8161176007,
95
- "r":0.7768615692,
96
- "f":0.7960058888
97
  },
98
  "Character":{
99
- "p":0.7873308158,
100
- "r":0.815249523,
101
- "f":0.8010469822
102
  },
103
- "Narrative":{
104
- "p":0.736329178,
105
- "r":0.7496486165,
106
- "f":0.7429292034
107
  },
108
  "Negative":{
109
- "p":0.682396229,
110
- "r":0.5974259248,
111
- "f":0.637090391
112
  },
113
- "MetadiscourseCohesive":{
114
- "p":0.910339982,
115
- "r":0.9292003665,
116
- "f":0.9196734887
117
  },
118
- "SyntacticComplexity":{
119
- "p":0.8197131727,
120
- "r":0.8420712371,
121
- "f":0.8307417994
122
  },
123
- "ConfidenceHigh":{
124
- "p":0.7570718482,
125
- "r":0.7256583507,
126
- "f":0.7410323323
127
  },
128
  "Interactive":{
129
- "p":0.8369143072,
130
- "r":0.8221719666,
131
- "f":0.829477638
132
  },
133
- "Positive":{
134
- "p":0.6906661718,
135
- "r":0.6073140386,
136
- "f":0.6463138017
137
  },
138
- "Uncertainty":{
139
- "p":0.7107438017,
140
- "r":0.5676567657,
141
- "f":0.6311926606
142
  },
143
- "InformationStates":{
144
- "p":0.7557757348,
145
- "r":0.8793437898,
146
- "f":0.812890665
147
  },
148
- "Future":{
149
- "p":0.7323179637,
150
- "r":0.6971893932,
151
- "f":0.7143220555
152
  },
153
- "FirstPerson":{
154
- "p":0.8572961373,
155
- "r":0.8938693317,
156
- "f":0.8752008179
157
  },
158
- "ForceStressed":{
159
- "p":0.7580316055,
160
- "r":0.7660465933,
161
- "f":0.7620180243
162
  },
163
- "Description":{
164
- "p":0.6369118798,
165
- "r":0.6990375423,
166
- "f":0.6665301961
167
  },
168
- "Strategic":{
169
- "p":0.718919169,
170
- "r":0.620923913,
171
- "f":0.6663378862
172
  },
173
- "InformationTopics":{
174
- "p":0.7328180061,
175
- "r":0.7668133698,
176
- "f":0.749430365
177
  },
178
- "Updates":{
179
- "p":0.7380083092,
180
- "r":0.6771398868,
181
- "f":0.7062650602
182
  },
183
- "Facilitate":{
184
- "p":0.6842105263,
185
- "r":0.5935093509,
186
- "f":0.635640648
187
  },
188
- "Contingent":{
189
- "p":0.7808806489,
190
- "r":0.7037385129,
191
- "f":0.7403053938
192
  },
193
- "InformationChangeNegative":{
194
- "p":0.6313559322,
195
- "r":0.4694391934,
196
- "f":0.5384893386
197
  },
198
  "InformationChange":{
199
- "p":0.6768881317,
200
- "r":0.6307545772,
201
- "f":0.65300756
202
  },
203
- "AcademicWritingMoves":{
204
- "p":0.605988024,
205
- "r":0.433342849,
206
- "f":0.5053262317
207
  },
208
- "Inquiry":{
209
- "p":0.6221633888,
210
- "r":0.4875037044,
211
- "f":0.5466629742
212
  },
213
- "Reasoning":{
214
- "p":0.8324523315,
215
- "r":0.7744116614,
216
- "f":0.8023837685
217
  },
218
- "PublicTerms":{
219
- "p":0.761578604,
220
- "r":0.7395091053,
221
- "f":0.7503816181
222
  },
223
- "InformationReportVerbs":{
224
- "p":0.7219466832,
225
- "r":0.7413655897,
226
- "f":0.731527287
227
  },
228
- "MetadiscourseInteractive":{
229
- "p":0.8360918313,
230
- "r":0.6803519062,
231
- "f":0.7502245643
232
  },
233
- "Responsibility":{
234
- "p":0.6865409908,
235
- "r":0.5641210375,
236
- "f":0.6193395294
237
  },
238
- "ConfidenceHedged":{
239
- "p":0.7903159622,
240
- "r":0.852816153,
241
- "f":0.8203773906
242
  },
243
  "InformationChangePositive":{
244
- "p":0.6130252101,
245
- "r":0.4733939001,
246
- "f":0.5342365434
247
- },
248
- "CitationAuthority":{
249
- "p":0.7194244604,
250
- "r":0.498132005,
251
- "f":0.5886681383
252
  },
253
  "ConfidenceLow":{
254
- "p":0.5793650794,
255
- "r":0.2457912458,
256
- "f":0.3451536643
257
  },
258
  "CitationHedged":{
259
- "p":0.5512195122,
260
- "r":0.974137931,
261
- "f":0.7040498442
262
  }
263
  },
264
- "tok2vec_loss":65479.6667127609,
265
- "ner_loss":54607.1925654389
266
  },
267
  "requirements":[
268
 
 
1
  {
2
  "lang":"en",
3
  "name":"docusco_spacy",
4
+ "version":"0.2",
5
  "description":"",
6
  "author":"",
7
  "email":"",
8
  "url":"",
9
  "license":"",
10
+ "spacy_version":">=3.3.0,<3.4.0",
11
+ "spacy_git_version":"Unknown",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
 
57
  "SyntacticComplexity",
58
  "Uncertainty",
59
  "Updates"
60
+ ],
61
+ "tagger":[
62
+ "APPGE",
63
+ "AT",
64
+ "AT1",
65
+ "BCL21",
66
+ "BCL22",
67
+ "CC",
68
+ "CCB",
69
+ "CS",
70
+ "CS21",
71
+ "CS22",
72
+ "CS31",
73
+ "CS32",
74
+ "CS33",
75
+ "CS41",
76
+ "CS42",
77
+ "CS43",
78
+ "CS44",
79
+ "CSA",
80
+ "CSN",
81
+ "CST",
82
+ "CSW",
83
+ "CSW31",
84
+ "CSW32",
85
+ "CSW33",
86
+ "DA",
87
+ "DA1",
88
+ "DA2",
89
+ "DAR",
90
+ "DAT",
91
+ "DB",
92
+ "DB2",
93
+ "DD",
94
+ "DD1",
95
+ "DD2",
96
+ "DDQ",
97
+ "DDQGE",
98
+ "DDQGE31",
99
+ "DDQGE32",
100
+ "DDQGE33",
101
+ "DDQV",
102
+ "DDQV31",
103
+ "DDQV32",
104
+ "DDQV33",
105
+ "EX",
106
+ "FO",
107
+ "FU",
108
+ "FW",
109
+ "GE",
110
+ "IF",
111
+ "II",
112
+ "II21",
113
+ "II22",
114
+ "II31",
115
+ "II32",
116
+ "II33",
117
+ "II41",
118
+ "II42",
119
+ "II43",
120
+ "II44",
121
+ "IO",
122
+ "IW",
123
+ "JJ",
124
+ "JJ21",
125
+ "JJ22",
126
+ "JJ31",
127
+ "JJ32",
128
+ "JJ33",
129
+ "JJ41",
130
+ "JJ42",
131
+ "JJ43",
132
+ "JJ44",
133
+ "JJR",
134
+ "JJT",
135
+ "JK",
136
+ "MC",
137
+ "MC1",
138
+ "MC2",
139
+ "MC221",
140
+ "MC222",
141
+ "MCMC",
142
+ "MD",
143
+ "MF",
144
+ "ND1",
145
+ "NN",
146
+ "NN1",
147
+ "NN121",
148
+ "NN122",
149
+ "NN131",
150
+ "NN132",
151
+ "NN133",
152
+ "NN141",
153
+ "NN142",
154
+ "NN143",
155
+ "NN144",
156
+ "NN2",
157
+ "NN21",
158
+ "NN22",
159
+ "NN221",
160
+ "NN222",
161
+ "NN231",
162
+ "NN232",
163
+ "NN233",
164
+ "NN31",
165
+ "NN32",
166
+ "NN33",
167
+ "NNA",
168
+ "NNB",
169
+ "NNL1",
170
+ "NNL2",
171
+ "NNO",
172
+ "NNO2",
173
+ "NNT1",
174
+ "NNT131",
175
+ "NNT132",
176
+ "NNT133",
177
+ "NNT2",
178
+ "NNU",
179
+ "NNU1",
180
+ "NNU2",
181
+ "NNU21",
182
+ "NNU22",
183
+ "NP",
184
+ "NP1",
185
+ "NP2",
186
+ "NPD1",
187
+ "NPD2",
188
+ "NPM1",
189
+ "NPM2",
190
+ "PN",
191
+ "PN1",
192
+ "PN121",
193
+ "PN122",
194
+ "PN21",
195
+ "PN22",
196
+ "PNQO",
197
+ "PNQS",
198
+ "PNQS31",
199
+ "PNQS32",
200
+ "PNQS33",
201
+ "PNQV",
202
+ "PNQV31",
203
+ "PNQV32",
204
+ "PNQV33",
205
+ "PNX1",
206
+ "PPGE",
207
+ "PPH1",
208
+ "PPHO1",
209
+ "PPHO2",
210
+ "PPHS1",
211
+ "PPHS2",
212
+ "PPIO1",
213
+ "PPIO2",
214
+ "PPIS1",
215
+ "PPIS2",
216
+ "PPX1",
217
+ "PPX121",
218
+ "PPX122",
219
+ "PPX2",
220
+ "PPX221",
221
+ "PPX222",
222
+ "PPY",
223
+ "RA",
224
+ "RA21",
225
+ "RA22",
226
+ "REX",
227
+ "REX21",
228
+ "REX22",
229
+ "REX41",
230
+ "REX42",
231
+ "REX43",
232
+ "REX44",
233
+ "RG",
234
+ "RG21",
235
+ "RG22",
236
+ "RG31",
237
+ "RG32",
238
+ "RG33",
239
+ "RG41",
240
+ "RG42",
241
+ "RG43",
242
+ "RG44",
243
+ "RGQ",
244
+ "RGQV",
245
+ "RGQV31",
246
+ "RGQV32",
247
+ "RGQV33",
248
+ "RGR",
249
+ "RGT",
250
+ "RL",
251
+ "RL21",
252
+ "RL22",
253
+ "RL31",
254
+ "RL32",
255
+ "RL33",
256
+ "RP",
257
+ "RPK",
258
+ "RR",
259
+ "RR21",
260
+ "RR22",
261
+ "RR31",
262
+ "RR32",
263
+ "RR33",
264
+ "RR41",
265
+ "RR42",
266
+ "RR43",
267
+ "RR44",
268
+ "RR51",
269
+ "RR52",
270
+ "RR53",
271
+ "RR54",
272
+ "RR55",
273
+ "RRQ",
274
+ "RRQV",
275
+ "RRQV31",
276
+ "RRQV32",
277
+ "RRQV33",
278
+ "RRR",
279
+ "RRT",
280
+ "RT",
281
+ "RT21",
282
+ "RT22",
283
+ "RT31",
284
+ "RT32",
285
+ "RT33",
286
+ "RT41",
287
+ "RT42",
288
+ "RT43",
289
+ "RT44",
290
+ "TO",
291
+ "UH",
292
+ "UH21",
293
+ "UH22",
294
+ "UH31",
295
+ "UH32",
296
+ "UH33",
297
+ "VB0",
298
+ "VBDR",
299
+ "VBDZ",
300
+ "VBG",
301
+ "VBI",
302
+ "VBM",
303
+ "VBN",
304
+ "VBR",
305
+ "VBZ",
306
+ "VD0",
307
+ "VDD",
308
+ "VDG",
309
+ "VDI",
310
+ "VDN",
311
+ "VDZ",
312
+ "VH0",
313
+ "VHD",
314
+ "VHG",
315
+ "VHI",
316
+ "VHN",
317
+ "VHZ",
318
+ "VM",
319
+ "VM21",
320
+ "VM22",
321
+ "VMK",
322
+ "VV0",
323
+ "VVD",
324
+ "VVG",
325
+ "VVGK",
326
+ "VVI",
327
+ "VVN",
328
+ "VVNK",
329
+ "VVZ",
330
+ "XX",
331
+ "Y",
332
+ "ZZ1",
333
+ "ZZ2",
334
+ "ZZ221",
335
+ "ZZ222"
336
  ]
337
  },
338
  "pipeline":[
339
  "tok2vec",
340
+ "ner",
341
+ "tagger"
342
  ],
343
  "components":[
344
  "tok2vec",
345
+ "ner",
346
+ "tagger"
347
  ],
348
  "disabled":[
349
 
350
  ],
351
  "performance":{
352
+ "ents_f":0.7486967075,
353
+ "ents_p":0.7498327416,
354
+ "ents_r":0.7475641104,
355
  "ents_per_type":{
356
+ "Narrative":{
357
+ "p":0.7509423361,
358
+ "r":0.7103704114,
359
+ "f":0.7300931537
360
  },
361
+ "SyntacticComplexity":{
362
+ "p":0.7886246175,
363
+ "r":0.8646070539,
364
+ "f":0.8248697614
365
  },
366
  "Character":{
367
+ "p":0.785967443,
368
+ "r":0.7897207251,
369
+ "f":0.7878396139
370
  },
371
+ "ConfidenceHigh":{
372
+ "p":0.7023789438,
373
+ "r":0.7254298327,
374
+ "f":0.7137183187
375
+ },
376
+ "FirstPerson":{
377
+ "p":0.8278481013,
378
+ "r":0.8889566978,
379
+ "f":0.8573148383
380
  },
381
  "Negative":{
382
+ "p":0.658557511,
383
+ "r":0.583953954,
384
+ "f":0.6190160386
385
  },
386
+ "Description":{
387
+ "p":0.6476571898,
388
+ "r":0.6760073115,
389
+ "f":0.6615286505
390
  },
391
+ "InformationExposition":{
392
+ "p":0.7949913335,
393
+ "r":0.8201482322,
394
+ "f":0.8073738649
395
  },
396
+ "Strategic":{
397
+ "p":0.6668451911,
398
+ "r":0.622929668,
399
+ "f":0.6441397901
400
+ },
401
+ "ConfidenceHedged":{
402
+ "p":0.8015741286,
403
+ "r":0.8440072161,
404
+ "f":0.8222435809
405
  },
406
  "Interactive":{
407
+ "p":0.7874628485,
408
+ "r":0.8065506265,
409
+ "f":0.7968924527
410
  },
411
+ "InformationReportVerbs":{
412
+ "p":0.7481178396,
413
+ "r":0.7079138919,
414
+ "f":0.7274608101
415
  },
416
+ "ForceStressed":{
417
+ "p":0.7539183667,
418
+ "r":0.7242613439,
419
+ "f":0.7387923478
420
  },
421
+ "AcademicTerms":{
422
+ "p":0.7375272228,
423
+ "r":0.7532883043,
424
+ "f":0.7453244495
425
  },
426
+ "MetadiscourseCohesive":{
427
+ "p":0.904598282,
428
+ "r":0.9147441302,
429
+ "f":0.9096429161
430
  },
431
+ "InformationPlace":{
432
+ "p":0.8304918033,
433
+ "r":0.8436303081,
434
+ "f":0.8370095002
435
  },
436
+ "Positive":{
437
+ "p":0.6864356164,
438
+ "r":0.5833436218,
439
+ "f":0.6307046557
440
  },
441
+ "Citation":{
442
+ "p":0.7408448066,
443
+ "r":0.7192216767,
444
+ "f":0.7298731257
445
  },
446
+ "PublicTerms":{
447
+ "p":0.7523439934,
448
+ "r":0.7137098941,
449
+ "f":0.7325178924
450
  },
451
+ "CitationAuthority":{
452
+ "p":0.7350628931,
453
+ "r":0.4909425046,
454
+ "f":0.5886982528
455
  },
456
+ "Reasoning":{
457
+ "p":0.8164492458,
458
+ "r":0.7225796753,
459
+ "f":0.76665178
460
  },
461
+ "InformationTopics":{
462
+ "p":0.7416131335,
463
+ "r":0.7561374001,
464
+ "f":0.7488048431
465
  },
466
+ "Responsibility":{
467
+ "p":0.6843984291,
468
+ "r":0.4750929368,
469
+ "f":0.5608543008
470
  },
471
+ "Updates":{
472
+ "p":0.7556217827,
473
+ "r":0.6524994151,
474
+ "f":0.7002845665
475
+ },
476
+ "InformationStates":{
477
+ "p":0.7582508535,
478
+ "r":0.8421234057,
479
+ "f":0.7979893297
480
  },
481
  "InformationChange":{
482
+ "p":0.6703578762,
483
+ "r":0.6338672769,
484
+ "f":0.6516020957
485
  },
486
+ "Contingent":{
487
+ "p":0.7394308387,
488
+ "r":0.7160649222,
489
+ "f":0.7275603271
490
  },
491
+ "MetadiscourseInteractive":{
492
+ "p":0.7715463918,
493
+ "r":0.6160684886,
494
+ "f":0.6850970341
495
  },
496
+ "Future":{
497
+ "p":0.7021050535,
498
+ "r":0.6896034438,
499
+ "f":0.6957980981
500
  },
501
+ "Inquiry":{
502
+ "p":0.5940041831,
503
+ "r":0.5108253947,
504
+ "f":0.5492836676
505
  },
506
+ "Uncertainty":{
507
+ "p":0.6979017644,
508
+ "r":0.5465919701,
509
+ "f":0.6130484868
510
  },
511
+ "AcademicWritingMoves":{
512
+ "p":0.5667743673,
513
+ "r":0.42723767,
514
+ "f":0.4872121282
515
  },
516
+ "InformationChangeNegative":{
517
+ "p":0.6899070385,
518
+ "r":0.4249488753,
519
+ "f":0.5259427993
520
  },
521
+ "Facilitate":{
522
+ "p":0.6855355281,
523
+ "r":0.5659124447,
524
+ "f":0.6200067363
525
  },
526
  "InformationChangePositive":{
527
+ "p":0.7209653092,
528
+ "r":0.4085470085,
529
+ "f":0.5215493726
530
  },
531
  "ConfidenceLow":{
532
+ "p":0.751572327,
533
+ "r":0.4127806563,
534
+ "f":0.5328874025
535
  },
536
  "CitationHedged":{
537
+ "p":0.7021276596,
538
+ "r":0.9328621908,
539
+ "f":0.8012139605
540
  }
541
  },
542
+ "tag_acc":0.9249740665,
543
+ "tok2vec_loss":99190.9293975094,
544
+ "ner_loss":64416.5596667872,
545
+ "tagger_loss":24742.9288185574
546
  },
547
  "requirements":[
548
 
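
The per-type entries in `ents_per_type` are listed in a different order in the two versions, which makes the diff hard to read label by label. One way to compare them is to key both sides by label and sort by the change in F score. The sketch below does this for four labels whose values are copied from the old and new `meta.json`; the remaining labels are omitted for brevity, and the variable names are illustrative only.

```python
# F scores copied from the old (0.1) and new (0.2) meta.json for a few labels.
old_f = {
    "ConfidenceLow": 0.3451536643,
    "CitationHedged": 0.7040498442,
    "Citation": 0.7960058888,
    "MetadiscourseInteractive": 0.7502245643,
}
new_f = {
    "ConfidenceLow": 0.5328874025,
    "CitationHedged": 0.8012139605,
    "Citation": 0.7298731257,
    "MetadiscourseInteractive": 0.6850970341,
}

# Change in F score per label, sorted from largest drop to largest gain.
deltas = sorted(
    ((label, new_f[label] - old_f[label]) for label in old_f),
    key=lambda item: item[1],
)

for label, delta in deltas:
    print(f"{label:26s} {delta:+.4f}")

# Overall micro-averaged F score change reported in the two performance blocks.
overall_delta = 0.7486967075 - 0.7612520468
print(f"{'overall ents_f':26s} {overall_delta:+.4f}")
```

For this subset, `Citation` shows the largest drop and `ConfidenceLow` the largest gain, while the overall `ents_f` falls by about 1.3 points.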
ner/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2cbbfc404b90a8701852b14548684aab5c7fc863db9da7f44f2fdfd22472b42a
3
  size 164952
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d056afe6ac62c1b4455b4418bb51fb2860d465e64b2bb09fd2ce8087b81bf0d6
3
  size 164952
ner/moves CHANGED
@@ -1 +1 @@
1
- ��moves�
 
1
+ ��moves�
tagger/cfg ADDED
@@ -0,0 +1,280 @@
1
+ {
2
+ "labels":[
3
+ "APPGE",
4
+ "AT",
5
+ "AT1",
6
+ "BCL21",
7
+ "BCL22",
8
+ "CC",
9
+ "CCB",
10
+ "CS",
11
+ "CS21",
12
+ "CS22",
13
+ "CS31",
14
+ "CS32",
15
+ "CS33",
16
+ "CS41",
17
+ "CS42",
18
+ "CS43",
19
+ "CS44",
20
+ "CSA",
21
+ "CSN",
22
+ "CST",
23
+ "CSW",
24
+ "CSW31",
25
+ "CSW32",
26
+ "CSW33",
27
+ "DA",
28
+ "DA1",
29
+ "DA2",
30
+ "DAR",
31
+ "DAT",
32
+ "DB",
33
+ "DB2",
34
+ "DD",
35
+ "DD1",
36
+ "DD2",
37
+ "DDQ",
38
+ "DDQGE",
39
+ "DDQGE31",
40
+ "DDQGE32",
41
+ "DDQGE33",
42
+ "DDQV",
43
+ "DDQV31",
44
+ "DDQV32",
45
+ "DDQV33",
46
+ "EX",
47
+ "FO",
48
+ "FU",
49
+ "FW",
50
+ "GE",
51
+ "IF",
52
+ "II",
53
+ "II21",
54
+ "II22",
55
+ "II31",
56
+ "II32",
57
+ "II33",
58
+ "II41",
59
+ "II42",
60
+ "II43",
61
+ "II44",
62
+ "IO",
63
+ "IW",
64
+ "JJ",
65
+ "JJ21",
66
+ "JJ22",
67
+ "JJ31",
68
+ "JJ32",
69
+ "JJ33",
70
+ "JJ41",
71
+ "JJ42",
72
+ "JJ43",
73
+ "JJ44",
74
+ "JJR",
75
+ "JJT",
76
+ "JK",
77
+ "MC",
78
+ "MC1",
79
+ "MC2",
80
+ "MC221",
81
+ "MC222",
82
+ "MCMC",
83
+ "MD",
84
+ "MF",
85
+ "ND1",
86
+ "NN",
87
+ "NN1",
88
+ "NN121",
89
+ "NN122",
90
+ "NN131",
91
+ "NN132",
92
+ "NN133",
93
+ "NN141",
94
+ "NN142",
95
+ "NN143",
96
+ "NN144",
97
+ "NN2",
98
+ "NN21",
99
+ "NN22",
100
+ "NN221",
101
+ "NN222",
102
+ "NN231",
103
+ "NN232",
104
+ "NN233",
105
+ "NN31",
106
+ "NN32",
107
+ "NN33",
108
+ "NNA",
109
+ "NNB",
110
+ "NNL1",
111
+ "NNL2",
112
+ "NNO",
113
+ "NNO2",
114
+ "NNT1",
115
+ "NNT131",
116
+ "NNT132",
117
+ "NNT133",
118
+ "NNT2",
119
+ "NNU",
120
+ "NNU1",
121
+ "NNU2",
122
+ "NNU21",
123
+ "NNU22",
124
+ "NP",
125
+ "NP1",
126
+ "NP2",
127
+ "NPD1",
128
+ "NPD2",
129
+ "NPM1",
130
+ "NPM2",
131
+ "PN",
132
+ "PN1",
133
+ "PN121",
134
+ "PN122",
135
+ "PN21",
136
+ "PN22",
137
+ "PNQO",
138
+ "PNQS",
139
+ "PNQS31",
140
+ "PNQS32",
141
+ "PNQS33",
142
+ "PNQV",
143
+ "PNQV31",
144
+ "PNQV32",
145
+ "PNQV33",
146
+ "PNX1",
147
+ "PPGE",
148
+ "PPH1",
149
+ "PPHO1",
150
+ "PPHO2",
151
+ "PPHS1",
152
+ "PPHS2",
153
+ "PPIO1",
154
+ "PPIO2",
155
+ "PPIS1",
156
+ "PPIS2",
157
+ "PPX1",
158
+ "PPX121",
159
+ "PPX122",
160
+ "PPX2",
161
+ "PPX221",
162
+ "PPX222",
163
+ "PPY",
164
+ "RA",
165
+ "RA21",
166
+ "RA22",
167
+ "REX",
168
+ "REX21",
169
+ "REX22",
170
+ "REX41",
171
+ "REX42",
172
+ "REX43",
173
+ "REX44",
174
+ "RG",
175
+ "RG21",
176
+ "RG22",
177
+ "RG31",
178
+ "RG32",
179
+ "RG33",
180
+ "RG41",
181
+ "RG42",
182
+ "RG43",
183
+ "RG44",
184
+ "RGQ",
185
+ "RGQV",
186
+ "RGQV31",
187
+ "RGQV32",
188
+ "RGQV33",
189
+ "RGR",
190
+ "RGT",
191
+ "RL",
192
+ "RL21",
193
+ "RL22",
194
+ "RL31",
195
+ "RL32",
196
+ "RL33",
197
+ "RP",
198
+ "RPK",
199
+ "RR",
200
+ "RR21",
201
+ "RR22",
202
+ "RR31",
203
+ "RR32",
204
+ "RR33",
205
+ "RR41",
206
+ "RR42",
207
+ "RR43",
208
+ "RR44",
209
+ "RR51",
210
+ "RR52",
211
+ "RR53",
212
+ "RR54",
213
+ "RR55",
214
+ "RRQ",
215
+ "RRQV",
216
+ "RRQV31",
217
+ "RRQV32",
218
+ "RRQV33",
219
+ "RRR",
220
+ "RRT",
221
+ "RT",
222
+ "RT21",
223
+ "RT22",
224
+ "RT31",
225
+ "RT32",
226
+ "RT33",
227
+ "RT41",
228
+ "RT42",
229
+ "RT43",
230
+ "RT44",
231
+ "TO",
232
+ "UH",
233
+ "UH21",
234
+ "UH22",
235
+ "UH31",
236
+ "UH32",
237
+ "UH33",
238
+ "VB0",
239
+ "VBDR",
240
+ "VBDZ",
241
+ "VBG",
242
+ "VBI",
243
+ "VBM",
244
+ "VBN",
245
+ "VBR",
246
+ "VBZ",
247
+ "VD0",
248
+ "VDD",
249
+ "VDG",
250
+ "VDI",
251
+ "VDN",
252
+ "VDZ",
253
+ "VH0",
254
+ "VHD",
255
+ "VHG",
256
+ "VHI",
257
+ "VHN",
258
+ "VHZ",
259
+ "VM",
260
+ "VM21",
261
+ "VM22",
262
+ "VMK",
263
+ "VV0",
264
+ "VVD",
265
+ "VVG",
266
+ "VVGK",
267
+ "VVI",
268
+ "VVN",
269
+ "VVNK",
270
+ "VVZ",
271
+ "XX",
272
+ "Y",
273
+ "ZZ1",
274
+ "ZZ2",
275
+ "ZZ221",
276
+ "ZZ222"
277
+ ],
278
+ "neg_prefix":"!",
279
+ "overwrite":false
280
+ }
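
The new `tagger/cfg` is plain JSON with a `labels` list plus the `neg_prefix` and `overwrite` settings, and its 274 labels together with the 37 `ner` labels account for the "311 labels for 2 components" line in the README. The sketch below parses a truncated stand-in for the file and rebuilds that summary line. It assumes components with no labels (such as `tok2vec`) are left out of the count, which is inferred from the README wording; `label_summary` is a hypothetical helper.

```python
import json

# Truncated stand-in for tagger/cfg: same keys, only the first three labels.
tagger_cfg = json.loads(
    '{"labels": ["APPGE", "AT", "AT1"], "neg_prefix": "!", "overwrite": false}'
)


def label_summary(labels_by_component: dict) -> str:
    """Summarise label counts, skipping components that have no labels."""
    labelled = {name: labels for name, labels in labels_by_component.items() if labels}
    total = sum(len(labels) for labels in labelled.values())
    return f"{total} labels for {len(labelled)} components"


# Small illustrative sample, not the full label sets.
sample = {
    "tok2vec": [],
    "ner": ["AcademicTerms", "Citation"],
    "tagger": tagger_cfg["labels"],
}
summary = label_summary(sample)
print(summary)

# Counts taken from this commit: 37 ner labels plus 274 tagger labels.
full_total = 37 + 274
print(full_total)
```

Run against the full `labels` section of `meta.json`, the same helper would be expected to produce the README's summary line.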
tagger/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:082fdb2ef45d5a5956b0e6b75136ea58e3b385f3b7dd8553f3a4cf2948fa63a9
3
+ size 106754
tok2vec/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2f5e1e703952b90480009aa0badc8bf3886e4d5047aa11cf10b137180d9bdcfa
3
  size 4443194
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:32b3d9da5e1b1c7d218e12e3ca5acb0ce17645c696ac2385cc56951d9492c2b7
3
  size 4443194
tokenizer CHANGED
The diff for this file is too large to render.
 
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:85aacb275e691373422915d4c53f7317bdbc6a52ae411b9336858c5f2c663392
3
- size 8511009
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c0d0b3cc2098ae111e5a0391aeb7f1b783e85f167d1fdb277933af2fd934308a
3
+ size 8084734