browndw committed on
Commit
e0ba4d8
1 Parent(s): 4aadfff

Update spaCy pipeline

Files changed (11)
  1. README.md +23 -20
  2. config.cfg +23 -8
  3. en_docusco_spacy-any-py3-none-any.whl +2 -2
  4. meta.json +199 -207
  5. ner/model +2 -2
  6. ner/moves +1 -1
  7. tagger/cfg +2 -4
  8. tagger/model +2 -2
  9. tok2vec/model +2 -2
  10. tokenizer +0 -0
  11. vocab/strings.json +2 -2
README.md CHANGED
@@ -4,6 +4,7 @@ tags:
4
  - token-classification
5
  language:
6
  - en
 
7
  model-index:
8
  - name: en_docusco_spacy
9
  results:
@@ -13,43 +14,45 @@ model-index:
13
  metrics:
14
  - name: NER Precision
15
  type: precision
16
- value: 0.7498327416
17
  - name: NER Recall
18
  type: recall
19
- value: 0.7475641104
20
  - name: NER F Score
21
  type: f_score
22
- value: 0.7486967075
23
  - task:
24
  name: TAG
25
  type: token-classification
26
  metrics:
27
  - name: TAG (XPOS) Accuracy
28
  type: accuracy
29
- value: 0.9249740665
30
  ---
 
 
31
  | Feature | Description |
32
  | --- | --- |
33
  | **Name** | `en_docusco_spacy` |
34
- | **Version** | `0.2` |
35
- | **spaCy** | `>=3.3.0,<3.4.0` |
36
- | **Default Pipeline** | `tok2vec`, `ner`, `tagger` |
37
- | **Components** | `tok2vec`, `ner`, `tagger` |
38
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
39
  | **Sources** | n/a |
40
- | **License** | n/a |
41
- | **Author** | [n/a]() |
42
 
43
  ### Label Scheme
44
 
45
  <details>
46
 
47
- <summary>View label scheme (311 labels for 2 components)</summary>
48
 
49
  | Component | Labels |
50
  | --- | --- |
51
- | **`ner`** | `AcademicTerms`, `AcademicWritingMoves`, `Character`, `Citation`, `CitationAuthority`, `CitationHedged`, `ConfidenceHedged`, `ConfidenceHigh`, `ConfidenceLow`, `Contingent`, `Description`, `Facilitate`, `FirstPerson`, `ForceStressed`, `Future`, `InformationChange`, `InformationChangeNegative`, `InformationChangePositive`, `InformationExposition`, `InformationPlace`, `InformationReportVerbs`, `InformationStates`, `InformationTopics`, `Inquiry`, `Interactive`, `MetadiscourseCohesive`, `MetadiscourseInteractive`, `Narrative`, `Negative`, `Positive`, `PublicTerms`, `Reasoning`, `Responsibility`, `Strategic`, `SyntacticComplexity`, `Uncertainty`, `Updates` |
52
- | **`tagger`** | `APPGE`, `AT`, `AT1`, `BCL21`, `BCL22`, `CC`, `CCB`, `CS`, `CS21`, `CS22`, `CS31`, `CS32`, `CS33`, `CS41`, `CS42`, `CS43`, `CS44`, `CSA`, `CSN`, `CST`, `CSW`, `CSW31`, `CSW32`, `CSW33`, `DA`, `DA1`, `DA2`, `DAR`, `DAT`, `DB`, `DB2`, `DD`, `DD1`, `DD2`, `DDQ`, `DDQGE`, `DDQGE31`, `DDQGE32`, `DDQGE33`, `DDQV`, `DDQV31`, `DDQV32`, `DDQV33`, `EX`, `FO`, `FU`, `FW`, `GE`, `IF`, `II`, `II21`, `II22`, `II31`, `II32`, `II33`, `II41`, `II42`, `II43`, `II44`, `IO`, `IW`, `JJ`, `JJ21`, `JJ22`, `JJ31`, `JJ32`, `JJ33`, `JJ41`, `JJ42`, `JJ43`, `JJ44`, `JJR`, `JJT`, `JK`, `MC`, `MC1`, `MC2`, `MC221`, `MC222`, `MCMC`, `MD`, `MF`, `ND1`, `NN`, `NN1`, `NN121`, `NN122`, `NN131`, `NN132`, `NN133`, `NN141`, `NN142`, `NN143`, `NN144`, `NN2`, `NN21`, `NN22`, `NN221`, `NN222`, `NN231`, `NN232`, `NN233`, `NN31`, `NN32`, `NN33`, `NNA`, `NNB`, `NNL1`, `NNL2`, `NNO`, `NNO2`, `NNT1`, `NNT131`, `NNT132`, `NNT133`, `NNT2`, `NNU`, `NNU1`, `NNU2`, `NNU21`, `NNU22`, `NP`, `NP1`, `NP2`, `NPD1`, `NPD2`, `NPM1`, `NPM2`, `PN`, `PN1`, `PN121`, `PN122`, `PN21`, `PN22`, `PNQO`, `PNQS`, `PNQS31`, `PNQS32`, `PNQS33`, `PNQV`, `PNQV31`, `PNQV32`, `PNQV33`, `PNX1`, `PPGE`, `PPH1`, `PPHO1`, `PPHO2`, `PPHS1`, `PPHS2`, `PPIO1`, `PPIO2`, `PPIS1`, `PPIS2`, `PPX1`, `PPX121`, `PPX122`, `PPX2`, `PPX221`, `PPX222`, `PPY`, `RA`, `RA21`, `RA22`, `REX`, `REX21`, `REX22`, `REX41`, `REX42`, `REX43`, `REX44`, `RG`, `RG21`, `RG22`, `RG31`, `RG32`, `RG33`, `RG41`, `RG42`, `RG43`, `RG44`, `RGQ`, `RGQV`, `RGQV31`, `RGQV32`, `RGQV33`, `RGR`, `RGT`, `RL`, `RL21`, `RL22`, `RL31`, `RL32`, `RL33`, `RP`, `RPK`, `RR`, `RR21`, `RR22`, `RR31`, `RR32`, `RR33`, `RR41`, `RR42`, `RR43`, `RR44`, `RR51`, `RR52`, `RR53`, `RR54`, `RR55`, `RRQ`, `RRQV`, `RRQV31`, `RRQV32`, `RRQV33`, `RRR`, `RRT`, `RT`, `RT21`, `RT22`, `RT31`, `RT32`, `RT33`, `RT41`, `RT42`, `RT43`, `RT44`, `TO`, `UH`, `UH21`, `UH22`, `UH31`, `UH32`, `UH33`, `VB0`, `VBDR`, `VBDZ`, `VBG`, `VBI`, `VBM`, `VBN`, `VBR`, `VBZ`, `VD0`, `VDD`, `VDG`, `VDI`, `VDN`, `VDZ`, `VH0`, `VHD`, `VHG`, `VHI`, `VHN`, `VHZ`, `VM`, `VM21`, `VM22`, `VMK`, `VV0`, `VVD`, `VVG`, `VVGK`, `VVI`, `VVN`, `VVNK`, `VVZ`, `XX`, `Y`, `ZZ1`, `ZZ2`, `ZZ221`, `ZZ222` |
53
 
54
  </details>
55
 
@@ -57,10 +60,10 @@ model-index:
57
 
58
  | Type | Score |
59
  | --- | --- |
60
- | `ENTS_F` | 74.87 |
61
- | `ENTS_P` | 74.98 |
62
- | `ENTS_R` | 74.76 |
63
- | `TAG_ACC` | 92.50 |
64
- | `TOK2VEC_LOSS` | 9919092.94 |
65
- | `NER_LOSS` | 6441655.97 |
66
- | `TAGGER_LOSS` | 2474292.88 |
 
4
  - token-classification
5
  language:
6
  - en
7
+ license: mit
8
  model-index:
9
  - name: en_docusco_spacy
10
  results:
 
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.7905337091
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.7900620784
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.7902978234
24
  - task:
25
  name: TAG
26
  type: token-classification
27
  metrics:
28
  - name: TAG (XPOS) Accuracy
29
  type: accuracy
30
+ value: 0.9421614376
31
  ---
32
+ English pipeline for part-of-speech and rhetorical tagging.
33
+
34
  | Feature | Description |
35
  | --- | --- |
36
  | **Name** | `en_docusco_spacy` |
37
+ | **Version** | `1.1` |
38
+ | **spaCy** | `>=3.5.0,<3.6.0` |
39
+ | **Default Pipeline** | `tok2vec`, `tagger`, `ner` |
40
+ | **Components** | `tok2vec`, `tagger`, `ner` |
41
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
42
  | **Sources** | n/a |
43
+ | **License** | `MIT` |
44
+ | **Author** | [David Brown](https://docuscope.github.io) |
45
 
46
  ### Label Scheme
47
 
48
  <details>
49
 
50
+ <summary>View label scheme (308 labels for 2 components)</summary>
51
 
52
  | Component | Labels |
53
  | --- | --- |
54
+ | **`tagger`** | `APPGE`, `AT`, `AT1`, `BCL21`, `BCL22`, `CC`, `CCB`, `CS`, `CS21`, `CS22`, `CS31`, `CS32`, `CS33`, `CS41`, `CS42`, `CS43`, `CS44`, `CSA`, `CSN`, `CST`, `CSW`, `CSW31`, `CSW32`, `CSW33`, `DA`, `DA1`, `DA2`, `DAR`, `DAT`, `DB`, `DB2`, `DD`, `DD1`, `DD2`, `DDQ`, `DDQGE`, `DDQGE31`, `DDQGE32`, `DDQGE33`, `DDQV`, `DDQV31`, `DDQV32`, `DDQV33`, `EX`, `FO`, `FU`, `FW`, `GE`, `IF`, `II`, `II21`, `II22`, `II31`, `II32`, `II33`, `II41`, `II42`, `II43`, `II44`, `IO`, `IW`, `JJ`, `JJ21`, `JJ22`, `JJ31`, `JJ32`, `JJ33`, `JJ41`, `JJ42`, `JJ43`, `JJ44`, `JJR`, `JJT`, `JK`, `MC`, `MC1`, `MC2`, `MC221`, `MC222`, `MCMC`, `MD`, `MF`, `ND1`, `NN`, `NN1`, `NN121`, `NN122`, `NN131`, `NN132`, `NN133`, `NN141`, `NN142`, `NN143`, `NN144`, `NN2`, `NN21`, `NN22`, `NN221`, `NN222`, `NN231`, `NN232`, `NN233`, `NN31`, `NN32`, `NN33`, `NNA`, `NNB`, `NNL1`, `NNL2`, `NNO`, `NNO2`, `NNT1`, `NNT131`, `NNT133`, `NNT2`, `NNU`, `NNU1`, `NNU2`, `NNU21`, `NNU22`, `NNU221`, `NNU222`, `NP`, `NP1`, `NP2`, `NPD1`, `NPD2`, `NPM1`, `NPM2`, `PN`, `PN1`, `PN121`, `PN122`, `PN21`, `PN22`, `PNQO`, `PNQS`, `PNQS31`, `PNQS32`, `PNQS33`, `PNQV`, `PNQV31`, `PNQV32`, `PNQV33`, `PNX1`, `PPGE`, `PPH1`, `PPHO1`, `PPHO2`, `PPHS1`, `PPHS2`, `PPIO1`, `PPIO2`, `PPIS1`, `PPIS2`, `PPX1`, `PPX121`, `PPX122`, `PPX2`, `PPX221`, `PPX222`, `PPY`, `RA`, `RA21`, `RA22`, `REX`, `REX21`, `REX22`, `REX41`, `REX42`, `REX43`, `REX44`, `RG`, `RG21`, `RG22`, `RG41`, `RG42`, `RG43`, `RG44`, `RGQ`, `RGQV`, `RGQV31`, `RGQV32`, `RGQV33`, `RGR`, `RGT`, `RL`, `RL21`, `RL22`, `RL31`, `RL32`, `RL33`, `RP`, `RPK`, `RR`, `RR21`, `RR22`, `RR31`, `RR32`, `RR33`, `RR41`, `RR42`, `RR43`, `RR44`, `RR51`, `RR52`, `RR53`, `RR54`, `RR55`, `RRQ`, `RRQV`, `RRQV31`, `RRQV32`, `RRQV33`, `RRR`, `RRT`, `RT`, `RT21`, `RT22`, `RT31`, `RT32`, `RT33`, `RT41`, `RT42`, `RT43`, `RT44`, `TO`, `UH`, `UH21`, `UH22`, `UH31`, `UH32`, `UH33`, `VB0`, `VBDR`, `VBDZ`, `VBG`, `VBI`, `VBM`, `VBN`, `VBR`, `VBZ`, `VD0`, `VDD`, `VDG`, `VDI`, `VDN`, `VDZ`, `VH0`, `VHD`, `VHG`, `VHI`, `VHN`, `VHZ`, `VM`, `VM21`, `VM22`, `VMK`, `VV0`, `VVD`, `VVG`, `VVGK`, `VVI`, `VVN`, `VVNK`, `VVZ`, `XX`, `Y`, `ZZ1`, `ZZ2`, `ZZ221`, `ZZ222` |
55
+ | **`ner`** | `AcademicTerms`, `AcademicWritingMoves`, `Character`, `Citation`, `CitationAuthority`, `CitationHedged`, `ConfidenceHedged`, `ConfidenceHigh`, `ConfidenceLow`, `Contingent`, `Description`, `Facilitate`, `FirstPerson`, `ForceStressed`, `Future`, `InformationChange`, `InformationChangeNegative`, `InformationChangePositive`, `InformationExposition`, `InformationPlace`, `InformationReportVerbs`, `InformationStates`, `InformationTopics`, `Inquiry`, `Interactive`, `MetadiscourseCohesive`, `MetadiscourseInteractive`, `Narrative`, `Negative`, `Positive`, `PublicTerms`, `Reasoning`, `Responsibility`, `Strategic`, `Uncertainty`, `Updates` |
56
 
57
  </details>
58
 
 
60
 
61
  | Type | Score |
62
  | --- | --- |
63
+ | `TAG_ACC` | 94.22 |
64
+ | `ENTS_F` | 79.03 |
65
+ | `ENTS_P` | 79.05 |
66
+ | `ENTS_R` | 79.01 |
67
+ | `TOK2VEC_LOSS` | 17939385.03 |
68
+ | `TAGGER_LOSS` | 2398027.79 |
69
+ | `NER_LOSS` | 5987358.43 |
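
The NER precision, recall, and F score reported in the README metadata can be cross-checked against each other. The sketch below assumes the F score is the usual harmonic mean of precision and recall (F1); the function name `f_score` is illustrative and not part of the repository.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the model-index metadata on both sides of this diff.
old = {"p": 0.7498327416, "r": 0.7475641104, "f": 0.7486967075}
new = {"p": 0.7905337091, "r": 0.7900620784, "f": 0.7902978234}

for name, m in (("old", old), ("new", new)):
    computed = f_score(m["p"], m["r"])
    # The reported F score should agree with the recomputed one up to rounding.
    assert abs(computed - m["f"]) < 1e-9
    # The accuracy table reports the same value as a percentage with two decimals.
    print(name, round(100 * computed, 2))
```

Under that assumption, both the old and the new metadata are internally consistent, and the rounded percentages agree with the `ENTS_F` rows (74.87 and 79.03).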
config.cfg CHANGED
@@ -1,6 +1,6 @@
1
  [paths]
2
- train = "spacy_train.spacy"
3
- dev = "spacy_dev.spacy"
4
  vectors = null
5
  init_tok2vec = null
6
 
@@ -10,7 +10,7 @@ seed = 0
10
 
11
  [nlp]
12
  lang = "en"
13
- pipeline = ["tok2vec","ner","tagger"]
14
  batch_size = 1000
15
  disabled = []
16
  before_creation = null
@@ -66,8 +66,8 @@ factory = "tok2vec"
66
  [components.tok2vec.model.embed]
67
  @architectures = "spacy.MultiHashEmbed.v2"
68
  width = ${components.tok2vec.model.encode.width}
69
- attrs = ["ORTH","SHAPE"]
70
- rows = [5000,2500]
71
  include_static_vectors = false
72
 
73
  [components.tok2vec.model.encode]
@@ -104,11 +104,12 @@ dropout = 0.1
104
  accumulate_gradient = 1
105
  patience = 1600
106
  max_epochs = 0
107
- max_steps = 20000
108
- eval_frequency = 200
109
  frozen_components = []
110
  annotating_components = []
111
  before_to_disk = null
 
112
 
113
  [training.batcher]
114
  @batchers = "spacy.batch_by_words.v1"
@@ -139,11 +140,11 @@ eps = 0.00000001
139
  learn_rate = 0.001
140
 
141
  [training.score_weights]
 
142
  ents_f = 0.5
143
  ents_p = 0.0
144
  ents_r = 0.0
145
  ents_per_type = null
146
- tag_acc = 0.5
147
 
148
  [pretraining]
149
 
@@ -157,4 +158,18 @@ after_init = null
157
 
158
  [initialize.components]
159
 
160
  [initialize.tokenizer]
 
1
  [paths]
2
+ train = ""
3
+ dev = ""
4
  vectors = null
5
  init_tok2vec = null
6
 
 
10
 
11
  [nlp]
12
  lang = "en"
13
+ pipeline = ["tok2vec","tagger","ner"]
14
  batch_size = 1000
15
  disabled = []
16
  before_creation = null
 
66
  [components.tok2vec.model.embed]
67
  @architectures = "spacy.MultiHashEmbed.v2"
68
  width = ${components.tok2vec.model.encode.width}
69
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
70
+ rows = [5000,1000,2500,2500]
71
  include_static_vectors = false
72
 
73
  [components.tok2vec.model.encode]
 
104
  accumulate_gradient = 1
105
  patience = 1600
106
  max_epochs = 0
107
+ max_steps = 35000
108
+ eval_frequency = 250
109
  frozen_components = []
110
  annotating_components = []
111
  before_to_disk = null
112
+ before_update = null
113
 
114
  [training.batcher]
115
  @batchers = "spacy.batch_by_words.v1"
 
140
  learn_rate = 0.001
141
 
142
  [training.score_weights]
143
+ tag_acc = 0.5
144
  ents_f = 0.5
145
  ents_p = 0.0
146
  ents_r = 0.0
147
  ents_per_type = null
 
148
 
149
  [pretraining]
150
 
 
158
 
159
  [initialize.components]
160
 
161
+ [initialize.components.ner]
162
+
163
+ [initialize.components.ner.labels]
164
+ @readers = "spacy.read_labels.v1"
165
+ path = "ner.json"
166
+ require = false
167
+
168
+ [initialize.components.tagger]
169
+
170
+ [initialize.components.tagger.labels]
171
+ @readers = "spacy.read_labels.v1"
172
+ path = "tagger.json"
173
+ require = false
174
+
175
  [initialize.tokenizer]
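
The embedding change above replaces two hashed attributes with four, and the `rows` list grows to match. As far as I know, `spacy.MultiHashEmbed.v2` expects one `rows` entry per entry in `attrs`, so a quick sanity check is possible. This is a minimal sketch that uses only the standard library (`configparser` and `json`) rather than spaCy's own config loader, and it covers only the fragment shown, not the full file.

```python
import configparser
import json

# Fragment copied from the new side of config.cfg in this diff.
CFG = """
[components.tok2vec.model.embed]
attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
rows = [5000,1000,2500,2500]
include_static_vectors = false
"""

# Interpolation is disabled so that spaCy-style ${...} references,
# if present, would be kept as plain text instead of being expanded.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CFG)
embed = parser["components.tok2vec.model.embed"]

# The values in this fragment are JSON-compatible literals.
attrs = json.loads(embed["attrs"])
rows = json.loads(embed["rows"])

if len(attrs) != len(rows):
    raise ValueError("attrs and rows must have the same length")

# Pair each hashed attribute with the size of its embedding table.
rows_per_attr = dict(zip(attrs, rows))
print(rows_per_attr)
```

More attributes and more rows mean more embedding parameters, which is consistent with the larger `tok2vec/model` file recorded in this commit.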
en_docusco_spacy-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:443d5b1a4188a40e0c1de56e0c9551a97b5b05b013fe995a842ba1840a4dafb0
3
- size 6415723
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:80c2d7b616cb187c038261e93ca7996e3e5b67adfb6fc1a81d26b8700c5b4c8c
3
+ size 7501766
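
The wheel is stored through Git LFS, so the diff shows only the pointer file (spec version, `oid`, and `size` in bytes) rather than the binary itself. A small sketch of reading the two pointers and comparing sizes follows; the helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The size field is the byte count of the real file.
    fields["size"] = int(fields["size"])
    return fields


# Pointer contents copied from the old and new sides of this diff.
old_ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:443d5b1a4188a40e0c1de56e0c9551a97b5b05b013fe995a842ba1840a4dafb0\n"
    "size 6415723\n"
)
new_ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:80c2d7b616cb187c038261e93ca7996e3e5b67adfb6fc1a81d26b8700c5b4c8c\n"
    "size 7501766\n"
)

delta = new_ptr["size"] - old_ptr["size"]
print(f"wheel size changed by {delta} bytes")
```

The same helper applies to the other LFS-tracked files in this commit (`ner/model`, `tagger/model`, `tok2vec/model`, `vocab/strings.json`).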
meta.json CHANGED
@@ -1,13 +1,13 @@
1
  {
2
  "lang":"en",
3
  "name":"docusco_spacy",
4
- "version":"0.2",
5
- "description":"",
6
- "author":"",
7
- "email":"",
8
- "url":"",
9
- "license":"",
10
- "spacy_version":">=3.3.0,<3.4.0",
11
  "spacy_git_version":"Unknown",
12
  "vectors":{
13
  "width":0,
@@ -18,45 +18,6 @@
18
  "labels":{
19
  "tok2vec":[
20
 
21
- ],
22
- "ner":[
23
- "AcademicTerms",
24
- "AcademicWritingMoves",
25
- "Character",
26
- "Citation",
27
- "CitationAuthority",
28
- "CitationHedged",
29
- "ConfidenceHedged",
30
- "ConfidenceHigh",
31
- "ConfidenceLow",
32
- "Contingent",
33
- "Description",
34
- "Facilitate",
35
- "FirstPerson",
36
- "ForceStressed",
37
- "Future",
38
- "InformationChange",
39
- "InformationChangeNegative",
40
- "InformationChangePositive",
41
- "InformationExposition",
42
- "InformationPlace",
43
- "InformationReportVerbs",
44
- "InformationStates",
45
- "InformationTopics",
46
- "Inquiry",
47
- "Interactive",
48
- "MetadiscourseCohesive",
49
- "MetadiscourseInteractive",
50
- "Narrative",
51
- "Negative",
52
- "Positive",
53
- "PublicTerms",
54
- "Reasoning",
55
- "Responsibility",
56
- "Strategic",
57
- "SyntacticComplexity",
58
- "Uncertainty",
59
- "Updates"
60
  ],
61
  "tagger":[
62
  "APPGE",
@@ -172,7 +133,6 @@
172
  "NNO2",
173
  "NNT1",
174
  "NNT131",
175
- "NNT132",
176
  "NNT133",
177
  "NNT2",
178
  "NNU",
@@ -180,6 +140,8 @@
180
  "NNU2",
181
  "NNU21",
182
  "NNU22",
 
 
183
  "NP",
184
  "NP1",
185
  "NP2",
@@ -233,9 +195,6 @@
233
  "RG",
234
  "RG21",
235
  "RG22",
236
- "RG31",
237
- "RG32",
238
- "RG33",
239
  "RG41",
240
  "RG42",
241
  "RG43",
@@ -333,216 +292,249 @@
333
  "ZZ2",
334
  "ZZ221",
335
  "ZZ222"
336
  ]
337
  },
338
  "pipeline":[
339
  "tok2vec",
340
- "ner",
341
- "tagger"
342
  ],
343
  "components":[
344
  "tok2vec",
345
- "ner",
346
- "tagger"
347
  ],
348
  "disabled":[
349
 
350
  ],
351
  "performance":{
352
- "ents_f":0.7486967075,
353
- "ents_p":0.7498327416,
354
- "ents_r":0.7475641104,
 
355
  "ents_per_type":{
356
- "Narrative":{
357
- "p":0.7509423361,
358
- "r":0.7103704114,
359
- "f":0.7300931537
360
- },
361
- "SyntacticComplexity":{
362
- "p":0.7886246175,
363
- "r":0.8646070539,
364
- "f":0.8248697614
365
- },
366
- "Character":{
367
- "p":0.785967443,
368
- "r":0.7897207251,
369
- "f":0.7878396139
370
  },
371
- "ConfidenceHigh":{
372
- "p":0.7023789438,
373
- "r":0.7254298327,
374
- "f":0.7137183187
375
  },
376
- "FirstPerson":{
377
- "p":0.8278481013,
378
- "r":0.8889566978,
379
- "f":0.8573148383
380
  },
381
- "Negative":{
382
- "p":0.658557511,
383
- "r":0.583953954,
384
- "f":0.6190160386
385
  },
386
- "Description":{
387
- "p":0.6476571898,
388
- "r":0.6760073115,
389
- "f":0.6615286505
390
  },
391
- "InformationExposition":{
392
- "p":0.7949913335,
393
- "r":0.8201482322,
394
- "f":0.8073738649
395
  },
396
  "Strategic":{
397
- "p":0.6668451911,
398
- "r":0.622929668,
399
- "f":0.6441397901
400
- },
401
- "ConfidenceHedged":{
402
- "p":0.8015741286,
403
- "r":0.8440072161,
404
- "f":0.8222435809
405
  },
406
- "Interactive":{
407
- "p":0.7874628485,
408
- "r":0.8065506265,
409
- "f":0.7968924527
410
  },
411
- "InformationReportVerbs":{
412
- "p":0.7481178396,
413
- "r":0.7079138919,
414
- "f":0.7274608101
415
  },
416
- "ForceStressed":{
417
- "p":0.7539183667,
418
- "r":0.7242613439,
419
- "f":0.7387923478
420
  },
421
- "AcademicTerms":{
422
- "p":0.7375272228,
423
- "r":0.7532883043,
424
- "f":0.7453244495
425
  },
426
  "MetadiscourseCohesive":{
427
- "p":0.904598282,
428
- "r":0.9147441302,
429
- "f":0.9096429161
430
  },
431
- "InformationPlace":{
432
- "p":0.8304918033,
433
- "r":0.8436303081,
434
- "f":0.8370095002
435
- },
436
- "Positive":{
437
- "p":0.6864356164,
438
- "r":0.5833436218,
439
- "f":0.6307046557
440
- },
441
- "Citation":{
442
- "p":0.7408448066,
443
- "r":0.7192216767,
444
- "f":0.7298731257
445
  },
446
  "PublicTerms":{
447
- "p":0.7523439934,
448
- "r":0.7137098941,
449
- "f":0.7325178924
450
- },
451
- "CitationAuthority":{
452
- "p":0.7350628931,
453
- "r":0.4909425046,
454
- "f":0.5886982528
455
  },
456
  "Reasoning":{
457
- "p":0.8164492458,
458
- "r":0.7225796753,
459
- "f":0.76665178
460
  },
461
  "InformationTopics":{
462
- "p":0.7416131335,
463
- "r":0.7561374001,
464
- "f":0.7488048431
465
  },
466
- "Responsibility":{
467
- "p":0.6843984291,
468
- "r":0.4750929368,
469
- "f":0.5608543008
470
  },
471
- "Updates":{
472
- "p":0.7556217827,
473
- "r":0.6524994151,
474
- "f":0.7002845665
475
  },
476
- "InformationStates":{
477
- "p":0.7582508535,
478
- "r":0.8421234057,
479
- "f":0.7979893297
480
  },
481
  "InformationChange":{
482
- "p":0.6703578762,
483
- "r":0.6338672769,
484
- "f":0.6516020957
485
  },
486
- "Contingent":{
487
- "p":0.7394308387,
488
- "r":0.7160649222,
489
- "f":0.7275603271
490
  },
491
- "MetadiscourseInteractive":{
492
- "p":0.7715463918,
493
- "r":0.6160684886,
494
- "f":0.6850970341
495
  },
496
- "Future":{
497
- "p":0.7021050535,
498
- "r":0.6896034438,
499
- "f":0.6957980981
500
  },
501
- "Inquiry":{
502
- "p":0.5940041831,
503
- "r":0.5108253947,
504
- "f":0.5492836676
505
  },
506
- "Uncertainty":{
507
- "p":0.6979017644,
508
- "r":0.5465919701,
509
- "f":0.6130484868
510
  },
511
- "AcademicWritingMoves":{
512
- "p":0.5667743673,
513
- "r":0.42723767,
514
- "f":0.4872121282
515
  },
516
  "InformationChangeNegative":{
517
- "p":0.6899070385,
518
- "r":0.4249488753,
519
- "f":0.5259427993
520
  },
521
- "Facilitate":{
522
- "p":0.6855355281,
523
- "r":0.5659124447,
524
- "f":0.6200067363
525
  },
526
- "InformationChangePositive":{
527
- "p":0.7209653092,
528
- "r":0.4085470085,
529
- "f":0.5215493726
530
  },
531
- "ConfidenceLow":{
532
- "p":0.751572327,
533
- "r":0.4127806563,
534
- "f":0.5328874025
535
  },
536
  "CitationHedged":{
537
- "p":0.7021276596,
538
- "r":0.9328621908,
539
- "f":0.8012139605
540
  }
541
  },
542
- "tag_acc":0.9249740665,
543
- "tok2vec_loss":99190.9293975094,
544
- "ner_loss":64416.5596667872,
545
- "tagger_loss":24742.9288185574
546
  },
547
  "requirements":[
548
 
 
1
  {
2
  "lang":"en",
3
  "name":"docusco_spacy",
4
+ "version":"1.1",
5
+ "description":"English pipeline for part-of-speech and rhetorical tagging.",
6
+ "author":"David Brown",
7
+ "email":"dwb2@andrew.cmu.edu",
8
+ "url":"https://docuscope.github.io",
9
+ "license":"MIT",
10
+ "spacy_version":">=3.5.0,<3.6.0",
11
  "spacy_git_version":"Unknown",
12
  "vectors":{
13
  "width":0,
 
18
  "labels":{
19
  "tok2vec":[
20
 
21
  ],
22
  "tagger":[
23
  "APPGE",
 
133
  "NNO2",
134
  "NNT1",
135
  "NNT131",
 
136
  "NNT133",
137
  "NNT2",
138
  "NNU",
 
140
  "NNU2",
141
  "NNU21",
142
  "NNU22",
143
+ "NNU221",
144
+ "NNU222",
145
  "NP",
146
  "NP1",
147
  "NP2",
 
195
  "RG",
196
  "RG21",
197
  "RG22",
 
 
 
198
  "RG41",
199
  "RG42",
200
  "RG43",
 
292
  "ZZ2",
293
  "ZZ221",
294
  "ZZ222"
295
+ ],
296
+ "ner":[
297
+ "AcademicTerms",
298
+ "AcademicWritingMoves",
299
+ "Character",
300
+ "Citation",
301
+ "CitationAuthority",
302
+ "CitationHedged",
303
+ "ConfidenceHedged",
304
+ "ConfidenceHigh",
305
+ "ConfidenceLow",
306
+ "Contingent",
307
+ "Description",
308
+ "Facilitate",
309
+ "FirstPerson",
310
+ "ForceStressed",
311
+ "Future",
312
+ "InformationChange",
313
+ "InformationChangeNegative",
314
+ "InformationChangePositive",
315
+ "InformationExposition",
316
+ "InformationPlace",
317
+ "InformationReportVerbs",
318
+ "InformationStates",
319
+ "InformationTopics",
320
+ "Inquiry",
321
+ "Interactive",
322
+ "MetadiscourseCohesive",
323
+ "MetadiscourseInteractive",
324
+ "Narrative",
325
+ "Negative",
326
+ "Positive",
327
+ "PublicTerms",
328
+ "Reasoning",
329
+ "Responsibility",
330
+ "Strategic",
331
+ "Uncertainty",
332
+ "Updates"
333
  ]
334
  },
335
  "pipeline":[
336
  "tok2vec",
337
+ "tagger",
338
+ "ner"
339
  ],
340
  "components":[
341
  "tok2vec",
342
+ "tagger",
343
+ "ner"
344
  ],
345
  "disabled":[
346
 
347
  ],
348
  "performance":{
349
+ "tag_acc":0.9421614376,
350
+ "ents_f":0.7902978234,
351
+ "ents_p":0.7905337091,
352
+ "ents_r":0.7900620784,
353
  "ents_per_type":{
354
+ "Contingent":{
355
+ "p":0.8194101982,
356
+ "r":0.7684054754,
357
+ "f":0.7930886354
358
  },
359
+ "InformationExposition":{
360
+ "p":0.8408782638,
361
+ "r":0.8514455432,
362
+ "f":0.8461289112
363
  },
364
+ "AcademicTerms":{
365
+ "p":0.7917226257,
366
+ "r":0.8269208748,
367
+ "f":0.8089390481
368
  },
369
+ "ForceStressed":{
370
+ "p":0.7889513438,
371
+ "r":0.7872918493,
372
+ "f":0.7881207229
373
  },
374
+ "Character":{
375
+ "p":0.8428886221,
376
+ "r":0.8535599516,
377
+ "f":0.8481907234
378
  },
379
+ "Narrative":{
380
+ "p":0.7941775374,
381
+ "r":0.7763194725,
382
+ "f":0.7851469732
383
  },
384
  "Strategic":{
385
+ "p":0.7283594711,
386
+ "r":0.7014251342,
387
+ "f":0.7146386076
388
  },
389
+ "MetadiscourseInteractive":{
390
+ "p":0.8314383172,
391
+ "r":0.6715646883,
392
+ "f":0.7429986664
393
  },
394
+ "Negative":{
395
+ "p":0.7079502122,
396
+ "r":0.6883289496,
397
+ "f":0.6980017167
398
  },
399
+ "Facilitate":{
400
+ "p":0.7118275675,
401
+ "r":0.6680265171,
402
+ "f":0.6892318485
403
  },
404
+ "Interactive":{
405
+ "p":0.8448483369,
406
+ "r":0.8406692585,
407
+ "f":0.8427536169
408
  },
409
  "MetadiscourseCohesive":{
410
+ "p":0.9115467032,
411
+ "r":0.9314739034,
412
+ "f":0.9214025743
413
  },
414
+ "Description":{
415
+ "p":0.718241746,
416
+ "r":0.762914221,
417
+ "f":0.7399043103
418
  },
419
  "PublicTerms":{
420
+ "p":0.8216885583,
421
+ "r":0.7849799683,
422
+ "f":0.8029149109
423
  },
424
  "Reasoning":{
425
+ "p":0.8353065446,
426
+ "r":0.7942001088,
427
+ "f":0.8142348449
428
+ },
429
+ "Positive":{
430
+ "p":0.7350608922,
431
+ "r":0.6682026769,
432
+ "f":0.7000390613
433
+ },
434
+ "Updates":{
435
+ "p":0.7852616757,
436
+ "r":0.7358698043,
437
+ "f":0.759763851
438
  },
439
  "InformationTopics":{
440
+ "p":0.8050484915,
441
+ "r":0.8124731572,
442
+ "f":0.8087437842
443
  },
444
+ "ConfidenceHigh":{
445
+ "p":0.729696785,
446
+ "r":0.783668794,
447
+ "f":0.7557203724
448
  },
449
+ "Citation":{
450
+ "p":0.7883380426,
451
+ "r":0.7925112976,
452
+ "f":0.7904191617
453
  },
454
+ "Uncertainty":{
455
+ "p":0.7364028777,
456
+ "r":0.6471927162,
457
+ "f":0.688921793
458
  },
459
  "InformationChange":{
460
+ "p":0.715477206,
461
+ "r":0.7237081775,
462
+ "f":0.7195691545
463
  },
464
+ "InformationReportVerbs":{
465
+ "p":0.7432557524,
466
+ "r":0.8043649374,
467
+ "f":0.7726038695
468
  },
469
+ "InformationStates":{
470
+ "p":0.7881446908,
471
+ "r":0.8930611381,
472
+ "f":0.8373292341
473
  },
474
+ "ConfidenceHedged":{
475
+ "p":0.8401550734,
476
+ "r":0.8841205084,
477
+ "f":0.8615772774
478
  },
479
+ "FirstPerson":{
480
+ "p":0.8702371032,
481
+ "r":0.8930348259,
482
+ "f":0.8814885862
483
  },
484
+ "Responsibility":{
485
+ "p":0.7245780156,
486
+ "r":0.6123869172,
487
+ "f":0.6637752216
488
  },
489
+ "Inquiry":{
490
+ "p":0.672815534,
491
+ "r":0.5904572565,
492
+ "f":0.6289517471
493
  },
494
  "InformationChangeNegative":{
495
+ "p":0.6999255398,
496
+ "r":0.5277933745,
497
+ "f":0.6017925736
498
  },
499
+ "ConfidenceLow":{
500
+ "p":0.796812749,
501
+ "r":0.4750593824,
502
+ "f":0.5952380952
503
  },
504
+ "InformationPlace":{
505
+ "p":0.8673478574,
506
+ "r":0.8999579655,
507
+ "f":0.8833520526
508
  },
509
+ "Future":{
510
+ "p":0.7551444043,
511
+ "r":0.757468767,
512
+ "f":0.7563047998
513
+ },
514
+ "AcademicWritingMoves":{
515
+ "p":0.7072072072,
516
+ "r":0.4272108844,
517
+ "f":0.5326547922
518
  },
519
  "CitationHedged":{
520
+ "p":0.7910447761,
521
+ "r":0.9098712446,
522
+ "f":0.8463073852
523
+ },
524
+ "CitationAuthority":{
525
+ "p":0.7478653943,
526
+ "r":0.5558044046,
527
+ "f":0.6376873662
528
+ },
529
+ "InformationChangePositive":{
530
+ "p":0.7891513561,
531
+ "r":0.5254879114,
532
+ "f":0.6308795244
533
  }
534
  },
535
+ "tok2vec_loss":179393.8503061496,
536
+ "tagger_loss":23980.277885437,
537
+ "ner_loss":59873.5843254536
 
538
  },
539
  "requirements":[
540
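
The label changes in `meta.json` can be reconciled with the README summary, which goes from 311 to 308 labels. The sketch below tallies only the labels this diff shows as removed or added; the variable names are illustrative.

```python
# Total label count stated in the old README summary line.
old_total = 311

# Labels that appear only on the old side of the diff.
removed = {
    "ner": {"SyntacticComplexity"},
    "tagger": {"NNT132", "RG31", "RG32", "RG33"},
}
# Labels that appear only on the new side of the diff.
added = {
    "ner": set(),
    "tagger": {"NNU221", "NNU222"},
}

n_removed = sum(len(labels) for labels in removed.values())
n_added = sum(len(labels) for labels in added.values())
new_total = old_total - n_removed + n_added

print(f"removed={n_removed} added={n_added} new_total={new_total}")
```

Five removals and two additions give the 308 labels stated in the new README summary.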
 
ner/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d056afe6ac62c1b4455b4418bb51fb2860d465e64b2bb09fd2ce8087b81bf0d6
3
- size 164952
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9cc11bb5c791bd15c71cccc36896b4851429cd7a428da01ddbebdee8a30c31b0
3
+ size 163912
ner/moves CHANGED
@@ -1 +1 @@
1
- (binary content, not rendered as text)
 
1
+ (binary content, not rendered as text)
tagger/cfg CHANGED
@@ -113,7 +113,6 @@
113
  "NNO2",
114
  "NNT1",
115
  "NNT131",
116
- "NNT132",
117
  "NNT133",
118
  "NNT2",
119
  "NNU",
@@ -121,6 +120,8 @@
121
  "NNU2",
122
  "NNU21",
123
  "NNU22",
 
 
124
  "NP",
125
  "NP1",
126
  "NP2",
@@ -174,9 +175,6 @@
174
  "RG",
175
  "RG21",
176
  "RG22",
177
- "RG31",
178
- "RG32",
179
- "RG33",
180
  "RG41",
181
  "RG42",
182
  "RG43",
 
113
  "NNO2",
114
  "NNT1",
115
  "NNT131",
 
116
  "NNT133",
117
  "NNT2",
118
  "NNU",
 
120
  "NNU2",
121
  "NNU21",
122
  "NNU22",
123
+ "NNU221",
124
+ "NNU222",
125
  "NP",
126
  "NP1",
127
  "NP2",
 
175
  "RG",
176
  "RG21",
177
  "RG22",
 
 
 
178
  "RG41",
179
  "RG42",
180
  "RG43",
tagger/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:082fdb2ef45d5a5956b0e6b75136ea58e3b385f3b7dd8553f3a4cf2948fa63a9
3
- size 106754
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1983965d07e1d71cc442defe9818cff84461b64be4d239558dabed5a3dffeeee
3
+ size 105978
tok2vec/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:32b3d9da5e1b1c7d218e12e3ca5acb0ce17645c696ac2385cc56951d9492c2b7
3
- size 4443194
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:05e2cfc9fcdbf023345f8cd6c80f291f13961dd7b5b233b5793d5b4754a0ac74
3
+ size 6009091
tokenizer CHANGED
The diff for this file is too large to render.
 
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:c0d0b3cc2098ae111e5a0391aeb7f1b783e85f167d1fdb277933af2fd934308a
3
- size 8084734
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c44723e3986900b1fa2c5008767f07ada9f3ac3a58a52c6bd57451fab44a894a
3
+ size 6614948