Update spaCy pipeline

Files changed (12)

.gitattributes +1 -0
README.md +19 -19
config.cfg +16 -13
en_docusco_spacy_cd_trf-any-py3-none-any.whl +3 -0
meta.json +110 -90
ner/model +0 -0
ner/moves +1 -1
tagger/cfg +24 -3
tagger/model +0 -0
tokenizer +0 -0
transformer/model +2 -2
vocab/strings.json +0 -0

.gitattributes CHANGED

@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 en_docusco_spacy_fc_trf-any-py3-none-any.whl filter=lfs diff=lfs merge=lfs -text
 transformer/model filter=lfs diff=lfs merge=lfs -text

 *tfevents* filter=lfs diff=lfs merge=lfs -text
 en_docusco_spacy_fc_trf-any-py3-none-any.whl filter=lfs diff=lfs merge=lfs -text
 transformer/model filter=lfs diff=lfs merge=lfs -text
+en_docusco_spacy_cd_trf-any-py3-none-any.whl filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -6,7 +6,7 @@ language:
 - en
 license: mit
 model-index:
-- name: en_docusco_spacy_fc_trf
   results:
   - task:
       name: NER
@@ -14,44 +14,44 @@ model-index:
     metrics:
     - name: NER Precision
       type: precision
-      value: 0.889028963
     - name: NER Recall
       type: recall
-      value: 0.8833963688
     - name: NER F Score
       type: f_score
-      value: 0.886203716
   - task:
       name: TAG
       type: token-classification
     metrics:
     - name: TAG (XPOS) Accuracy
       type: accuracy
-      value: 0.9838746739
 ---
-English pipeline for part-of-speech and rhetorical tagging.
 | Feature | Description |
 | --- | --- |
-| **Name** | `en_docusco_spacy_fc_trf` |
-| **Version** | `1.1` |
-| **spaCy** | `>=3.4.3,<3.5.0` |
 | **Default Pipeline** | `transformer`, `tagger`, `ner` |
 | **Components** | `transformer`, `tagger`, `ner` |
 | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
 | **Sources** | n/a |
 | **License** | `MIT` |
-| **Author** | [David Brown](https://browndw.github.io/docuscope-docs/) |
 ### Label Scheme
 <details>
-<summary>View label scheme (269 labels for 2 components)</summary>
 | Component | Labels |
 | --- | --- |
-| **`tagger`** | `APPGE`, `AT`, `AT1`, `BCL21`, `BCL22`, `CC`, `CCB`, `CS`, `CS21`, `CS22`, `CS31`, `CS32`, `CS33`, `CS41`, `CS42`, `CS43`, `CS44`, `CSA`, `CSN`, `CST`, `CSW`, `CSW31`, `CSW32`, `CSW33`, `DA`, `DA1`, `DA2`, `DAR`, `DAT`, `DB`, `DB2`, `DD`, `DD1`, `DD2`, `DDQ`, `DDQGE`, `DDQV`, `DDQV31`, `DDQV32`, `DDQV33`, `EX`, `FO`, `FU`, `FW`, `GE`, `IF`, `II`, `II21`, `II22`, `II31`, `II32`, `II33`, `II41`, `II42`, `II43`, `II44`, `IO`, `IW`, `JJ`, `JJ21`, `JJ22`, `JJ31`, `JJ32`, `JJ33`, `JJR`, `JJT`, `JK`, `MC`, `MC1`, `MC2`, `MC221`, `MC222`, `MCMC`, `MD`, `MF`, `ND1`, `NN`, `NN1`, `NN121`, `NN122`, `NN131`, `NN132`, `NN133`, `NN141`, `NN142`, `NN143`, `NN144`, `NN2`, `NN21`, `NN22`, `NN221`, `NN222`, `NN231`, `NN232`, `NN233`, `NN31`, `NN33`, `NNA`, `NNB`, `NNL1`, `NNL2`, `NNO`, `NNO2`, `NNT1`, `NNT2`, `NNU`, `NNU1`, `NNU2`, `NNU21`, `NNU22`, `NP`, `NP1`, `NP2`, `NPD1`, `NPD2`, `NPM1`, `NPM2`, `PN`, `PN1`, `PN121`, `PN122`, `PN21`, `PN22`, `PNQO`, `PNQS`, `PNQS31`, `PNQS32`, `PNQS33`, `PNQV`, `PNX1`, `PPGE`, `PPH1`, `PPHO1`, `PPHO2`, `PPHS1`, `PPHS2`, `PPIO1`, `PPIO2`, `PPIS1`, `PPIS2`, `PPX1`, `PPX121`, `PPX122`, `PPX2`, `PPX221`, `PPX222`, `PPY`, `RA`, `RA21`, `RA22`, `REX`, `REX21`, `REX22`, `REX41`, `REX42`, `REX43`, `REX44`, `RG`, `RG21`, `RG22`, `RGQ`, `RGQV`, `RGQV31`, `RGQV32`, `RGQV33`, `RGR`, `RGT`, `RL`, `RL21`, `RL22`, `RP`, `RPK`, `RR`, `RR21`, `RR22`, `RR31`, `RR32`, `RR33`, `RR41`, `RR42`, `RR43`, `RR44`, `RR51`, `RR52`, `RR53`, `RR54`, `RR55`, `RRQ`, `RRQV`, `RRQV31`, `RRQV32`, `RRQV33`, `RRR`, `RRT`, `RT`, `RT21`, `RT22`, `RT31`, `RT32`, `RT33`, `RT41`, `RT42`, `RT43`, `RT44`, `TO`, `UH`, `UH21`, `UH22`, `UH31`, `UH32`, `UH33`, `VB0`, `VBDR`, `VBDZ`, `VBG`, `VBI`, `VBM`, `VBN`, `VBR`, `VBZ`, `VD0`, `VDD`, `VDG`, `VDI`, `VDN`, `VDZ`, `VH0`, `VHD`, `VHG`, `VHI`, `VHN`, `VHZ`, `VM`, `VM21`, `VM22`, `VMK`, `VV0`, `VVD`, `VVG`, `VVGK`, `VVI`, `VVN`, `VVNK`, `VVZ`, `XX`, `Y`, `ZZ1`, `ZZ2`, `ZZ221`, `ZZ222` |
 | **`ner`** | `ActorsAbstractions`, `ActorsFirstPerson`, `ActorsPeople`, `ActorsPublicEntities`, `CitationAuthority`, `CitationControversy`, `CitationNeutral`, `ConfidenceHedged`, `ConfidenceHigh`, `OrganizationNarrative`, `OrganizationReasoning`, `PlanningFuture`, `PlanningStrategy`, `SentimentNegative`, `SentimentPositive`, `SignpostingAcademicWritingMoves`, `SignpostingMetadiscourse`, `StanceEmphatic`, `StanceModerated` |
 </details>
@@ -60,10 +60,10 @@ English pipeline for part-of-speech and rhetorical tagging.
 | Type | Score |
 | --- | --- |
-| `TAG_ACC` | 98.39 |
-| `ENTS_F` | 88.62 |
-| `ENTS_P` | 88.90 |
-| `ENTS_R` | 88.34 |
-| `TRANSFORMER_LOSS` | 2319800.36 |
-| `TAGGER_LOSS` | 669777.78 |
-| `NER_LOSS` | 2048423.35 |

 - en
 license: mit
 model-index:
+- name: en_docusco_spacy_cd_trf
   results:
   - task:
       name: NER
     metrics:
     - name: NER Precision
       type: precision
+      value: 0.8975978922
     - name: NER Recall
       type: recall
+      value: 0.8996163997
     - name: NER F Score
       type: f_score
+      value: 0.8986060124
   - task:
       name: TAG
       type: token-classification
     metrics:
     - name: TAG (XPOS) Accuracy
       type: accuracy
+      value: 0.9860324848
 ---
+English pipeline for part-of-speech and rhetorical tagging using a smaller 'common dictionary'.
 | Feature | Description |
 | --- | --- |
+| **Name** | `en_docusco_spacy_cd_trf` |
+| **Version** | `1.3` |
+| **spaCy** | `>=3.7.4,<3.8.0` |
 | **Default Pipeline** | `transformer`, `tagger`, `ner` |
 | **Components** | `transformer`, `tagger`, `ner` |
 | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
 | **Sources** | n/a |
 | **License** | `MIT` |
+| **Author** | [David Brown](https://docuscope.github.io) |
 ### Label Scheme
 <details>
+<summary>View label scheme (289 labels for 2 components)</summary>
 | Component | Labels |
 | --- | --- |
+| **`tagger`** | `APPGE`, `AT`, `AT1`, `BCL21`, `BCL22`, `CC`, `CCB`, `CS`, `CS21`, `CS22`, `CS31`, `CS32`, `CS33`, `CS41`, `CS42`, `CS43`, `CS44`, `CSA`, `CSN`, `CST`, `CSW`, `CSW31`, `CSW32`, `CSW33`, `DA`, `DA1`, `DA2`, `DAR`, `DAT`, `DB`, `DB2`, `DD`, `DD1`, `DD2`, `DDQ`, `DDQGE`, `DDQGE31`, `DDQGE32`, `DDQGE33`, `DDQV`, `DDQV31`, `DDQV32`, `DDQV33`, `EX`, `FO`, `FU`, `FW`, `GE`, `IF`, `II`, `II21`, `II22`, `II31`, `II32`, `II33`, `II41`, `II42`, `II43`, `II44`, `IO`, `IW`, `JJ`, `JJ21`, `JJ22`, `JJ31`, `JJ32`, `JJ33`, `JJ41`, `JJ42`, `JJ43`, `JJ44`, `JJR`, `JJT`, `JK`, `MC`, `MC1`, `MC2`, `MC221`, `MC222`, `MCMC`, `MD`, `MF`, `ND1`, `NN`, `NN1`, `NN121`, `NN122`, `NN131`, `NN132`, `NN133`, `NN141`, `NN142`, `NN143`, `NN144`, `NN2`, `NN21`, `NN22`, `NN221`, `NN222`, `NN31`, `NN32`, `NN33`, `NNA`, `NNB`, `NNL1`, `NNL2`, `NNO`, `NNO2`, `NNT1`, `NNT131`, `NNT132`, `NNT133`, `NNT2`, `NNU`, `NNU1`, `NNU2`, `NNU21`, `NNU22`, `NNU221`, `NNU222`, `NP`, `NP1`, `NP2`, `NPD1`, `NPD2`, `NPM1`, `NPM2`, `PN`, `PN1`, `PN121`, `PN122`, `PN21`, `PN22`, `PNQO`, `PNQS`, `PNQS31`, `PNQS32`, `PNQS33`, `PNQV`, `PNQV31`, `PNQV32`, `PNQV33`, `PNX1`, `PPGE`, `PPH1`, `PPHO1`, `PPHO2`, `PPHS1`, `PPHS2`, `PPIO1`, `PPIO2`, `PPIS1`, `PPIS2`, `PPX1`, `PPX121`, `PPX122`, `PPX2`, `PPX221`, `PPX222`, `PPY`, `RA`, `RA21`, `RA22`, `REX`, `REX21`, `REX22`, `REX41`, `REX42`, `REX43`, `REX44`, `RG`, `RG21`, `RG22`, `RG41`, `RG42`, `RG43`, `RG44`, `RGQ`, `RGQV`, `RGQV31`, `RGQV32`, `RGQV33`, `RGR`, `RGT`, `RL`, `RL21`, `RL22`, `RL31`, `RL32`, `RL33`, `RP`, `RPK`, `RR`, `RR21`, `RR22`, `RR31`, `RR32`, `RR33`, `RR41`, `RR42`, `RR43`, `RR44`, `RR51`, `RR52`, `RR53`, `RR54`, `RR55`, `RRQ`, `RRQV`, `RRQV31`, `RRQV32`, `RRQV33`, `RRR`, `RRT`, `RT`, `RT21`, `RT22`, `RT31`, `RT32`, `RT33`, `RT41`, `RT42`, `RT43`, `RT44`, `TO`, `UH`, `UH21`, `UH22`, `UH31`, `UH32`, `UH33`, `VB0`, `VBDR`, `VBDZ`, `VBG`, `VBI`, `VBM`, `VBN`, `VBR`, `VBZ`, `VD0`, `VDD`, `VDG`, `VDI`, `VDN`, `VDZ`, `VH0`, `VHD`, `VHG`, `VHI`, `VHN`, `VHZ`, `VM`, `VM21`, `VM22`, `VMK`, `VV0`, `VVD`, `VVG`, `VVGK`, `VVI`, `VVN`, `VVNK`, `VVZ`, `XX`, `Y`, `ZZ1`, `ZZ2`, `ZZ221`, `ZZ222` |
 | **`ner`** | `ActorsAbstractions`, `ActorsFirstPerson`, `ActorsPeople`, `ActorsPublicEntities`, `CitationAuthority`, `CitationControversy`, `CitationNeutral`, `ConfidenceHedged`, `ConfidenceHigh`, `OrganizationNarrative`, `OrganizationReasoning`, `PlanningFuture`, `PlanningStrategy`, `SentimentNegative`, `SentimentPositive`, `SignpostingAcademicWritingMoves`, `SignpostingMetadiscourse`, `StanceEmphatic`, `StanceModerated` |
 </details>
 | Type | Score |
 | --- | --- |
+| `TAG_ACC` | 98.60 |
+| `ENTS_F` | 89.86 |
+| `ENTS_P` | 89.76 |
+| `ENTS_R` | 89.96 |
+| `TRANSFORMER_LOSS` | 4671131.21 |
+| `TAGGER_LOSS` | 1405830.04 |
+| `NER_LOSS` | 4168254.47 |
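
As a quick sanity check on the updated metrics, the reported NER F score should equal the harmonic mean of the reported precision and recall. The sketch below recomputes it from the values in the new `model-index`; the variable names are illustrative and are not part of the pipeline.

```python
# Values copied from the updated README model-index (NER task).
precision = 0.8975978922
recall = 0.8996163997
reported_f = 0.8986060124

# The F score (F1) is the harmonic mean of precision and recall.
f_score = 2 * precision * recall / (precision + recall)

print(f"recomputed F: {f_score:.6f}, reported F: {reported_f:.6f}")
```

The same check can be applied to any entry under `ents_per_type` in `meta.json`.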

config.cfg CHANGED

@@ -1,6 +1,6 @@
 [paths]
-train = "/content/drive/MyDrive/DS Bert/SpacyTrain/spacy_train_cd.spacy"
-dev = "/content/drive/MyDrive/DS Bert/SpacyTrain/spacy_test_cd.spacy"
 vectors = null
 init_tok2vec = null
@@ -11,12 +11,13 @@ seed = 0
 [nlp]
 lang = "en"
 pipeline = ["transformer","tagger","ner"]
-batch_size = 128
 disabled = []
 before_creation = null
 after_creation = null
 after_pipeline_creation = null
 tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
 [components]
@@ -44,6 +45,7 @@ upstream = "*"
 [components.tagger]
 factory = "tagger"
 neg_prefix = "!"
 overwrite = false
 scorer = {"@scorers":"spacy.tagger_scorer.v1"}
@@ -106,13 +108,14 @@ train_corpus = "corpora.train"
 seed = ${system.seed}
 gpu_allocator = ${system.gpu_allocator}
 dropout = 0.1
-patience = 1600
-max_epochs = 0
-max_steps = 20000
-eval_frequency = 200
 frozen_components = []
 annotating_components = []
 before_to_disk = null
 [training.batcher]
 @batchers = "spacy.batch_by_padded.v1"
@@ -137,13 +140,13 @@ eps = 0.00000001
 [training.optimizer.learn_rate]
 @schedules = "warmup_linear.v1"
-warmup_steps = 250
-total_steps = 20000
 initial_rate = 0.00005
 [training.score_weights]
-tag_acc = 0.5
-ents_f = 0.5
 ents_p = 0.0
 ents_r = 0.0
 ents_per_type = null
@@ -164,14 +167,14 @@ after_init = null
 [initialize.components.ner.labels]
 @readers = "spacy.read_labels.v1"
-path = "\"/content/drive/MyDrive/DS Bert/SpacyTrain/ner-sample.json"
 require = false
 [initialize.components.tagger]
 [initialize.components.tagger.labels]
 @readers = "spacy.read_labels.v1"
-path = "/content/drive/MyDrive/DS Bert/SpacyTrain/tagger-sample.json"
 require = false
 [initialize.tokenizer]

 [paths]
+train = "spacy_train_05.spacy"
+dev = "spacy_dev_05.spacy"
 vectors = null
 init_tok2vec = null
 [nlp]
 lang = "en"
 pipeline = ["transformer","tagger","ner"]
+batch_size = 32
 disabled = []
 before_creation = null
 after_creation = null
 after_pipeline_creation = null
 tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
+vectors = {"@vectors":"spacy.Vectors.v1"}
 [components]
 [components.tagger]
 factory = "tagger"
+label_smoothing = 0.0
 neg_prefix = "!"
 overwrite = false
 scorer = {"@scorers":"spacy.tagger_scorer.v1"}
 seed = ${system.seed}
 gpu_allocator = ${system.gpu_allocator}
 dropout = 0.1
+patience = 20000
+max_epochs = -1
+max_steps = 30000
+eval_frequency = 500
 frozen_components = []
 annotating_components = []
 before_to_disk = null
+before_update = null
 [training.batcher]
 @batchers = "spacy.batch_by_padded.v1"
 [training.optimizer.learn_rate]
 @schedules = "warmup_linear.v1"
+warmup_steps = 500
+total_steps = 25000
 initial_rate = 0.00005
 [training.score_weights]
+tag_acc = 0.4
+ents_f = 0.6
 ents_p = 0.0
 ents_r = 0.0
 ents_per_type = null
 [initialize.components.ner.labels]
 @readers = "spacy.read_labels.v1"
+path = "ner.json"
 require = false
 [initialize.components.tagger]
 [initialize.components.tagger.labels]
 @readers = "spacy.read_labels.v1"
+path = "tagger.json"
 require = false
 [initialize.tokenizer]
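
The new `[training.optimizer.learn_rate]` values (`warmup_steps = 500`, `total_steps = 25000`, `initial_rate = 0.00005`) describe a warmup followed by a linear decay. The sketch below is a plain-Python approximation of that shape, written under the assumption that the rate rises linearly to `initial_rate` during warmup and then falls linearly to zero at `total_steps`; the function name `warmup_linear_rate` is hypothetical and the actual `warmup_linear.v1` implementation may differ in detail.

```python
def warmup_linear_rate(step, initial_rate=0.00005, warmup_steps=500, total_steps=25000):
    """Approximate learning rate at a given step (assumed schedule shape)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to initial_rate.
        return initial_rate * step / max(1, warmup_steps)
    # Linear decay from initial_rate down to 0 at total_steps, clamped at 0.
    remaining = max(0, total_steps - step)
    return initial_rate * remaining / max(1, total_steps - warmup_steps)


for step in (0, 250, 500, 12750, 25000, 30000):
    print(step, warmup_linear_rate(step))
```

If the schedule does behave this way, any steps between `total_steps = 25000` and `max_steps = 30000` would run at a rate of zero, so it may be worth confirming how the two settings interact.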

en_docusco_spacy_cd_trf-any-py3-none-any.whl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:86ef06b7e24e61928327ae520998138f37ccf11b878ef1da7f848d83aabad23f
+size 466599656
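
The three added lines are a Git LFS pointer rather than the wheel itself: they record the spec version, the SHA-256 object ID, and the size in bytes of the real file. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of Git LFS.

```python
# Pointer text copied from the added file in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:86ef06b7e24e61928327ae520998138f37ccf11b878ef1da7f848d83aabad23f
size 466599656
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

The digest and size can then be compared against a downloaded wheel to confirm that the file matches the pointer.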

meta.json CHANGED

@@ -1,14 +1,14 @@
 {
   "lang":"en",
-  "name":"docusco_spacy_fc_trf",
-  "version":"1.1",
-  "description":"English pipeline for part-of-speech and rhetorical tagging.",
   "author":"David Brown",
   "email":"dwb2@andrew.cmu.edu",
-  "url":"https://browndw.github.io/docuscope-docs/",
   "license":"MIT",
-  "spacy_version":">=3.4.3,<3.5.0",
-  "spacy_git_version":"Unknown",
   "vectors":{
     "width":0,
     "vectors":0,
@@ -56,6 +56,9 @@
       "DD2",
       "DDQ",
       "DDQGE",
       "DDQV",
       "DDQV31",
       "DDQV32",
@@ -84,6 +87,10 @@
       "JJ31",
       "JJ32",
       "JJ33",
       "JJR",
       "JJT",
       "JK",
@@ -112,10 +119,8 @@
       "NN22",
       "NN221",
       "NN222",
-      "NN231",
-      "NN232",
-      "NN233",
       "NN31",
       "NN33",
       "NNA",
       "NNB",
@@ -124,12 +129,17 @@
       "NNO",
       "NNO2",
       "NNT1",
       "NNT2",
       "NNU",
       "NNU1",
       "NNU2",
       "NNU21",
       "NNU22",
       "NP",
       "NP1",
       "NP2",
@@ -149,6 +159,9 @@
       "PNQS32",
       "PNQS33",
       "PNQV",
       "PNX1",
       "PPGE",
       "PPH1",
@@ -180,6 +193,10 @@
       "RG",
       "RG21",
       "RG22",
       "RGQ",
       "RGQV",
       "RGQV31",
@@ -190,6 +207,9 @@
       "RL",
       "RL21",
       "RL22",
       "RP",
       "RPK",
       "RR",
@@ -307,112 +327,112 @@
   ],
   "performance":{
-    "tag_acc":0.9838746739,
-    "ents_f":0.886203716,
-    "ents_p":0.889028963,
-    "ents_r":0.8833963688,
     "ents_per_type":{
       "ActorsFirstPerson":{
-        "p":0.9048672566,
-        "r":0.9176833544,
-        "f":0.9112302444
       },
-      "ActorsAbstractions":{
-        "p":0.8884982639,
-        "r":0.8868047132,
-        "f":0.8876506808
       },
-      "SentimentPositive":{
-        "p":0.8560008306,
-        "r":0.827811245,
-        "f":0.8416700694
       },
       "ActorsPeople":{
-        "p":0.9271072667,
-        "r":0.9305028034,
-        "f":0.9288019317
       },
       "SignpostingMetadiscourse":{
-        "p":0.9420750336,
-        "r":0.9215780036,
-        "f":0.9317138023
       },
       "OrganizationReasoning":{
-        "p":0.9138317376,
-        "r":0.8952304929,
-        "f":0.9044354839
       },
-      "SentimentNegative":{
-        "p":0.8280952381,
-        "r":0.8157206996,
-        "f":0.8218613915
       },
-      "OrganizationNarrative":{
-        "p":0.8888659154,
-        "r":0.8726276261,
-        "f":0.8806719246
       },
-      "ActorsPublicEntities":{
-        "p":0.913087316,
-        "r":0.8978782471,
-        "f":0.9054189162
       },
-      "ConfidenceHedged":{
-        "p":0.9044895625,
-        "r":0.9052661706,
-        "f":0.9048776999
       },
-      "StanceEmphatic":{
-        "p":0.864783265,
-        "r":0.9101225601,
-        "f":0.8868738251
       },
       "ConfidenceHigh":{
-        "p":0.8696095076,
-        "r":0.8573819886,
-        "f":0.8634524612
-      },
-      "PlanningFuture":{
-        "p":0.8828828829,
-        "r":0.8994576554,
-        "f":0.8910932011
       },
-      "SignpostingAcademicWritingMoves":{
-        "p":0.7609090909,
-        "r":0.7678899083,
-        "f":0.7643835616
-      },
-      "PlanningStrategy":{
-        "p":0.8513819665,
-        "r":0.8205240175,
-        "f":0.8356682233
       },
-      "CitationAuthority":{
-        "p":0.8544839255,
-        "r":0.8265139116,
-        "f":0.840266223
       },
-      "StanceModerated":{
-        "p":0.8590852905,
-        "r":0.8853503185,
-        "f":0.8720200753
       },
-      "CitationNeutral":{
-        "p":0.8832214765,
-        "r":0.8832214765,
-        "f":0.8832214765
       },
-      "CitationControversy":{
-        "p":0.8739837398,
-        "r":0.8884297521,
-        "f":0.881147541
       }
     },
-    "transformer_loss":23198.0035903843,
-    "tagger_loss":6697.7777622218,
-    "ner_loss":20484.2334804777
   },
   "requirements":[
-    "spacy-transformers>=1.1.8,<1.2.0"
   ]
 }

 {
   "lang":"en",
+  "name":"docusco_spacy_cd_trf",
+  "version":"1.3",
+  "description":"English pipeline for part-of-speech and rhetorical tagging using a smaller 'common dictionary'.",
   "author":"David Brown",
   "email":"dwb2@andrew.cmu.edu",
+  "url":"https://docuscope.github.io",
   "license":"MIT",
+  "spacy_version":">=3.7.4,<3.8.0",
+  "spacy_git_version":"bff8725f4",
   "vectors":{
     "width":0,
     "vectors":0,
       "DD2",
       "DDQ",
       "DDQGE",
+      "DDQGE31",
+      "DDQGE32",
+      "DDQGE33",
       "DDQV",
       "DDQV31",
       "DDQV32",
       "JJ31",
       "JJ32",
       "JJ33",
+      "JJ41",
+      "JJ42",
+      "JJ43",
+      "JJ44",
       "JJR",
       "JJT",
       "JK",
       "NN22",
       "NN221",
       "NN222",
       "NN31",
+      "NN32",
       "NN33",
       "NNA",
       "NNB",
       "NNO",
       "NNO2",
       "NNT1",
+      "NNT131",
+      "NNT132",
+      "NNT133",
       "NNT2",
       "NNU",
       "NNU1",
       "NNU2",
       "NNU21",
       "NNU22",
+      "NNU221",
+      "NNU222",
       "NP",
       "NP1",
       "NP2",
       "PNQS32",
       "PNQS33",
       "PNQV",
+      "PNQV31",
+      "PNQV32",
+      "PNQV33",
       "PNX1",
       "PPGE",
       "PPH1",
       "RG",
       "RG21",
       "RG22",
+      "RG41",
+      "RG42",
+      "RG43",
+      "RG44",
       "RGQ",
       "RGQV",
       "RGQV31",
       "RL",
       "RL21",
       "RL22",
+      "RL31",
+      "RL32",
+      "RL33",
       "RP",
       "RPK",
       "RR",
   ],
   "performance":{
+    "tag_acc":0.9860324848,
+    "ents_f":0.8986060124,
+    "ents_p":0.8975978922,
+    "ents_r":0.8996163997,
     "ents_per_type":{
       "ActorsFirstPerson":{
+        "p":0.9297243488,
+        "r":0.9421626555,
+        "f":0.9359021772
       },
+      "OrganizationNarrative":{
+        "p":0.8982249764,
+        "r":0.9052289888,
+        "f":0.901713382
       },
+      "ConfidenceHedged":{
+        "p":0.9133998382,
+        "r":0.925173412,
+        "f":0.9192489282
+      },
+      "StanceEmphatic":{
+        "p":0.9163952226,
+        "r":0.9306501792,
+        "f":0.9234676931
       },
       "ActorsPeople":{
+        "p":0.9048275066,
+        "r":0.9085233815,
+        "f":0.9066716777
       },
       "SignpostingMetadiscourse":{
+        "p":0.9521945378,
+        "r":0.9343999277,
+        "f":0.9432133122
+      },
+      "PlanningStrategy":{
+        "p":0.867487328,
+        "r":0.8729657518,
+        "f":0.8702179177
       },
       "OrganizationReasoning":{
+        "p":0.9162113643,
+        "r":0.913893106,
+        "f":0.9150507669
       },
+      "ActorsAbstractions":{
+        "p":0.8978776116,
+        "r":0.9052445851,
+        "f":0.9015460488
       },
+      "SentimentPositive":{
+        "p":0.8603518268,
+        "r":0.8566270255,
+        "f":0.8584853859
       },
+      "SentimentNegative":{
+        "p":0.8577821301,
+        "r":0.8418267418,
+        "f":0.8497295439
       },
+      "CitationAuthority":{
+        "p":0.8555627846,
+        "r":0.8453873353,
+        "f":0.8504446241
       },
+      "StanceModerated":{
+        "p":0.8848971874,
+        "r":0.9246727587,
+        "f":0.9043478261
       },
       "ConfidenceHigh":{
+        "p":0.8963930348,
+        "r":0.9093432591,
+        "f":0.9028217093
       },
+      "CitationControversy":{
+        "p":0.8772563177,
+        "r":0.9109653233,
+        "f":0.8937931034
       },
+      "CitationNeutral":{
+        "p":0.9121713201,
+        "r":0.9254675468,
+        "f":0.9187713311
       },
+      "PlanningFuture":{
+        "p":0.891873065,
+        "r":0.915613826,
+        "f":0.9035875319
       },
+      "ActorsPublicEntities":{
+        "p":0.9129542262,
+        "r":0.9113132257,
+        "f":0.9121329879
       },
+      "SignpostingAcademicWritingMoves":{
+        "p":0.7986216171,
+        "r":0.8133881185,
+        "f":0.8059372349
       }
     },
+    "transformer_loss":46711.3121389837,
+    "tagger_loss":14058.3003581261,
+    "ner_loss":41682.5447270232
   },
   "requirements":[
+    "spacy-transformers>=1.3.5,<1.4.0"
   ]
 }
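
The `performance` values above can be combined with the `[training.score_weights]` from `config.cfg` (`tag_acc = 0.4`, `ents_f = 0.6`) to approximate the single score used for model selection during training. The sketch assumes the final score is a plain weighted sum of the listed metrics and ignores `ents_per_type`, whose weight is `null`; the dictionary names are illustrative.

```python
# Weights from the updated [training.score_weights] block in config.cfg.
score_weights = {"tag_acc": 0.4, "ents_f": 0.6, "ents_p": 0.0, "ents_r": 0.0}

# Scores from the updated "performance" block in meta.json.
performance = {
    "tag_acc": 0.9860324848,
    "ents_f": 0.8986060124,
    "ents_p": 0.8975978922,
    "ents_r": 0.8996163997,
}

# Assumed combination rule: weighted sum over the weighted metrics.
final_score = sum(performance[name] * weight for name, weight in score_weights.items())

print(f"weighted score: {final_score:.4f}")
```

Shifting weight from `tag_acc` to `ents_f`, as this commit does, makes checkpoint selection depend more on entity F score than on tagging accuracy.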

ner/model CHANGED

Binary files a/ner/model and b/ner/model differ

ner/moves CHANGED

@@ -1 +1 @@

- ��moves�P{"0":{},"1":{"ActorsAbstractions":574627,"SentimentNegative":505726,"ActorsPeople":489704,"SentimentPositive":329499,"OrganizationNarrative":327796,"SignpostingMetadiscourse":285541,"ActorsFirstPerson":242622,"OrganizationReasoning":182971,"StanceEmphatic":148905,"ActorsPublicEntities":141386,"ConfidenceHedged":130515,"ConfidenceHigh":119696,"PlanningFuture":91199,"PlanningStrategy":77436,"SignpostingAcademicWritingMoves":45355,"CitationNeutral":28827,"StanceModerated":24981,"CitationAuthority":24697,"CitationControversy":7780},"2":{"ActorsAbstractions":574627,"SentimentNegative":505726,"ActorsPeople":489704,"SentimentPositive":329499,"OrganizationNarrative":327796,"SignpostingMetadiscourse":285541,"ActorsFirstPerson":242622,"OrganizationReasoning":182971,"StanceEmphatic":148905,"ActorsPublicEntities":141386,"ConfidenceHedged":130515,"ConfidenceHigh":119696,"PlanningFuture":91199,"PlanningStrategy":77436,"SignpostingAcademicWritingMoves":45355,"CitationNeutral":28827,"StanceModerated":24981,"CitationAuthority":24697,"CitationControversy":7780},"3":{"ActorsAbstractions":574627,"SentimentNegative":505726,"ActorsPeople":489704,"SentimentPositive":329499,"OrganizationNarrative":327796,"SignpostingMetadiscourse":285541,"ActorsFirstPerson":242622,"OrganizationReasoning":182971,"StanceEmphatic":148905,"ActorsPublicEntities":141386,"ConfidenceHedged":130515,"ConfidenceHigh":119696,"PlanningFuture":91199,"PlanningStrategy":77436,"SignpostingAcademicWritingMoves":45355,"CitationNeutral":28827,"StanceModerated":24981,"CitationAuthority":24697,"CitationControversy":7780},"4":{"ActorsAbstractions":574627,"SentimentNegative":505726,"ActorsPeople":489704,"SentimentPositive":329499,"OrganizationNarrative":327796,"SignpostingMetadiscourse":285541,"ActorsFirstPerson":242622,"OrganizationReasoning":182971,"StanceEmphatic":148905,"ActorsPublicEntities":141386,"ConfidenceHedged":130515,"ConfidenceHigh":119696,"PlanningFuture":91199,"PlanningStrategy":77436,"SignpostingAcademicWritingMoves":45355,"CitationNeutral":28827,"StanceModerated":24981,"CitationAuthority":24697,"CitationControversy":7780,"":1},"5":{"":1}}�cfg��neg_key�

+ ��moves�l{"0":{},"1":{"ActorsPeople":1591194,"ActorsAbstractions":1564271,"SentimentNegative":1302786,"OrganizationNarrative":871730,"SentimentPositive":863940,"SignpostingMetadiscourse":697430,"ActorsFirstPerson":650913,"OrganizationReasoning":427949,"StanceEmphatic":377909,"ActorsPublicEntities":354014,"ConfidenceHedged":320162,"ConfidenceHigh":296184,"PlanningFuture":224817,"PlanningStrategy":197087,"SignpostingAcademicWritingMoves":113408,"CitationNeutral":68527,"StanceModerated":60423,"CitationAuthority":56832,"CitationControversy":16582},"2":{"ActorsPeople":1591194,"ActorsAbstractions":1564271,"SentimentNegative":1302786,"OrganizationNarrative":871730,"SentimentPositive":863940,"SignpostingMetadiscourse":697430,"ActorsFirstPerson":650913,"OrganizationReasoning":427949,"StanceEmphatic":377909,"ActorsPublicEntities":354014,"ConfidenceHedged":320162,"ConfidenceHigh":296184,"PlanningFuture":224817,"PlanningStrategy":197087,"SignpostingAcademicWritingMoves":113408,"CitationNeutral":68527,"StanceModerated":60423,"CitationAuthority":56832,"CitationControversy":16582},"3":{"ActorsPeople":1591194,"ActorsAbstractions":1564271,"SentimentNegative":1302786,"OrganizationNarrative":871730,"SentimentPositive":863940,"SignpostingMetadiscourse":697430,"ActorsFirstPerson":650913,"OrganizationReasoning":427949,"StanceEmphatic":377909,"ActorsPublicEntities":354014,"ConfidenceHedged":320162,"ConfidenceHigh":296184,"PlanningFuture":224817,"PlanningStrategy":197087,"SignpostingAcademicWritingMoves":113408,"CitationNeutral":68527,"StanceModerated":60423,"CitationAuthority":56832,"CitationControversy":16582},"4":{"ActorsPeople":1591194,"ActorsAbstractions":1564271,"SentimentNegative":1302786,"OrganizationNarrative":871730,"SentimentPositive":863940,"SignpostingMetadiscourse":697430,"ActorsFirstPerson":650913,"OrganizationReasoning":427949,"StanceEmphatic":377909,"ActorsPublicEntities":354014,"ConfidenceHedged":320162,"ConfidenceHigh":296184,"PlanningFuture":224817,"PlanningStrategy":197087,"SignpostingAcademicWritingMoves":113408,"CitationNeutral":68527,"StanceModerated":60423,"CitationAuthority":56832,"CitationControversy":16582,"":1},"5":{"":1}}�cfg��neg_key�

tagger/cfg CHANGED

@@ -1,4 +1,5 @@
 {
   "labels":[
     "APPGE",
     "AT",
@@ -36,6 +37,9 @@
     "DD2",
     "DDQ",
     "DDQGE",
     "DDQV",
     "DDQV31",
     "DDQV32",
@@ -64,6 +68,10 @@
     "JJ31",
     "JJ32",
     "JJ33",
     "JJR",
     "JJT",
     "JK",
@@ -92,10 +100,8 @@
     "NN22",
     "NN221",
     "NN222",
-    "NN231",
-    "NN232",
-    "NN233",
     "NN31",
     "NN33",
     "NNA",
     "NNB",
@@ -104,12 +110,17 @@
     "NNO",
     "NNO2",
     "NNT1",
     "NNT2",
     "NNU",
     "NNU1",
     "NNU2",
     "NNU21",
     "NNU22",
     "NP",
     "NP1",
     "NP2",
@@ -129,6 +140,9 @@
     "PNQS32",
     "PNQS33",
     "PNQV",
     "PNX1",
     "PPGE",
     "PPH1",
@@ -160,6 +174,10 @@
     "RG",
     "RG21",
     "RG22",
     "RGQ",
     "RGQV",
     "RGQV31",
@@ -170,6 +188,9 @@
     "RL",
     "RL21",
     "RL22",
     "RP",
     "RPK",
     "RR",

 {
+  "label_smoothing":0.0,
   "labels":[
     "APPGE",
     "AT",
     "DD2",
     "DDQ",
     "DDQGE",
+    "DDQGE31",
+    "DDQGE32",
+    "DDQGE33",
     "DDQV",
     "DDQV31",
     "DDQV32",
     "JJ31",
     "JJ32",
     "JJ33",
+    "JJ41",
+    "JJ42",
+    "JJ43",
+    "JJ44",
     "JJR",
     "JJT",
     "JK",
     "NN22",
     "NN221",
     "NN222",
     "NN31",
+    "NN32",
     "NN33",
     "NNA",
     "NNB",
     "NNO",
     "NNO2",
     "NNT1",
+    "NNT131",
+    "NNT132",
+    "NNT133",
     "NNT2",
     "NNU",
     "NNU1",
     "NNU2",
     "NNU21",
     "NNU22",
+    "NNU221",
+    "NNU222",
     "NP",
     "NP1",
     "NP2",
     "PNQS32",
     "PNQS33",
     "PNQV",
+    "PNQV31",
+    "PNQV32",
+    "PNQV33",
     "PNX1",
     "PPGE",
     "PPH1",
     "RG",
     "RG21",
     "RG22",
+    "RG41",
+    "RG42",
+    "RG43",
+    "RG44",
     "RGQ",
     "RGQV",
     "RGQV31",
     "RL",
     "RL21",
     "RL22",
+    "RL31",
+    "RL32",
+    "RL33",
     "RP",
     "RPK",
     "RR",
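
The tagger label changes in this diff can be cross-checked against the README summary, which moves from 269 to 289 labels across the two components. The sketch below lists the labels removed and added in `tagger/cfg` and recomputes the total, assuming the `ner` label set is unchanged (the README lists the same `ner` labels before and after); the variable names are illustrative.

```python
# Labels dropped from tagger/cfg in this commit.
removed = ["NN231", "NN232", "NN233"]

# Labels added to tagger/cfg in this commit.
added = [
    "DDQGE31", "DDQGE32", "DDQGE33",
    "JJ41", "JJ42", "JJ43", "JJ44",
    "NN32",
    "NNT131", "NNT132", "NNT133",
    "NNU221", "NNU222",
    "PNQV31", "PNQV32", "PNQV33",
    "RG41", "RG42", "RG43", "RG44",
    "RL31", "RL32", "RL33",
]

old_total = 269  # label count stated in the previous README
new_total = old_total - len(removed) + len(added)

print(f"removed {len(removed)}, added {len(added)}, new total {new_total}")
```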

tagger/model CHANGED

Binary files a/tagger/model and b/tagger/model differ

tokenizer CHANGED

The diff for this file is too large to render. See raw diff

transformer/model CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:22576948b84086ef0634f91f089cb600a12dcf97e5c37e27caf9ddf1d2cebfb8
-size 502030632

 version https://git-lfs.github.com/spec/v1
+oid sha256:fe6a244db75cb192f39add536b26df730b2a4d4eb7ea298436e2f907f54a6d91
+size 502027402

vocab/strings.json CHANGED

The diff for this file is too large to render. See raw diff