Ella01 committed on
Commit
3699257
•
1 Parent(s): 7fb9669

Update spaCy pipeline

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer/pkuseg_processors filter=lfs diff=lfs merge=lfs -text
+ transformer/model filter=lfs diff=lfs merge=lfs -text
+ zh_core_web_trf-any-py3-none-any.whl filter=lfs diff=lfs merge=lfs -text
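Each added `.gitattributes` line pairs a path pattern with attributes: `filter=lfs diff=lfs merge=lfs` routes the file through Git LFS, and `-text` unsets the `text` attribute so Git does not normalize line endings. A minimal sketch of how such a line breaks down, assuming a hypothetical helper `parse_gitattributes_line` that is not part of Git or any library:

```python
def parse_gitattributes_line(line):
    """Split one .gitattributes line into (pattern, attributes).

    Hypothetical helper for illustration only. It ignores quoted
    patterns and the "!attr" form, both of which real Git supports.
    """
    parts = line.split()
    if not parts or parts[0].startswith("#"):
        return None  # blank line or comment
    pattern, attrs = parts[0], {}
    for token in parts[1:]:
        if token.startswith("-"):
            attrs[token[1:]] = False   # "-text": attribute explicitly unset
        elif "=" in token:
            key, value = token.split("=", 1)
            attrs[key] = value         # "filter=lfs": attribute set to a value
        else:
            attrs[token] = True        # bare name: attribute set
    return pattern, attrs


pattern, attrs = parse_gitattributes_line(
    "transformer/model filter=lfs diff=lfs merge=lfs -text"
)
print(pattern, attrs)
```

The three new patterns are literal paths rather than globs, so each one covers exactly one file in this repository.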
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ tags:
+ - spacy
+ - token-classification
+ language:
+ - zh
+ license: mit
+ model-index:
+ - name: zh_core_web_trf
+   results:
+   - task:
+       name: NER
+       type: token-classification
+     metrics:
+     - name: NER Precision
+       type: precision
+       value: 0.7608897127
+     - name: NER Recall
+       type: recall
+       value: 0.7217582418
+     - name: NER F Score
+       type: f_score
+       value: 0.7408075795
+   - task:
+       name: TAG
+       type: token-classification
+     metrics:
+     - name: TAG (XPOS) Accuracy
+       type: accuracy
+       value: 0.9175332527
+   - task:
+       name: UNLABELED_DEPENDENCIES
+       type: token-classification
+     metrics:
+     - name: Unlabeled Attachment Score (UAS)
+       type: f_score
+       value: 0.7572203056
+   - task:
+       name: LABELED_DEPENDENCIES
+       type: token-classification
+     metrics:
+     - name: Labeled Attachment Score (LAS)
+       type: f_score
+       value: 0.7145288854
+   - task:
+       name: SENTS
+       type: token-classification
+     metrics:
+     - name: Sentences F-Score
+       type: f_score
+       value: 0.6920716113
+ ---
+ Chinese transformer pipeline (Transformer(name='bert-base-chinese', piece_encoder='bert-wordpiece', stride=152, type='bert', width=768, window=208, vocab_size=21128)). Components: transformer, tagger, parser, ner, attribute_ruler.
+
+ | Feature | Description |
+ | --- | --- |
+ | **Name** | `zh_core_web_trf` |
+ | **Version** | `3.7.2` |
+ | **spaCy** | `>=3.7.0,<3.8.0` |
+ | **Default Pipeline** | `transformer`, `tagger`, `parser`, `attribute_ruler`, `ner` |
+ | **Components** | `transformer`, `tagger`, `parser`, `attribute_ruler`, `ner` |
+ | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
+ | **Sources** | [OntoNotes 5](https://catalog.ldc.upenn.edu/LDC2013T19) (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston)<br>[CoreNLP Universal Dependencies Converter](https://nlp.stanford.edu/software/stanford-dependencies.html) (Stanford NLP Group)<br>[bert-base-chinese](https://huggingface.co/bert-base-chinese) (Hugging Face) |
+ | **License** | `MIT` |
+ | **Author** | [Explosion](https://explosion.ai) |
+
+ ### Label Scheme
+
+ <details>
+
+ <summary>View label scheme (99 labels for 3 components)</summary>
+
+ | Component | Labels |
+ | --- | --- |
+ | **`tagger`** | `AD`, `AS`, `BA`, `CC`, `CD`, `CS`, `DEC`, `DEG`, `DER`, `DEV`, `DT`, `ETC`, `FW`, `IJ`, `INF`, `JJ`, `LB`, `LC`, `M`, `MSP`, `NN`, `NR`, `NT`, `OD`, `ON`, `P`, `PN`, `PU`, `SB`, `SP`, `URL`, `VA`, `VC`, `VE`, `VV`, `X` |
+ | **`parser`** | `ROOT`, `acl`, `advcl:loc`, `advmod`, `advmod:dvp`, `advmod:loc`, `advmod:rcomp`, `amod`, `amod:ordmod`, `appos`, `aux:asp`, `aux:ba`, `aux:modal`, `aux:prtmod`, `auxpass`, `case`, `cc`, `ccomp`, `compound:nn`, `compound:vc`, `conj`, `cop`, `dep`, `det`, `discourse`, `dobj`, `etc`, `mark`, `mark:clf`, `name`, `neg`, `nmod`, `nmod:assmod`, `nmod:poss`, `nmod:prep`, `nmod:range`, `nmod:tmod`, `nmod:topic`, `nsubj`, `nsubj:xsubj`, `nsubjpass`, `nummod`, `parataxis:prnmod`, `punct`, `xcomp` |
+ | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART` |
+
+ </details>
+
+ ### Accuracy
+
+ | Type | Score |
+ | --- | --- |
+ | `TOKEN_ACC` | 95.85 |
+ | `TOKEN_P` | 94.58 |
+ | `TOKEN_R` | 91.36 |
+ | `TOKEN_F` | 92.94 |
+ | `TAG_ACC` | 91.75 |
+ | `SENTS_P` | 70.92 |
+ | `SENTS_R` | 67.57 |
+ | `SENTS_F` | 69.21 |
+ | `DEP_UAS` | 75.72 |
+ | `DEP_LAS` | 71.45 |
+ | `ENTS_P` | 76.09 |
+ | `ENTS_R` | 72.18 |
+ | `ENTS_F` | 74.08 |
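The F-scores in the README follow from the precision and recall values reported next to them. A small sketch, assuming the usual F1 definition (the harmonic mean of precision and recall), reproduces the NER figure from the `model-index` block; `f_score` is an illustrative name, not a spaCy function:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NER precision and recall copied from the model-index block of the README.
ner_f = f_score(0.7608897127, 0.7217582418)
print(round(ner_f * 100, 2))  # 74.08, matching ENTS_F in the accuracy table
```

The same relation holds for the `SENTS_*` and `TOKEN_*` rows of the accuracy table, up to rounding.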
attribute_ruler/patterns ADDED
Binary file (2.12 kB)
config.cfg ADDED
@@ -0,0 +1,257 @@
+ [paths]
+ train = null
+ dev = null
+ vectors = null
+ init_tok2vec = null
+
+ [system]
+ gpu_allocator = "pytorch"
+ seed = 1
+
+ [nlp]
+ lang = "zh"
+ pipeline = ["transformer","tagger","parser","attribute_ruler","ner"]
+ disabled = []
+ before_creation = null
+ after_creation = null
+ after_pipeline_creation = null
+ batch_size = 64
+ vectors = {"@vectors":"spacy.Vectors.v1"}
+
+ [nlp.tokenizer]
+ @tokenizers = "spacy.zh.ChineseTokenizer"
+ segmenter = "pkuseg"
+
+ [components]
+
+ [components.attribute_ruler]
+ factory = "attribute_ruler"
+ scorer = {"@scorers":"spacy.attribute_ruler_scorer.v1"}
+ validate = false
+
+ [components.ner]
+ factory = "ner"
+ incorrect_spans_key = null
+ moves = null
+ scorer = {"@scorers":"spacy.ner_scorer.v1"}
+ update_with_oracle_cut_size = 100
+
+ [components.ner.model]
+ @architectures = "spacy.TransitionBasedParser.v2"
+ state_type = "ner"
+ extra_state_tokens = false
+ hidden_width = 64
+ maxout_pieces = 2
+ use_upper = false
+ nO = null
+
+ [components.ner.model.tok2vec]
+ @architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
+ width = ${components.transformer.model.hidden_width}
+ upstream = "transformer"
+ pooling = {"@layers":"reduce_mean.v1"}
+ grad_factor = 1.0
+
+ [components.parser]
+ factory = "parser"
+ learn_tokens = false
+ min_action_freq = 30
+ moves = null
+ scorer = {"@scorers":"spacy.parser_scorer.v1"}
+ update_with_oracle_cut_size = 100
+
+ [components.parser.model]
+ @architectures = "spacy.TransitionBasedParser.v2"
+ state_type = "parser"
+ extra_state_tokens = false
+ hidden_width = 64
+ maxout_pieces = 2
+ use_upper = false
+ nO = null
+
+ [components.parser.model.tok2vec]
+ @architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
+ width = ${components.transformer.model.hidden_width}
+ upstream = "transformer"
+ pooling = {"@layers":"reduce_mean.v1"}
+ grad_factor = 1.0
+
+ [components.tagger]
+ factory = "tagger"
+ label_smoothing = 0.0
+ neg_prefix = "!"
+ overwrite = false
+ scorer = {"@scorers":"spacy.tagger_scorer.v1"}
+
+ [components.tagger.model]
+ @architectures = "spacy.Tagger.v2"
+ nO = null
+ normalize = false
+
+ [components.tagger.model.tok2vec]
+ @architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
+ width = ${components.transformer.model.hidden_width}
+ upstream = "transformer"
+ pooling = {"@layers":"reduce_mean.v1"}
+ grad_factor = 1.0
+
+ [components.transformer]
+ factory = "curated_transformer"
+ all_layer_outputs = false
+ frozen = false
+
+ [components.transformer.model]
+ @architectures = "spacy-curated-transformers.BertTransformer.v1"
+ vocab_size = 21128
+ hidden_width = 768
+ piece_encoder = {"@architectures":"spacy-curated-transformers.BertWordpieceEncoder.v1"}
+ attention_probs_dropout_prob = 0.1
+ hidden_act = "gelu"
+ hidden_dropout_prob = 0.1
+ intermediate_width = 3072
+ layer_norm_eps = 0.0
+ max_position_embeddings = 512
+ model_max_length = 512
+ num_attention_heads = 12
+ num_hidden_layers = 12
+ padding_idx = 0
+ type_vocab_size = 2
+ torchscript = false
+ mixed_precision = false
+ wrapped_listener = null
+
+ [components.transformer.model.grad_scaler_config]
+
+ [components.transformer.model.with_spans]
+ @architectures = "spacy-curated-transformers.WithStridedSpans.v1"
+ stride = 152
+ window = 208
+ batch_size = 384
+
+ [corpora]
+
+ [corpora.dev]
+ @readers = "spacy.Corpus.v1"
+ path = ${paths.dev}
+ gold_preproc = false
+ max_length = 0
+ limit = 0
+ augmenter = null
+
+ [corpora.train]
+ @readers = "spacy.Corpus.v1"
+ path = ${paths.train}
+ gold_preproc = false
+ max_length = 0
+ limit = 0
+ augmenter = null
+
+ [training]
+ train_corpus = "corpora.train"
+ dev_corpus = "corpora.dev"
+ seed = ${system:seed}
+ gpu_allocator = ${system:gpu_allocator}
+ dropout = 0.1
+ accumulate_gradient = 3
+ patience = 5000
+ max_epochs = 0
+ max_steps = 20000
+ eval_frequency = 1000
+ frozen_components = []
+ before_to_disk = null
+ annotating_components = []
+ before_update = null
+
+ [training.batcher]
+ @batchers = "spacy.batch_by_words.v1"
+ discard_oversize = false
+ size = 2000
+ tolerance = 0.2
+ get_length = null
+
+ [training.logger]
+ @loggers = "spacy.ConsoleLogger.v1"
+ progress_bar = false
+
+ [training.optimizer]
+ @optimizers = "Adam.v1"
+ beta1 = 0.9
+ beta2 = 0.999
+ L2_is_weight_decay = true
+ L2 = 0.01
+ grad_clip = 1.0
+ use_averages = true
+ eps = 0.00000001
+
+ [training.optimizer.learn_rate]
+ @schedules = "warmup_linear.v1"
+ warmup_steps = 250
+ total_steps = 20000
+ initial_rate = 0.00005
+
+ [training.score_weights]
+ tag_acc = 0.32
+ dep_uas = 0.0
+ dep_las = 0.32
+ dep_las_per_type = null
+ sents_p = null
+ sents_r = null
+ sents_f = 0.04
+ ents_f = 0.32
+ ents_p = 0.0
+ ents_r = 0.0
+ ents_per_type = null
+ speed = 0.0
+
+ [pretraining]
+
+ [initialize]
+ vocab_data = null
+ vectors = ${paths.vectors}
+ init_tok2vec = ${paths.init_tok2vec}
+ before_init = null
+ after_init = null
+
+ [initialize.components]
+
+ [initialize.components.ner]
+
+ [initialize.components.ner.labels]
+ @readers = "spacy.read_labels.v1"
+ path = "corpus/labels/ner.json"
+ require = false
+
+ [initialize.components.parser]
+
+ [initialize.components.parser.labels]
+ @readers = "spacy.read_labels.v1"
+ path = "corpus/labels/parser.json"
+ require = false
+
+ [initialize.components.tagger]
+
+ [initialize.components.tagger.labels]
+ @readers = "spacy.read_labels.v1"
+ path = "corpus/labels/tagger.json"
+ require = false
+
+ [initialize.components.transformer]
+
+ [initialize.components.transformer.encoder_loader]
+ @model_loaders = "spacy-curated-transformers.HFTransformerEncoderLoader.v1"
+ name = "bert-base-chinese"
+ revision = "main"
+
+ [initialize.components.transformer.piecer_loader]
+ @model_loaders = "spacy-curated-transformers.HFPieceEncoderLoader.v1"
+ name = "bert-base-chinese"
+ revision = "main"
+
+ [initialize.lookups]
+ @misc = "spacy.LookupsDataLoader.v1"
+ lang = ${nlp.lang}
+ tables = []
+
+ [initialize.tokenizer]
+ pkuseg_model = "assets/pkuseg_model"
+ pkuseg_user_dict = "default"
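The `[training.optimizer.learn_rate]` block in `config.cfg` selects `warmup_linear.v1` with `warmup_steps = 250`, `total_steps = 20000` and `initial_rate = 0.00005`. The sketch below shows the shape such a schedule usually has: a linear ramp from zero up to the initial rate, then a linear decay back to zero. It is written from the common definition of linear warmup and decay, not copied from Thinc, and `warmup_linear_rate` is a hypothetical name, so the library's exact values at individual steps may differ.

```python
def warmup_linear_rate(step, initial_rate=0.00005, warmup_steps=250, total_steps=20000):
    """Learning rate at a given step under linear warmup followed by linear decay.

    Illustrative only; defaults are taken from config.cfg.
    """
    if step < warmup_steps:
        # Ramp up from 0 towards initial_rate over the warmup period.
        factor = step / max(1, warmup_steps)
    else:
        # Decay linearly so the rate reaches 0 at total_steps.
        factor = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return factor * initial_rate


for step in (0, 125, 250, 10125, 20000):
    print(step, warmup_linear_rate(step))
```

Under these assumptions the rate peaks at step 250 and is back at half of its peak by step 10125, midway through the decay phase.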
meta.json ADDED
@@ -0,0 +1,507 @@
+ {
+   "lang":"zh",
+   "name":"core_web_trf",
+   "version":"3.7.2",
+   "description":"Chinese transformer pipeline (Transformer(name='bert-base-chinese', piece_encoder='bert-wordpiece', stride=152, type='bert', width=768, window=208, vocab_size=21128)). Components: transformer, tagger, parser, ner, attribute_ruler.",
+   "author":"Explosion",
+   "email":"contact@explosion.ai",
+   "url":"https://explosion.ai",
+   "license":"MIT",
+   "spacy_version":">=3.7.0,<3.8.0",
+   "spacy_git_version":"4ec41e98f",
+   "vectors":{
+     "width":0,
+     "vectors":0,
+     "keys":0,
+     "name":null
+   },
+   "labels":{
+     "transformer":[
+
+     ],
+     "tagger":[
+       "AD",
+       "AS",
+       "BA",
+       "CC",
+       "CD",
+       "CS",
+       "DEC",
+       "DEG",
+       "DER",
+       "DEV",
+       "DT",
+       "ETC",
+       "FW",
+       "IJ",
+       "INF",
+       "JJ",
+       "LB",
+       "LC",
+       "M",
+       "MSP",
+       "NN",
+       "NR",
+       "NT",
+       "OD",
+       "ON",
+       "P",
+       "PN",
+       "PU",
+       "SB",
+       "SP",
+       "URL",
+       "VA",
+       "VC",
+       "VE",
+       "VV",
+       "X"
+     ],
+     "parser":[
+       "ROOT",
+       "acl",
+       "advcl:loc",
+       "advmod",
+       "advmod:dvp",
+       "advmod:loc",
+       "advmod:rcomp",
+       "amod",
+       "amod:ordmod",
+       "appos",
+       "aux:asp",
+       "aux:ba",
+       "aux:modal",
+       "aux:prtmod",
+       "auxpass",
+       "case",
+       "cc",
+       "ccomp",
+       "compound:nn",
+       "compound:vc",
+       "conj",
+       "cop",
+       "dep",
+       "det",
+       "discourse",
+       "dobj",
+       "etc",
+       "mark",
+       "mark:clf",
+       "name",
+       "neg",
+       "nmod",
+       "nmod:assmod",
+       "nmod:poss",
+       "nmod:prep",
+       "nmod:range",
+       "nmod:tmod",
+       "nmod:topic",
+       "nsubj",
+       "nsubj:xsubj",
+       "nsubjpass",
+       "nummod",
+       "parataxis:prnmod",
+       "punct",
+       "xcomp"
+     ],
+     "attribute_ruler":[
+
+     ],
+     "ner":[
+       "CARDINAL",
+       "DATE",
+       "EVENT",
+       "FAC",
+       "GPE",
+       "LANGUAGE",
+       "LAW",
+       "LOC",
+       "MONEY",
+       "NORP",
+       "ORDINAL",
+       "ORG",
+       "PERCENT",
+       "PERSON",
+       "PRODUCT",
+       "QUANTITY",
+       "TIME",
+       "WORK_OF_ART"
+     ]
+   },
+   "pipeline":[
+     "transformer",
+     "tagger",
+     "parser",
+     "attribute_ruler",
+     "ner"
+   ],
+   "components":[
+     "transformer",
+     "tagger",
+     "parser",
+     "attribute_ruler",
+     "ner"
+   ],
+   "disabled":[
+
+   ],
+   "performance":{
+     "token_acc":0.9585384056,
+     "token_p":0.9458325855,
+     "token_r":0.9136060443,
+     "token_f":0.9294400505,
+     "tag_acc":0.9175332527,
+     "sents_p":0.7092434038,
+     "sents_r":0.6757116697,
+     "sents_f":0.6920716113,
+     "dep_uas":0.7572203056,
+     "dep_las":0.7145288854,
+     "dep_las_per_type":{
+       "dep":{
+         "p":0.5542676502,
+         "r":0.4251793473,
+         "f":0.4812167648
+       },
+       "case":{
+         "p":0.9020435069,
+         "r":0.8295344326,
+         "f":0.8642708268
+       },
+       "nmod:tmod":{
+         "p":0.7832446809,
+         "r":0.8013605442,
+         "f":0.7921990585
+       },
+       "nummod":{
+         "p":0.8815789474,
+         "r":0.5802798135,
+         "f":0.6998794697
+       },
+       "mark:clf":{
+         "p":0.9339393939,
+         "r":0.5747855278,
+         "f":0.711613946
+       },
+       "auxpass":{
+         "p":0.9095744681,
+         "r":0.9243243243,
+         "f":0.9168900804
+       },
+       "nsubj":{
+         "p":0.8642424242,
+         "r":0.7882324039,
+         "f":0.8244892715
+       },
+       "acl":{
+         "p":0.7845096814,
+         "r":0.6966167499,
+         "f":0.7379553467
+       },
+       "advmod":{
+         "p":0.868605557,
+         "r":0.7583314441,
+         "f":0.8097312999
+       },
+       "mark":{
+         "p":0.8348993289,
+         "r":0.8177037686,
+         "f":0.8262120877
+       },
+       "xcomp":{
+         "p":0.8014981273,
+         "r":0.6970684039,
+         "f":0.7456445993
+       },
+       "nmod:assmod":{
+         "p":0.8492146597,
+         "r":0.7572362278,
+         "f":0.8005923001
+       },
+       "det":{
+         "p":0.8788617886,
+         "r":0.633274751,
+         "f":0.7361252979
+       },
+       "amod":{
+         "p":0.8216442174,
+         "r":0.6948153967,
+         "f":0.7529261545
+       },
+       "nmod:prep":{
+         "p":0.8173109819,
+         "r":0.7226255293,
+         "f":0.7670573126
+       },
+       "root":{
+         "p":0.7621591746,
+         "r":0.6886965207,
+         "f":0.723567993
+       },
+       "aux:prtmod":{
+         "p":0.9551020408,
+         "r":0.8357142857,
+         "f":0.8914285714
+       },
+       "compound:nn":{
+         "p":0.7833185448,
+         "r":0.7468697124,
+         "f":0.764660026
+       },
+       "dobj":{
+         "p":0.8932703275,
+         "r":0.8120278477,
+         "f":0.8507138423
+       },
+       "ccomp":{
+         "p":0.7626977519,
+         "r":0.7122861586,
+         "f":0.7366304785
+       },
+       "advmod:rcomp":{
+         "p":0.8369230769,
+         "r":0.7534626039,
+         "f":0.7930029155
+       },
+       "nmod:topic":{
+         "p":0.4624505929,
+         "r":0.3798701299,
+         "f":0.4171122995
+       },
+       "cop":{
+         "p":0.8350515464,
+         "r":0.6254826255,
+         "f":0.7152317881
+       },
+       "discourse":{
+         "p":0.5836267606,
+         "r":0.547029703,
+         "f":0.5647359455
+       },
+       "neg":{
+         "p":0.8730650155,
+         "r":0.6706302021,
+         "f":0.7585743107
+       },
+       "aux:modal":{
+         "p":0.8915401302,
+         "r":0.8500517063,
+         "f":0.870301747
+       },
+       "nmod":{
+         "p":0.7740524781,
+         "r":0.7204884668,
+         "f":0.7463106114
+       },
+       "aux:ba":{
+         "p":0.9106145251,
+         "r":0.8670212766,
+         "f":0.8882833787
+       },
+       "advmod:loc":{
+         "p":0.7519379845,
+         "r":0.5756676558,
+         "f":0.6521008403
+       },
+       "aux:asp":{
+         "p":0.9163179916,
+         "r":0.8732057416,
+         "f":0.894242548
+       },
+       "conj":{
+         "p":0.6111647672,
+         "r":0.5981096408,
+         "f":0.6045667335
+       },
+       "nsubjpass":{
+         "p":0.9,
+         "r":0.72,
+         "f":0.8
+       },
+       "compound:vc":{
+         "p":0.4628820961,
+         "r":0.5492227979,
+         "f":0.5023696682
+       },
+       "advcl:loc":{
+         "p":0.6488549618,
+         "r":0.6071428571,
+         "f":0.6273062731
+       },
+       "cc":{
+         "p":0.7943396226,
+         "r":0.7471162378,
+         "f":0.7700045725
+       },
+       "advmod:dvp":{
+         "p":0.9212598425,
+         "r":0.7267080745,
+         "f":0.8125
+       },
+       "appos":{
+         "p":0.9382716049,
+         "r":0.8735632184,
+         "f":0.9047619048
+       },
+       "nmod:poss":{
+         "p":0.7280701754,
+         "r":0.6148148148,
+         "f":0.6666666667
+       },
+       "name":{
+         "p":0.6261682243,
+         "r":0.4962962963,
+         "f":0.5537190083
+       },
+       "nsubj:xsubj":{
+         "p":0.0,
+         "r":0.0,
+         "f":0.0
+       },
+       "nmod:range":{
+         "p":0.8098859316,
+         "r":0.7147651007,
+         "f":0.7593582888
+       },
+       "parataxis:prnmod":{
+         "p":0.3442622951,
+         "r":0.1578947368,
+         "f":0.2164948454
+       },
+       "amod:ordmod":{
+         "p":0.7547169811,
+         "r":0.625,
+         "f":0.6837606838
+       },
+       "erased":{
+         "p":0.0,
+         "r":0.0,
+         "f":0.0
+       },
+       "etc":{
+         "p":0.9277108434,
+         "r":0.9166666667,
+         "f":0.9221556886
+       }
+     },
+     "ents_p":0.7608897127,
+     "ents_r":0.7217582418,
+     "ents_f":0.7408075795,
+     "ents_per_type":{
+       "DATE":{
+         "p":0.7811607992,
+         "r":0.8136769078,
+         "f":0.7970873786
+       },
+       "GPE":{
+         "p":0.8325837081,
+         "r":0.8142717498,
+         "f":0.8233259204
+       },
+       "ORDINAL":{
+         "p":0.8488372093,
+         "r":0.7684210526,
+         "f":0.8066298343
+       },
+       "FAC":{
+         "p":0.3906976744,
+         "r":0.4516129032,
+         "f":0.4189526185
+       },
+       "LOC":{
+         "p":0.5012406948,
+         "r":0.5430107527,
+         "f":0.5212903226
+       },
+       "QUANTITY":{
+         "p":0.696,
+         "r":0.6444444444,
+         "f":0.6692307692
+       },
+       "ORG":{
+         "p":0.7461476075,
+         "r":0.700152207,
+         "f":0.7224185316
+       },
+       "PERSON":{
+         "p":0.8739386022,
+         "r":0.8621134021,
+         "f":0.8679857282
+       },
+       "CARDINAL":{
+         "p":0.6729088639,
+         "r":0.5433467742,
+         "f":0.6012269939
+       },
+       "NORP":{
+         "p":0.6961038961,
+         "r":0.5630252101,
+         "f":0.6225319396
+       },
+       "WORK_OF_ART":{
+         "p":0.5625,
+         "r":0.3,
+         "f":0.3913043478
+       },
+       "TIME":{
+         "p":0.7875647668,
+         "r":0.7378640777,
+         "f":0.7619047619
+       },
+       "MONEY":{
+         "p":0.9256198347,
+         "r":0.8296296296,
+         "f":0.875
+       },
+       "EVENT":{
+         "p":0.5430463576,
+         "r":0.6029411765,
+         "f":0.5714285714
+       },
+       "PERCENT":{
+         "p":0.869047619,
+         "r":0.8795180723,
+         "f":0.874251497
+       },
+       "PRODUCT":{
+         "p":0.3793103448,
+         "r":0.2244897959,
+         "f":0.2820512821
+       },
+       "LAW":{
+         "p":0.3571428571,
+         "r":0.25,
+         "f":0.2941176471
+       },
+       "LANGUAGE":{
+         "p":0.4666666667,
+         "r":0.7777777778,
+         "f":0.5833333333
+       }
+     },
+     "speed":2677.6055974261
+   },
+   "sources":[
+     {
+       "name":"OntoNotes 5",
+       "url":"https://catalog.ldc.upenn.edu/LDC2013T19",
+       "license":"commercial (licensed by Explosion)",
+       "author":"Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston"
+     },
+     {
+       "name":"CoreNLP Universal Dependencies Converter",
+       "url":"https://nlp.stanford.edu/software/stanford-dependencies.html",
+       "author":"Stanford NLP Group",
+       "license":"Citation provided for reference, no code packaged with model"
+     },
+     {
+       "name":"bert-base-chinese",
+       "author":"Hugging Face",
+       "url":"https://huggingface.co/bert-base-chinese",
+       "license":""
+     }
+   ],
+   "requirements":[
+     "spacy-curated-transformers>=0.2.0,<0.3.0",
+     "spacy-pkuseg>=0.0.27,<0.1.0"
+   ]
+ }
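The `performance` block of `meta.json` can be read together with `[training.score_weights]` in `config.cfg`, which weights `tag_acc`, `dep_las` and `ents_f` at 0.32 each and `sents_f` at 0.04. A rough sketch, assuming the training score is a plain weighted sum of these metrics with `null`-weighted entries skipped (this aggregation rule is my assumption about how the weights are applied, and `weighted_score` is a hypothetical name):

```python
# Non-null weights copied from [training.score_weights] in config.cfg.
SCORE_WEIGHTS = {
    "tag_acc": 0.32,
    "dep_uas": 0.0,
    "dep_las": 0.32,
    "sents_f": 0.04,
    "ents_f": 0.32,
    "ents_p": 0.0,
    "ents_r": 0.0,
    "speed": 0.0,
}

# Matching values copied from "performance" in meta.json.
PERFORMANCE = {
    "tag_acc": 0.9175332527,
    "dep_uas": 0.7572203056,
    "dep_las": 0.7145288854,
    "sents_f": 0.6920716113,
    "ents_f": 0.7408075795,
    "ents_p": 0.7608897127,
    "ents_r": 0.7217582418,
    "speed": 2677.6055974261,
}


def weighted_score(scores, weights):
    """Weighted sum of scores; metrics missing from `scores` count as 0."""
    return sum(weights[name] * scores.get(name, 0.0) for name in weights)


final_score = weighted_score(PERFORMANCE, SCORE_WEIGHTS)
print(round(final_score, 4))
```

Because the four non-zero weights sum to 1.0, the result stays on the same 0 to 1 scale as the individual metrics.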
ner/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+   "moves":null,
+   "update_with_oracle_cut_size":100,
+   "multitasks":[
+
+   ],
+   "min_action_freq":1,
+   "learn_tokens":false,
+   "beam_width":1,
+   "beam_density":0.0,
+   "beam_update_prob":0.0,
+   "incorrect_spans_key":null
+ }
ner/model ADDED
Binary file (314 kB)
ner/moves ADDED
@@ -0,0 +1 @@
+ <binary>moves<binary>{"0":{},"1":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"2":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"3":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"4":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336,"":1},"5":{"":1}}<binary>cfg<binary>neg_key<binary>
parser/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+   "moves":null,
+   "update_with_oracle_cut_size":100,
+   "multitasks":[
+
+   ],
+   "min_action_freq":30,
+   "learn_tokens":false,
+   "beam_width":1,
+   "beam_density":0.0,
+   "beam_update_prob":0.0,
+   "incorrect_spans_key":null
+ }
parser/model ADDED
Binary file (460 kB)
parser/moves ADDED
@@ -0,0 +1 @@
+ <binary>moves<binary>{"0":{"":436297},"1":{"":282750},"2":{"advmod":61142,"nsubj":55539,"compound:nn":45994,"dep":43937,"punct":36396,"case":24751,"nmod:assmod":22308,"nmod:prep":21037,"amod":18609,"acl":12438,"conj":10993,"det":10371,"nummod":9922,"cop":9515,"cc":6289,"aux:modal":6003,"neg":5955,"nmod:tmod":5338,"nmod":5049,"xcomp":4333,"appos":2988,"nmod:topic":2532,"discourse":2283,"advmod:loc":1902,"aux:prtmod":1724,"aux:ba":1323,"auxpass":1240,"advmod:dvp":1193,"name":1117,"advcl:loc":1072,"compound:vc":834,"nmod:poss":657,"amod:ordmod":601,"dobj":441,"nsubjpass":276,"nsubj:xsubj||ccomp":64,"parataxis:prnmod":36,"nsubj:xsubj":32},"3":{"punct":74587,"dobj":46958,"conj":31352,"case":31222,"dep":20953,"mark:clf":18377,"ccomp":17748,"mark":16793,"aux:asp":8130,"discourse":4187,"advmod:rcomp":2519,"nmod:range":2021,"cc":1715,"nmod:prep":1690,"advmod":1162,"etc":943,"compound:vc":828,"parataxis:prnmod":724,"advmod:loc":571,"neg":70,"acl":43,"advcl:loc":42},"4":{"ROOT":36097}}<binary>cfg<binary>neg_key<binary>
tagger/cfg ADDED
@@ -0,0 +1,43 @@
+ {
+   "label_smoothing":0.0,
+   "labels":[
+     "AD",
+     "AS",
+     "BA",
+     "CC",
+     "CD",
+     "CS",
+     "DEC",
+     "DEG",
+     "DER",
+     "DEV",
+     "DT",
+     "ETC",
+     "FW",
+     "IJ",
+     "INF",
+     "JJ",
+     "LB",
+     "LC",
+     "M",
+     "MSP",
+     "NN",
+     "NR",
+     "NT",
+     "OD",
+     "ON",
+     "P",
+     "PN",
+     "PU",
+     "SB",
+     "SP",
+     "URL",
+     "VA",
+     "VC",
+     "VE",
+     "VV",
+     "X"
+   ],
+   "neg_prefix":"!",
+   "overwrite":false
+ }
tagger/model ADDED
Binary file (111 kB)
tokenizer/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+   "segmenter":"pkuseg"
+ }
tokenizer/pkuseg_model/features.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd4322482a7018b9bce9216173ae9d2848efe6d310b468bbb4383fb55c874a18
+ size 22685181
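The three lines added for this file are a Git LFS pointer rather than the file's contents: a spec version, an object id (hash algorithm plus digest), and the size in bytes of the real object. A short sketch of reading such a pointer, using a hypothetical helper `parse_lfs_pointer` and assuming only the three keys seen in this commit:

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file.

    Illustrative helper; it expects the "version", "oid" and "size" keys
    and does not validate the pointer any further.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:fd4322482a7018b9bce9216173ae9d2848efe6d310b468bbb4383fb55c874a18
size 22685181
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size"])
```

The other pointer files in this commit have the same three-line layout and differ only in digest and size.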
tokenizer/pkuseg_model/weights.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ada075eb25a854f71d6e6fa4e7d55e7be0ae049255b1f8f19d05c13b1b68c9e
+ size 37508754
tokenizer/pkuseg_processors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cdc3129ffe89371aeaf4abacde2b4f00f5e0ff3cae022e937b14c1ed2b54879e
+ size 4527029
transformer/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+
+ }
transformer/model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d464d0cbb268568bf15914fa84f9feec3ee5445968ecf80b8b7ef87f11129f4
+ size 406876695
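The transformer weights stored here are applied to text in overlapping windows: `config.cfg` sets `window = 208` and `stride = 152` under `[components.transformer.model.with_spans]`, so consecutive windows share 208 - 152 = 56 pieces. The sketch below is a generic sliding-window routine written to illustrate those two numbers. It is not the `WithStridedSpans.v1` implementation, and `strided_spans` is a hypothetical name.

```python
def strided_spans(n_pieces, window=208, stride=152):
    """Return (start, end) offsets of overlapping windows over n_pieces items.

    Generic illustration; defaults are the values from config.cfg.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if n_pieces <= 0:
        return []
    spans, start = [], 0
    while True:
        end = min(start + window, n_pieces)  # last window may be shorter
        spans.append((start, end))
        if end >= n_pieces:
            break
        start += stride
    return spans


print(strided_spans(500))  # [(0, 208), (152, 360), (304, 500)]
```

Keeping the stride smaller than the window means every piece away from the document edges is seen with context on both sides in at least one window.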
vocab/key2row ADDED
@@ -0,0 +1 @@
+ <binary>
vocab/lookups.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76be8b528d0075f7aae98d6fa57a6d3c83ae480a8469e668d7b0af968995ac71
+ size 1
vocab/strings.json ADDED
The diff for this file is too large to render.
vocab/vectors ADDED
Binary file (128 Bytes)
vocab/vectors.cfg ADDED
@@ -0,0 +1,3 @@
+ {
+   "mode":"default"
+ }
zh_core_web_trf-any-py3-none-any.whl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c68de47bebdf59de2489ce657e1398fe211c5a1dad0625a3c4b49436f1b45fa
+ size 415130114
+ size 415130114