Commit c74d2a2 by diyclassics
Parent(s): 74f9469
Update spaCy pipeline
- README.md +33 -33
- config.cfg +24 -4
- functions.py +1 -4
- la_core_web_trf-any-py3-none-any.whl +2 -2
- meta.json +216 -211
- morphologizer/model +1 -1
- ner/model +1 -1
- ner/moves +1 -1
- parser/model +1 -1
- senter/cfg +3 -0
- senter/model +0 -0
- tagger/model +0 -0
- trainable_lemmatizer/cfg +0 -0
- trainable_lemmatizer/model +2 -2
- trainable_lemmatizer/trees +2 -2
- transformer/model +1 -1
- vocab/strings.json +0 -0
README.md
CHANGED
@@ -14,72 +14,72 @@ model-index:
     metrics:
     - name: NER Precision
       type: precision
-      value: 0.
+      value: 0.9631931948
     - name: NER Recall
       type: recall
-      value: 0.
+      value: 0.9570871261
     - name: NER F Score
       type: f_score
-      value: 0.
+      value: 0.9601304525
   - task:
       name: TAG
       type: token-classification
     metrics:
     - name: TAG (XPOS) Accuracy
       type: accuracy
-      value: 0.
+      value: 0.9640958514
   - task:
       name: POS
       type: token-classification
     metrics:
     - name: POS (UPOS) Accuracy
       type: accuracy
-      value: 0.
+      value: 0.9831838987
   - task:
       name: MORPH
       type: token-classification
     metrics:
     - name: Morph (UFeats) Accuracy
       type: accuracy
-      value: 0.
+      value: 0.9581374663
   - task:
       name: LEMMA
       type: token-classification
     metrics:
     - name: Lemma Accuracy
       type: accuracy
-      value: 0.
+      value: 0.9531809911
   - task:
       name: UNLABELED_DEPENDENCIES
       type: token-classification
     metrics:
     - name: Unlabeled Attachment Score (UAS)
       type: f_score
-      value: 0.
+      value: 0.8882308136
   - task:
       name: LABELED_DEPENDENCIES
       type: token-classification
     metrics:
     - name: Labeled Attachment Score (LAS)
       type: f_score
-      value: 0.
+      value: 0.8492401865
   - task:
       name: SENTS
       type: token-classification
     metrics:
     - name: Sentences F-Score
       type: f_score
-      value: 0.
+      value: 0.9959496442
 ---
 | Feature | Description |
 | --- | --- |
 | **Name** | `la_core_web_trf` |
-| **Version** | `3.7.
-| **spaCy** | `>=3.7.
-| **Default Pipeline** | `transformer`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `lookup_lemmatizer`, `ner` |
-| **Components** | `transformer`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `lookup_lemmatizer`, `ner` |
+| **Version** | `3.7.7` |
+| **spaCy** | `>=3.7.5,<3.8.0` |
+| **Default Pipeline** | `senter`, `transformer`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `lookup_lemmatizer`, `ner` |
+| **Components** | `senter`, `transformer`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `lookup_lemmatizer`, `ner` |
 | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
-| **Sources** | UD_Latin-Perseus<br>UD_Latin-PROIEL<br>UD_Latin-ITTB<br>UD_Latin-LLCT<br>UD_Latin-UDante |
+| **Sources** | UD_Latin-Perseus (via Gamba/Zeman 2023)<br>UD_Latin-PROIEL (via Gamba/Zeman 2023)<br>UD_Latin-ITTB (via Gamba/Zeman 2023)<br>UD_Latin-LLCT (via Gamba/Zeman 2023<br>UD_Latin-UDante (via Gamba/Zeman 2023)<br>CIRCSE/LASLA: LASLA Corpus |
 | **License** | `MIT` |
 | **Author** | [Patrick J. Burns; with Nora Bernhardt [ner], Tim Geelhaar [tagger, morphologizer, parser, ner], Vincent Koch [ner]](https://diyclassics.github.io/) |

@@ -102,21 +102,21 @@ model-index:

 | Type | Score |
 | --- | --- |
-| `ENTS_F` |
-| `ENTS_P` | 94.
-| `ENTS_R` | 95.
-| `TRANSFORMER_LOSS` |
-| `NER_LOSS` |
-| `TAG_ACC` | 96.
-| `POS_ACC` | 98.
-| `MORPH_ACC` | 95.
-| `LEMMA_ACC` | 95.
-| `DEP_UAS` | 88.
-| `DEP_LAS` | 84.
-| `SENTS_P` | 94.
-| `SENTS_R` | 94.
-| `SENTS_F` | 94.
-| `TAGGER_LOSS` |
-| `MORPHOLOGIZER_LOSS` |
-| `TRAINABLE_LEMMATIZER_LOSS` |
-| `PARSER_LOSS` |
+| `ENTS_F` | 95.43 |
+| `ENTS_P` | 94.87 |
+| `ENTS_R` | 95.99 |
+| `TRANSFORMER_LOSS` | 3054585.96 |
+| `NER_LOSS` | 9051.00 |
+| `TAG_ACC` | 96.33 |
+| `POS_ACC` | 98.31 |
+| `MORPH_ACC` | 95.75 |
+| `LEMMA_ACC` | 95.27 |
+| `DEP_UAS` | 88.61 |
+| `DEP_LAS` | 84.72 |
+| `SENTS_P` | 94.93 |
+| `SENTS_R` | 94.41 |
+| `SENTS_F` | 94.67 |
+| `TAGGER_LOSS` | 28864.85 |
+| `MORPHOLOGIZER_LOSS` | 246784.11 |
+| `TRAINABLE_LEMMATIZER_LOSS` | 242230.83 |
+| `PARSER_LOSS` | 2414158.66 |
|
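The NER precision, recall, and F score written to the model-index metadata are related by the usual harmonic mean, which gives a quick way to sanity-check the updated values. A minimal sketch, assuming the F score is plain F1; the helper name `f_score` is illustrative and not part of this repository:

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Values copied from the updated model-index metadata in README.md.
ner_p = 0.9631931948
ner_r = 0.9570871261
ner_f_reported = 0.9601304525

ner_f = f_score(ner_p, ner_r)
print(f"computed={ner_f:.10f} reported={ner_f_reported:.10f}")
```

The same check can be run against the `sents_p`, `sents_r`, and `sents_f` values in meta.json.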
config.cfg
CHANGED
@@ -10,7 +10,7 @@ seed = 0

 [nlp]
 lang = "la"
-pipeline = ["transformer","normer","tagger","morphologizer","trainable_lemmatizer","parser","lookup_lemmatizer","ner"]
+pipeline = ["senter","transformer","normer","tagger","morphologizer","trainable_lemmatizer","parser","lookup_lemmatizer","ner"]
 tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
 disabled = []
 before_creation = null
@@ -129,6 +129,26 @@ use_fast = true

 [components.parser.model.tok2vec.transformer_config]

+[components.senter]
+factory = "senter"
+overwrite = false
+scorer = {"@scorers":"spacy.senter_scorer.v1"}
+
+[components.senter.model]
+@architectures = "spacy.Tagger.v2"
+nO = null
+normalize = false
+
+[components.senter.model.tok2vec]
+@architectures = "spacy.HashEmbedCNN.v2"
+pretrained_vectors = null
+width = 12
+depth = 1
+embed_size = 2000
+window_size = 1
+maxout_pieces = 2
+subword_features = true
+
 [components.tagger]
 factory = "tagger"
 label_smoothing = 0.0
@@ -251,6 +271,9 @@ eps = 0.00000001
 learn_rate = 0.001

 [training.score_weights]
+sents_f = 0.0
+sents_p = null
+sents_r = null
 tag_acc = 0.2
 pos_acc = 0.1
 morph_acc = 0.1
@@ -259,9 +282,6 @@ lemma_acc = 0.2
 dep_uas = 0.1
 dep_las = 0.1
 dep_las_per_type = null
-sents_p = null
-sents_r = null
-sents_f = 0.0
 ents_f = 0.2
 ents_p = 0.0
 ents_r = 0.0
|
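The config.cfg change places `senter` at the front of the pipeline and gives it a small `HashEmbedCNN` tok2vec of its own. The sketch below checks those two facts on a fragment copied from the new config. It uses the standard-library `configparser` plus `json` as a stand-in for spaCy's own config loader, which is an assumption made only to keep the example dependency-free; the helper `value` is hypothetical.

```python
import configparser
import json

# Fragment copied from the updated config.cfg (only the keys used below).
CONFIG_FRAGMENT = """
[nlp]
lang = "la"
pipeline = ["senter","transformer","normer","tagger","morphologizer","trainable_lemmatizer","parser","lookup_lemmatizer","ner"]

[components.senter]
factory = "senter"
overwrite = false

[components.senter.model.tok2vec]
@architectures = "spacy.HashEmbedCNN.v2"
width = 12
depth = 1
"""

# Stand-in parser: "=" is the only delimiter, and interpolation is disabled.
parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
parser.read_string(CONFIG_FRAGMENT)


def value(section: str, key: str):
    """Decode one config value; the values in this fragment are JSON-compatible."""
    return json.loads(parser[section][key])


pipeline = value("nlp", "pipeline")
senter_width = value("components.senter.model.tok2vec", "width")
print(pipeline[0], senter_width)
```

With the packaged pipeline installed, the same order should be visible through `nlp.pipe_names` after `spacy.load("la_core_web_trf")`.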
functions.py
CHANGED
@@ -200,11 +200,8 @@ import string
 blank_nlp = spacy.blank("la")
 lookups = Lookups()

-try:
-    lookups_data = load_lookups(lang=blank_nlp.vocab.lang, tables=["lemma_lookup"])
-except:
-    lookups_data = lookups.from_disk("scripts/lemmatizer_lookups")

+lookups_data = load_lookups(lang=blank_nlp.vocab.lang, tables=["lemma_lookup"])
 LOOKUPS = lookups_data.get_table("lemma_lookup")

 predicted_lemma_getter = lambda token: token.lemma_
|
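With the on-disk fallback removed, `LOOKUPS` is always the `lemma_lookup` table returned by `load_lookups`. How the table is consumed is not shown in this hunk. The sketch below only illustrates the general lookup-with-fallback pattern that a dict-like table supports, using a plain `dict` as a stand-in; the entries, the name `LOOKUPS_STAND_IN`, and the helper `lookup_lemma` are hypothetical.

```python
# Hypothetical stand-in for the "lemma_lookup" table; the real table is
# loaded from spacy-lookups-data and is not reproduced here.
LOOKUPS_STAND_IN = {"amat": "amo", "puellae": "puella"}


def lookup_lemma(form: str, predicted: str) -> str:
    """Return the table's lemma for `form`, or keep the predicted lemma."""
    return LOOKUPS_STAND_IN.get(form, predicted)


print(lookup_lemma("amat", "amat"))  # form found in the stand-in table
print(lookup_lemma("arma", "arma"))  # form missing: predicted lemma is kept
```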
la_core_web_trf-any-py3-none-any.whl
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:391c1897518a0049a9648a7ae67f931c778bec052dd7a5d98cb012cab57a9929
+size 2509328343
|
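The wheel is stored through Git LFS, so the diff shows only the pointer file: a spec version, a SHA-256 object id, and the object size in bytes. A minimal sketch of reading those fields from the new pointer; the function name `parse_lfs_pointer` and the returned key names are illustrative, not part of any Git LFS tooling:

```python
# New pointer content copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:391c1897518a0049a9648a7ae67f931c778bec052dd7a5d98cb012cab57a9929
size 2509328343
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then split the oid into algorithm and digest."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, val = line.partition(" ")
        fields[key] = val
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

The `size` field is the byte count of the actual wheel, so the new package is roughly 2.5 GB.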
meta.json
CHANGED
@@ -1,14 +1,14 @@
|
|
1 |
{
|
2 |
"lang":"la",
|
3 |
"name":"core_web_trf",
|
4 |
-
"version":"3.7.
|
5 |
"description":"",
|
6 |
"author":"Patrick J. Burns; with Nora Bernhardt [ner], Tim Geelhaar [tagger, morphologizer, parser, ner], Vincent Koch [ner]",
|
7 |
"email":"pjb311@nyu.edu",
|
8 |
"url":"https://diyclassics.github.io/",
|
9 |
"license":"MIT",
|
10 |
-
"spacy_version":">=3.7.
|
11 |
-
"spacy_git_version":"
|
12 |
"vectors":{
|
13 |
"width":0,
|
14 |
"vectors":0,
|
@@ -811,6 +811,7 @@
|
|
811 |
]
|
812 |
},
|
813 |
"pipeline":[
|
|
|
814 |
"transformer",
|
815 |
"normer",
|
816 |
"tagger",
|
@@ -821,6 +822,7 @@
|
|
821 |
"ner"
|
822 |
],
|
823 |
"components":[
|
|
|
824 |
"transformer",
|
825 |
"normer",
|
826 |
"tagger",
|
@@ -834,301 +836,304 @@
|
|
834 |
|
835 |
],
|
836 |
"performance":{
|
837 |
-
"ents_f":0.
|
838 |
-
"ents_p":0.
|
839 |
-
"ents_r":0.
|
840 |
"ents_per_type":{
|
841 |
"PERSON":{
|
842 |
-
"p":0.
|
843 |
-
"r":0.
|
844 |
-
"f":0.
|
845 |
},
|
846 |
"LOC":{
|
847 |
-
"p":0.
|
848 |
-
"r":0.
|
849 |
-
"f":0.
|
850 |
},
|
851 |
"NORP":{
|
852 |
-
"p":0
|
853 |
"r":0.9369369369,
|
854 |
-
"f":0.
|
855 |
}
|
856 |
},
|
857 |
-
"transformer_loss":
|
858 |
-
"ner_loss":
|
859 |
-
"
|
860 |
-
"
|
861 |
-
"
|
|
|
|
|
|
|
862 |
"morph_per_feat":{
|
863 |
"Case":{
|
864 |
-
"p":0.
|
865 |
-
"r":0.
|
866 |
-
"f":0.
|
867 |
},
|
868 |
"Gender":{
|
869 |
-
"p":0.
|
870 |
-
"r":0.
|
871 |
-
"f":0.
|
872 |
},
|
873 |
"Number":{
|
874 |
-
"p":0.
|
875 |
-
"r":0.
|
876 |
-
"f":0.
|
877 |
},
|
878 |
"Mood":{
|
879 |
-
"p":0.
|
880 |
-
"r":0.
|
881 |
-
"f":0.
|
882 |
},
|
883 |
"Person":{
|
884 |
-
"p":0.
|
885 |
-
"r":0.
|
886 |
-
"f":0.
|
887 |
},
|
888 |
"Tense":{
|
889 |
-
"p":0.
|
890 |
-
"r":0.
|
891 |
-
"f":0.
|
892 |
},
|
893 |
"VerbForm":{
|
894 |
-
"p":0.
|
895 |
-
"r":0.
|
896 |
-
"f":0.
|
897 |
},
|
898 |
"Voice":{
|
899 |
-
"p":0.
|
900 |
-
"r":0.
|
901 |
-
"f":0.
|
902 |
}
|
903 |
},
|
904 |
-
"lemma_acc":0.
|
905 |
-
"dep_uas":0.
|
906 |
-
"dep_las":0.
|
907 |
"dep_las_per_type":{
|
908 |
"root":{
|
909 |
-
"p":0.
|
910 |
-
"r":0.
|
911 |
-
"f":0.
|
912 |
},
|
913 |
"cop":{
|
914 |
-
"p":0.
|
915 |
-
"r":0.
|
916 |
-
"f":0.
|
917 |
},
|
918 |
"nsubj":{
|
919 |
-
"p":0.
|
920 |
-
"r":0.
|
921 |
-
"f":0.
|
922 |
},
|
923 |
"nmod":{
|
924 |
-
"p":0.
|
925 |
-
"r":0.
|
926 |
-
"f":0.
|
927 |
},
|
928 |
"obj":{
|
929 |
-
"p":0.
|
930 |
-
"r":0.
|
931 |
-
"f":0.
|
932 |
},
|
933 |
"det":{
|
934 |
-
"p":0.
|
935 |
-
"r":0.
|
936 |
-
"f":0.
|
937 |
},
|
938 |
"cc":{
|
939 |
-
"p":0.
|
940 |
-
"r":0.
|
941 |
-
"f":0.
|
942 |
},
|
943 |
"conj":{
|
944 |
-
"p":0.
|
945 |
-
"r":0.
|
946 |
-
"f":0.
|
947 |
},
|
948 |
"nummod":{
|
949 |
-
"p":0.
|
950 |
-
"r":0.
|
951 |
-
"f":0.
|
952 |
},
|
953 |
"case":{
|
954 |
-
"p":0.
|
955 |
-
"r":0.
|
956 |
-
"f":0.
|
957 |
},
|
958 |
"obl":{
|
959 |
-
"p":0.
|
960 |
-
"r":0.
|
961 |
-
"f":0.
|
962 |
},
|
963 |
"acl":{
|
964 |
-
"p":0.
|
965 |
-
"r":0.
|
966 |
-
"f":0.
|
967 |
},
|
968 |
"ccomp":{
|
969 |
-
"p":0.
|
970 |
-
"r":0.
|
971 |
-
"f":0.
|
972 |
},
|
973 |
"acl:relcl":{
|
974 |
-
"p":0.
|
975 |
-
"r":0.
|
976 |
-
"f":0.
|
977 |
},
|
978 |
"advmod":{
|
979 |
-
"p":0.
|
980 |
-
"r":0.
|
981 |
-
"f":0.
|
982 |
},
|
983 |
"mark":{
|
984 |
-
"p":0.
|
985 |
-
"r":0.
|
986 |
-
"f":0.
|
987 |
},
|
988 |
"xcomp":{
|
989 |
-
"p":0.
|
990 |
-
"r":0.
|
991 |
-
"f":0.
|
992 |
},
|
993 |
"csubj:pass":{
|
994 |
-
"p":0.
|
995 |
-
"r":0.
|
996 |
-
"f":0.
|
997 |
},
|
998 |
"advmod:lmod":{
|
999 |
-
"p":0.
|
1000 |
-
"r":0.
|
1001 |
-
"f":0.
|
1002 |
},
|
1003 |
"obl:arg":{
|
1004 |
-
"p":0.
|
1005 |
-
"r":0.
|
1006 |
-
"f":0.
|
1007 |
},
|
1008 |
"csubj":{
|
1009 |
-
"p":0.
|
1010 |
-
"r":0.
|
1011 |
-
"f":0.
|
1012 |
},
|
1013 |
"discourse":{
|
1014 |
-
"p":0.
|
1015 |
-
"r":0.
|
1016 |
-
"f":0.
|
1017 |
},
|
1018 |
"advcl":{
|
1019 |
-
"p":0.
|
1020 |
-
"r":0.
|
1021 |
-
"f":0.
|
1022 |
},
|
1023 |
"nsubj:pass":{
|
1024 |
-
"p":0.
|
1025 |
-
"r":0.
|
1026 |
-
"f":0.
|
1027 |
},
|
1028 |
"advmod:tmod":{
|
1029 |
-
"p":0.
|
1030 |
-
"r":0.
|
1031 |
-
"f":0.
|
1032 |
},
|
1033 |
"advmod:emph":{
|
1034 |
-
"p":0.
|
1035 |
-
"r":0.
|
1036 |
-
"f":0.
|
1037 |
},
|
1038 |
"amod":{
|
1039 |
-
"p":0.
|
1040 |
"r":0.8875675676,
|
1041 |
-
"f":0.
|
1042 |
},
|
1043 |
"conj:expl":{
|
1044 |
-
"p":0.
|
1045 |
-
"r":0.
|
1046 |
-
"f":0.
|
1047 |
},
|
1048 |
"advmod:neg":{
|
1049 |
-
"p":0.
|
1050 |
-
"r":0.
|
1051 |
-
"f":0.
|
1052 |
},
|
1053 |
"advcl:cmp":{
|
1054 |
-
"p":0.
|
1055 |
-
"r":0.
|
1056 |
-
"f":0.
|
1057 |
},
|
1058 |
"nsubj:outer":{
|
1059 |
-
"p":0.
|
1060 |
"r":0.3684210526,
|
1061 |
-
"f":0.
|
|
|
|
|
|
|
|
|
|
|
1062 |
},
|
1063 |
"advcl:abs":{
|
1064 |
-
"p":0.
|
1065 |
-
"r":0.
|
1066 |
-
"f":0.
|
1067 |
},
|
1068 |
"aux:pass":{
|
1069 |
-
"p":0.
|
1070 |
"r":0.9575551783,
|
1071 |
-
"f":0.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1072 |
},
|
1073 |
"dep":{
|
1074 |
"p":0.0,
|
1075 |
"r":0.0,
|
1076 |
"f":0.0
|
1077 |
},
|
1078 |
-
"advcl:pred":{
|
1079 |
-
"p":0.3203883495,
|
1080 |
-
"r":0.1843575419,
|
1081 |
-
"f":0.2340425532
|
1082 |
-
},
|
1083 |
-
"orphan":{
|
1084 |
-
"p":0.5263157895,
|
1085 |
-
"r":0.3703703704,
|
1086 |
-
"f":0.4347826087
|
1087 |
-
},
|
1088 |
"aux":{
|
1089 |
-
"p":0.
|
1090 |
-
"r":0.
|
1091 |
-
"f":0.
|
1092 |
},
|
1093 |
"appos":{
|
1094 |
-
"p":0.
|
1095 |
-
"r":0.
|
1096 |
-
"f":0.
|
1097 |
-
},
|
1098 |
-
"fixed":{
|
1099 |
-
"p":0.955,
|
1100 |
-
"r":0.8842592593,
|
1101 |
-
"f":0.9182692308
|
1102 |
},
|
1103 |
"parataxis":{
|
1104 |
-
"p":0.
|
1105 |
-
"r":0.
|
1106 |
-
"f":0.
|
|
|
|
|
|
|
|
|
|
|
1107 |
},
|
1108 |
"flat":{
|
1109 |
-
"p":0.
|
1110 |
-
"r":0.
|
1111 |
-
"f":0.
|
1112 |
},
|
1113 |
"vocative":{
|
1114 |
-
"p":0.
|
1115 |
"r":0.5737704918,
|
1116 |
-
"f":0.
|
1117 |
-
},
|
1118 |
-
"dislocated":{
|
1119 |
-
"p":0.3333333333,
|
1120 |
-
"r":0.0967741935,
|
1121 |
-
"f":0.15
|
1122 |
},
|
1123 |
-
"
|
1124 |
-
"p":0.
|
1125 |
-
"r":0.
|
1126 |
-
"f":0.
|
1127 |
},
|
1128 |
"reparandum":{
|
1129 |
-
"p":0.
|
1130 |
"r":0.0833333333,
|
1131 |
-
"f":0.
|
1132 |
},
|
1133 |
"dislocated:nsubj":{
|
1134 |
"p":0.0,
|
@@ -1140,35 +1145,35 @@
|
|
1140 |
"r":0.0,
|
1141 |
"f":0.0
|
1142 |
},
|
|
|
|
|
|
|
|
|
|
|
1143 |
"obl:agent":{
|
1144 |
-
"p":0.
|
1145 |
-
"r":0.
|
1146 |
-
"f":0.
|
1147 |
},
|
1148 |
"flat:name":{
|
1149 |
-
"p":0.
|
1150 |
"r":0.8965517241,
|
1151 |
-
"f":0.
|
1152 |
-
},
|
1153 |
-
"flat:foreign":{
|
1154 |
-
"p":0.375,
|
1155 |
-
"r":1.0,
|
1156 |
-
"f":0.5454545455
|
1157 |
},
|
1158 |
"obl:tmod":{
|
1159 |
-
"p":0.
|
1160 |
"r":0.125,
|
1161 |
-
"f":0.
|
1162 |
},
|
1163 |
-
"
|
1164 |
"p":0.0,
|
1165 |
"r":0.0,
|
1166 |
"f":0.0
|
1167 |
},
|
1168 |
-
"
|
1169 |
-
"p":0.
|
1170 |
-
"r":0.
|
1171 |
-
"f":0.
|
1172 |
},
|
1173 |
"parataxis:reporting":{
|
1174 |
"p":0.0,
|
@@ -1206,20 +1211,20 @@
|
|
1206 |
"f":0.0
|
1207 |
}
|
1208 |
},
|
1209 |
-
"
|
1210 |
-
"
|
1211 |
-
"
|
1212 |
-
"
|
1213 |
-
"
|
1214 |
-
"trainable_lemmatizer_loss":3499.3652743734,
|
1215 |
-
"parser_loss":27590.0109430107
|
1216 |
},
|
1217 |
"sources":[
|
1218 |
-
"UD_Latin-Perseus",
|
1219 |
-
"UD_Latin-PROIEL",
|
1220 |
-
"UD_Latin-ITTB",
|
1221 |
-
"UD_Latin-LLCT",
|
1222 |
-
"UD_Latin-UDante"
|
|
|
|
|
1223 |
],
|
1224 |
"requirements":[
|
1225 |
"spacy_lookups_data @ git+https://github.com/diyclassics/spacy-lookups-data.git#egg=spacy-lookups-data",
|
|
|
1 |
{
|
2 |
"lang":"la",
|
3 |
"name":"core_web_trf",
|
4 |
+
"version":"3.7.7",
|
5 |
"description":"",
|
6 |
"author":"Patrick J. Burns; with Nora Bernhardt [ner], Tim Geelhaar [tagger, morphologizer, parser, ner], Vincent Koch [ner]",
|
7 |
"email":"pjb311@nyu.edu",
|
8 |
"url":"https://diyclassics.github.io/",
|
9 |
"license":"MIT",
|
10 |
+
"spacy_version":">=3.7.5,<3.8.0",
|
11 |
+
"spacy_git_version":"a6d0fc360",
|
12 |
"vectors":{
|
13 |
"width":0,
|
14 |
"vectors":0,
|
|
|
811 |
]
|
812 |
},
|
813 |
"pipeline":[
|
814 |
+
"senter",
|
815 |
"transformer",
|
816 |
"normer",
|
817 |
"tagger",
|
|
|
822 |
"ner"
|
823 |
],
|
824 |
"components":[
|
825 |
+
"senter",
|
826 |
"transformer",
|
827 |
"normer",
|
828 |
"tagger",
|
|
|
836 |
|
837 |
],
|
838 |
"performance":{
|
839 |
+
"ents_f":0.9601304525,
|
840 |
+
"ents_p":0.9631931948,
|
841 |
+
"ents_r":0.9570871261,
|
842 |
"ents_per_type":{
|
843 |
"PERSON":{
|
844 |
+
"p":0.966012543,
|
845 |
+
"r":0.9727031982,
|
846 |
+
"f":0.9693463256
|
847 |
},
|
848 |
"LOC":{
|
849 |
+
"p":0.9465290807,
|
850 |
+
"r":0.8913427562,
|
851 |
+
"f":0.9181073703
|
852 |
},
|
853 |
"NORP":{
|
854 |
+
"p":1.0,
|
855 |
"r":0.9369369369,
|
856 |
+
"f":0.9674418605
|
857 |
}
|
858 |
},
|
859 |
+
"transformer_loss":11904.1139860171,
|
860 |
+
"ner_loss":56.6517711698,
|
861 |
+
"sents_f":0.9959496442,
|
862 |
+
"sents_p":0.9945343244,
|
863 |
+
"sents_r":0.997368998,
|
864 |
+
"tag_acc":0.9640958514,
|
865 |
+
"pos_acc":0.9831838987,
|
866 |
+
"morph_acc":0.9581374663,
|
867 |
"morph_per_feat":{
|
868 |
"Case":{
|
869 |
+
"p":0.9639107612,
|
870 |
+
"r":0.9600554205,
|
871 |
+
"f":0.9619792281
|
872 |
},
|
873 |
"Gender":{
|
874 |
+
"p":0.9704207442,
|
875 |
+
"r":0.9668658936,
|
876 |
+
"f":0.9686400574
|
877 |
},
|
878 |
"Number":{
|
879 |
+
"p":0.9882738621,
|
880 |
+
"r":0.985457441,
|
881 |
+
"f":0.9868636421
|
882 |
},
|
883 |
"Mood":{
|
884 |
+
"p":0.9832762836,
|
885 |
+
"r":0.9826035966,
|
886 |
+
"f":0.982939825
|
887 |
},
|
888 |
"Person":{
|
889 |
+
"p":0.9932344122,
|
890 |
+
"r":0.9924713836,
|
891 |
+
"f":0.9928527513
|
892 |
},
|
893 |
"Tense":{
|
894 |
+
"p":0.976521164,
|
895 |
+
"r":0.9763597289,
|
896 |
+
"f":0.9764404398
|
897 |
},
|
898 |
"VerbForm":{
|
899 |
+
"p":0.9860205033,
|
900 |
+
"r":0.9809271523,
|
901 |
+
"f":0.9834672333
|
902 |
},
|
903 |
"Voice":{
|
904 |
+
"p":0.9829952525,
|
905 |
+
"r":0.9858886676,
|
906 |
+
"f":0.984439834
|
907 |
}
|
908 |
},
|
909 |
+
"lemma_acc":0.9531809911,
|
910 |
+
"dep_uas":0.8882308136,
|
911 |
+
"dep_las":0.8492401865,
|
912 |
"dep_las_per_type":{
|
913 |
"root":{
|
914 |
+
"p":0.931788369,
|
915 |
+
"r":0.9344442008,
|
916 |
+
"f":0.9331143952
|
917 |
},
|
918 |
"cop":{
|
919 |
+
"p":0.8331784387,
|
920 |
+
"r":0.8542162935,
|
921 |
+
"f":0.8435662197
|
922 |
},
|
923 |
"nsubj":{
|
924 |
+
"p":0.8516872095,
|
925 |
+
"r":0.8516872095,
|
926 |
+
"f":0.8516872095
|
927 |
},
|
928 |
"nmod":{
|
929 |
+
"p":0.8462984724,
|
930 |
+
"r":0.8320240296,
|
931 |
+
"f":0.8391005476
|
932 |
},
|
933 |
"obj":{
|
934 |
+
"p":0.8521505376,
|
935 |
+
"r":0.8643490116,
|
936 |
+
"f":0.8582064298
|
937 |
},
|
938 |
"det":{
|
939 |
+
"p":0.9342518733,
|
940 |
+
"r":0.9319990354,
|
941 |
+
"f":0.9331240946
|
942 |
},
|
943 |
"cc":{
|
944 |
+
"p":0.9080487222,
|
945 |
+
"r":0.9205811138,
|
946 |
+
"f":0.9142719731
|
947 |
},
|
948 |
"conj":{
|
949 |
+
"p":0.7781139687,
|
950 |
+
"r":0.7745983936,
|
951 |
+
"f":0.7763522013
|
952 |
},
|
953 |
"nummod":{
|
954 |
+
"p":0.9216101695,
|
955 |
+
"r":0.935483871,
|
956 |
+
"f":0.9284951974
|
957 |
},
|
958 |
"case":{
|
959 |
+
"p":0.9727039178,
|
960 |
+
"r":0.9832819348,
|
961 |
+
"f":0.9779643232
|
962 |
},
|
963 |
"obl":{
|
964 |
+
"p":0.8034008407,
|
965 |
+
"r":0.8227352769,
|
966 |
+
"f":0.8129531174
|
967 |
},
|
968 |
"acl":{
|
969 |
+
"p":0.7437788018,
|
970 |
+
"r":0.6879795396,
|
971 |
+
"f":0.7147918512
|
972 |
},
|
973 |
"ccomp":{
|
974 |
+
"p":0.6428571429,
|
975 |
+
"r":0.6331658291,
|
976 |
+
"f":0.6379746835
|
977 |
},
|
978 |
"acl:relcl":{
|
979 |
+
"p":0.7105263158,
|
980 |
+
"r":0.700843608,
|
981 |
+
"f":0.7056517478
|
982 |
},
|
983 |
"advmod":{
|
984 |
+
"p":0.7999080037,
|
985 |
+
"r":0.8080855019,
|
986 |
+
"f":0.8039759593
|
987 |
},
|
988 |
"mark":{
|
989 |
+
"p":0.8895979579,
|
990 |
+
"r":0.8704339682,
|
991 |
+
"f":0.8799116301
|
992 |
},
|
993 |
"xcomp":{
|
994 |
+
"p":0.795045045,
|
995 |
+
"r":0.7998489426,
|
996 |
+
"f":0.797439759
|
997 |
},
|
998 |
"csubj:pass":{
|
999 |
+
"p":0.7127659574,
|
1000 |
+
"r":0.6380952381,
|
1001 |
+
"f":0.6733668342
|
1002 |
},
|
1003 |
"advmod:lmod":{
|
1004 |
+
"p":0.9055944056,
|
1005 |
+
"r":0.8839590444,
|
1006 |
+
"f":0.8946459413
|
1007 |
},
|
1008 |
"obl:arg":{
|
1009 |
+
"p":0.8297546012,
|
1010 |
+
"r":0.8098802395,
|
1011 |
+
"f":0.8196969697
|
1012 |
},
|
1013 |
"csubj":{
|
1014 |
+
"p":0.7437106918,
|
1015 |
+
"r":0.7413793103,
|
1016 |
+
"f":0.7425431711
|
1017 |
},
|
1018 |
"discourse":{
|
1019 |
+
"p":0.8915662651,
|
1020 |
+
"r":0.8921319797,
|
1021 |
+
"f":0.8918490327
|
1022 |
},
|
1023 |
"advcl":{
|
1024 |
+
"p":0.6983372922,
|
1025 |
+
"r":0.7191780822,
|
1026 |
+
"f":0.708604483
|
1027 |
},
|
1028 |
"nsubj:pass":{
|
1029 |
+
"p":0.8316205534,
|
1030 |
+
"r":0.8449799197,
|
1031 |
+
"f":0.838247012
|
1032 |
},
|
1033 |
"advmod:tmod":{
|
1034 |
+
"p":0.7633928571,
|
1035 |
+
"r":0.7844036697,
|
1036 |
+
"f":0.7737556561
|
1037 |
},
|
1038 |
"advmod:emph":{
|
1039 |
+
"p":0.72,
|
1040 |
+
"r":0.69,
|
1041 |
+
"f":0.7046808511
|
1042 |
},
|
1043 |
"amod":{
|
1044 |
+
"p":0.8834289813,
|
1045 |
"r":0.8875675676,
|
1046 |
+
"f":0.8854934388
|
1047 |
},
|
1048 |
"conj:expl":{
|
1049 |
+
"p":0.4909090909,
|
1050 |
+
"r":0.2903225806,
|
1051 |
+
"f":0.3648648649
|
1052 |
},
|
1053 |
"advmod:neg":{
|
1054 |
+
"p":0.8902554399,
|
1055 |
+
"r":0.8961904762,
|
1056 |
+
"f":0.8932130992
|
1057 |
},
|
1058 |
"advcl:cmp":{
|
1059 |
+
"p":0.6553191489,
|
1060 |
+
"r":0.6184738956,
|
1061 |
+
"f":0.6363636364
|
1062 |
},
|
1063 |
"nsubj:outer":{
|
1064 |
+
"p":0.35,
|
1065 |
"r":0.3684210526,
|
1066 |
+
"f":0.358974359
|
1067 |
+
},
|
1068 |
+
"advcl:pred":{
|
1069 |
+
"p":0.329787234,
|
1070 |
+
"r":0.1731843575,
|
1071 |
+
"f":0.2271062271
|
1072 |
},
|
1073 |
"advcl:abs":{
|
1074 |
+
"p":0.8535911602,
|
1075 |
+
"r":0.860724234,
|
1076 |
+
"f":0.8571428571
|
1077 |
},
|
1078 |
"aux:pass":{
|
1079 |
+
"p":0.94,
|
1080 |
"r":0.9575551783,
|
1081 |
+
"f":0.9486963835
|
1082 |
+
},
|
1083 |
+
"fixed":{
|
1084 |
+
"p":0.9532019704,
|
1085 |
+
"r":0.8958333333,
|
1086 |
+
"f":0.923627685
|
1087 |
+
},
|
1088 |
+
"orphan":{
|
1089 |
+
"p":0.5299145299,
|
1090 |
+
"r":0.3827160494,
|
1091 |
+
"f":0.4444444444
|
1092 |
},
|
1093 |
"dep":{
|
1094 |
"p":0.0,
|
1095 |
"r":0.0,
|
1096 |
"f":0.0
|
1097 |
},
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1098 |
"aux":{
|
1099 |
+
"p":0.8827586207,
|
1100 |
+
"r":0.9208633094,
|
1101 |
+
"f":0.9014084507
|
1102 |
},
|
1103 |
"appos":{
|
1104 |
+
"p":0.9301242236,
|
1105 |
+
"r":0.8946975355,
|
1106 |
+
"f":0.9120669966
|
|
|
|
|
|
|
|
|
|
|
1107 |
},
|
1108 |
"parataxis":{
|
1109 |
+
"p":0.5,
|
1110 |
+
"r":0.3541666667,
|
1111 |
+
"f":0.4146341463
|
1112 |
+
},
|
1113 |
+
"dislocated:obj":{
|
1114 |
+
"p":0.7676767677,
|
1115 |
+
"r":0.7835051546,
|
1116 |
+
"f":0.7755102041
|
1117 |
},
|
1118 |
"flat":{
|
1119 |
+
"p":0.8504273504,
|
1120 |
+
"r":0.7713178295,
|
1121 |
+
"f":0.8089430894
|
1122 |
},
|
1123 |
"vocative":{
|
1124 |
+
"p":0.625,
|
1125 |
"r":0.5737704918,
|
1126 |
+
"f":0.5982905983
|
|
|
|
|
|
|
|
|
|
|
1127 |
},
|
1128 |
+
"ccomp:reported":{
|
1129 |
+
"p":0.2857142857,
|
1130 |
+
"r":0.4347826087,
|
1131 |
+
"f":0.3448275862
|
1132 |
},
|
1133 |
"reparandum":{
|
1134 |
+
"p":0.3333333333,
|
1135 |
"r":0.0833333333,
|
1136 |
+
"f":0.1333333333
|
1137 |
},
|
1138 |
"dislocated:nsubj":{
|
1139 |
"p":0.0,
|
|
|
1145 |
"r":0.0,
|
1146 |
"f":0.0
|
1147 |
},
|
1148 |
+
"dislocated":{
|
1149 |
+
"p":0.4444444444,
|
1150 |
+
"r":0.1290322581,
|
1151 |
+
"f":0.2
|
1152 |
+
},
|
1153 |
"obl:agent":{
|
1154 |
+
"p":0.52,
|
1155 |
+
"r":0.2574257426,
|
1156 |
+
"f":0.3443708609
|
1157 |
},
|
1158 |
"flat:name":{
|
1159 |
+
"p":0.6842105263,
|
1160 |
"r":0.8965517241,
|
1161 |
+
"f":0.776119403
|
|
|
|
|
|
|
|
|
|
|
1162 |
},
|
1163 |
"obl:tmod":{
|
1164 |
+
"p":0.25,
|
1165 |
"r":0.125,
|
1166 |
+
"f":0.1666666667
|
1167 |
},
|
1168 |
+
"flat:foreign":{
|
1169 |
"p":0.0,
|
1170 |
"r":0.0,
|
1171 |
"f":0.0
|
1172 |
},
|
1173 |
+
"obl:lmod":{
|
1174 |
+
"p":0.0,
|
1175 |
+
"r":0.0,
|
1176 |
+
"f":0.0
|
1177 |
},
|
1178 |
"parataxis:reporting":{
|
1179 |
"p":0.0,
|
|
|
1211 |
"f":0.0
|
1212 |
}
|
1213 |
},
|
1214 |
+
"senter_loss":149.3224464637,
|
1215 |
+
"tagger_loss":103.7340285726,
|
1216 |
+
"morphologizer_loss":1120.5138759464,
|
1217 |
+
"trainable_lemmatizer_loss":1068.4113499675,
|
1218 |
+
"parser_loss":19163.8495826822
|
|
|
|
|
1219 |
},
|
1220 |
"sources":[
|
1221 |
+
"UD_Latin-Perseus (via Gamba/Zeman 2023)",
|
1222 |
+
"UD_Latin-PROIEL (via Gamba/Zeman 2023)",
|
1223 |
+
"UD_Latin-ITTB (via Gamba/Zeman 2023)",
|
1224 |
+
"UD_Latin-LLCT (via Gamba/Zeman 2023",
|
1225 |
+
"UD_Latin-UDante (via Gamba/Zeman 2023)",
|
1226 |
+
"CIRCSE/LASLA: LASLA Corpus",
|
1227 |
+
"UD_Latin-CIRCSE"
|
1228 |
],
|
1229 |
"requirements":[
|
1230 |
"spacy_lookups_data @ git+https://github.com/diyclassics/spacy-lookups-data.git#egg=spacy-lookups-data",
|
morphologizer/model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:4314e207681175f24999911fb6021e49163510932e816af12cb1ce5db54901d4
 size 675109300
|
ner/model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:418d4628460fe4adac04843c2da156ebbe59eee6624b18e20c32ebd9231704c8
 size 673159757
|
ner/moves
CHANGED
@@ -1 +1 @@
|
|
1 |
-
��moves��{"0":{},"1":{"PERSON":
|
|
|
1 |
+
��moves��{"0":{},"1":{"PERSON":16680,"LOC":2845,"NORP":119},"2":{"PERSON":16680,"LOC":2845,"NORP":119},"3":{"PERSON":16680,"LOC":2845,"NORP":119},"4":{"PERSON":16680,"LOC":2845,"NORP":119,"":1},"5":{"":1}}�cfg��neg_key�
|
parser/model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:603d1ed7d520cc49dd6dbda77f5734ebdaa1af716137d8b4413e2dd0a9f38ad6
 size 676235346
|
senter/cfg
ADDED
@@ -0,0 +1,3 @@
+{
+  "overwrite":false
+}
|
senter/model
ADDED
Binary file (255 kB).
|
|
tagger/model
CHANGED
Binary files a/tagger/model and b/tagger/model differ
|
|
trainable_lemmatizer/cfg
CHANGED
The diff for this file is too large to render.
|
|
trainable_lemmatizer/model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:af6dfb059ad18026fdc3e92c427c082c551aef965740e70b9988b18722e750f3
+size 14131797
|
trainable_lemmatizer/trees
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:a15aefb70d9da8a30f7072f6609d8523a253cf54394eade11c4868ee7a09010b
+size 949814
|
transformer/model
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b631c9775406d20af454f70054c364541d2ea0f5dccd1cc390296008d1ea6c0f
 size 672940068
|
vocab/strings.json
CHANGED
The diff for this file is too large to render.
|
|