EC2 Default User committed on
Commit
182612e
1 Parent(s): c58aec4

Update spaCy pipeline

.gitattributes CHANGED
@@ -19,3 +19,4 @@
19
  *strings.json filter=lfs diff=lfs merge=lfs -text
20
  vectors filter=lfs diff=lfs merge=lfs -text
21
  model filter=lfs diff=lfs merge=lfs -text
22
+ *key2row filter=lfs diff=lfs merge=lfs -text
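The added `*key2row` rule routes any file whose name ends in `key2row` through Git LFS, alongside the existing `*strings.json`, `vectors`, and `model` rules. As a rough illustration of which paths these rules catch, here is a minimal sketch in Python. It assumes the four rules shown in this hunk, and it approximates gitattributes matching by testing each slash-free pattern against the file's basename with `fnmatchcase`; Git's own matcher has more rules than this. The helper names and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase

# The four LFS rules visible in this hunk (the last one is the added line).
GITATTRIBUTES = """\
*strings.json filter=lfs diff=lfs merge=lfs -text
vectors filter=lfs diff=lfs merge=lfs -text
model filter=lfs diff=lfs merge=lfs -text
*key2row filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and "filter=lfs" in fields[1:]:
            patterns.append(fields[0])
    return patterns


def is_lfs_tracked(path, patterns):
    """Approximation: a pattern without a slash is matched against the basename."""
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in patterns)


patterns = lfs_patterns(GITATTRIBUTES)
for example_path in ("vocab/key2row", "vocab/strings.json", "README.md"):
    print(example_path, is_lfs_tracked(example_path, patterns))
```

In a working tree, `git check-attr filter -- <path>` reports the attribute Git actually resolves for a path, which is the authoritative check.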
LICENSES_SOURCES CHANGED
@@ -878,557 +878,6 @@ Creative Commons may be contacted at creativecommons.org.
878
 
879
 
880
 
881
- # Lemmatization Lists
882
-
883
- * Author: Michal Měchura
884
- * URL: https://github.com/michmech/lemmatization-lists/
885
- * License: ODbL
886
-
887
- ```
888
- ## ODC Open Database License (ODbL)
889
-
890
- ### Preamble
891
-
892
- The Open Database License (ODbL) is a license agreement intended to
893
- allow users to freely share, modify, and use this Database while
894
- maintaining this same freedom for others. Many databases are covered by
895
- copyright, and therefore this document licenses these rights. Some
896
- jurisdictions, mainly in the European Union, have specific rights that
897
- cover databases, and so the ODbL addresses these rights, too. Finally,
898
- the ODbL is also an agreement in contract for users of this Database to
899
- act in certain ways in return for accessing this Database.
900
-
901
- Databases can contain a wide variety of types of content (images,
902
- audiovisual material, and sounds all in the same database, for example),
903
- and so the ODbL only governs the rights over the Database, and not the
904
- contents of the Database individually. Licensors should use the ODbL
905
- together with another license for the contents, if the contents have a
906
- single set of rights that uniformly covers all of the contents. If the
907
- contents have multiple sets of different rights, Licensors should
908
- describe what rights govern what contents together in the individual
909
- record or in some other way that clarifies what rights apply.
910
-
911
- Sometimes the contents of a database, or the database itself, can be
912
- covered by other rights not addressed here (such as private contracts,
913
- trade mark over the name, or privacy rights / data protection rights
914
- over information in the contents), and so you are advised that you may
915
- have to consult other documents or clear other rights before doing
916
- activities not covered by this License.
917
-
918
- ------
919
-
920
- The Licensor (as defined below)
921
-
922
- and
923
-
924
- You (as defined below)
925
-
926
- agree as follows:
927
-
928
- ### 1.0 Definitions of Capitalised Words
929
-
930
- "Collective Database" – Means this Database in unmodified form as part
931
- of a collection of independent databases in themselves that together are
932
- assembled into a collective whole. A work that constitutes a Collective
933
- Database will not be considered a Derivative Database.
934
-
935
- "Convey" – As a verb, means Using the Database, a Derivative Database,
936
- or the Database as part of a Collective Database in any way that enables
937
- a Person to make or receive copies of the Database or a Derivative
938
- Database. Conveying does not include interaction with a user through a
939
- computer network, or creating and Using a Produced Work, where no
940
- transfer of a copy of the Database or a Derivative Database occurs.
941
- "Contents" – The contents of this Database, which includes the
942
- information, independent works, or other material collected into the
943
- Database. For example, the contents of the Database could be factual
944
- data or works such as images, audiovisual material, text, or sounds.
945
-
946
- "Database" – A collection of material (the Contents) arranged in a
947
- systematic or methodical way and individually accessible by electronic
948
- or other means offered under the terms of this License.
949
-
950
- "Database Directive" – Means Directive 96/9/EC of the European
951
- Parliament and of the Council of 11 March 1996 on the legal protection
952
- of databases, as amended or succeeded.
953
-
954
- "Database Right" – Means rights resulting from the Chapter III ("sui
955
- generis") rights in the Database Directive (as amended and as transposed
956
- by member states), which includes the Extraction and Re-utilisation of
957
- the whole or a Substantial part of the Contents, as well as any similar
958
- rights available in the relevant jurisdiction under Section 10.4.
959
-
960
- "Derivative Database" – Means a database based upon the Database, and
961
- includes any translation, adaptation, arrangement, modification, or any
962
- other alteration of the Database or of a Substantial part of the
963
- Contents. This includes, but is not limited to, Extracting or
964
- Re-utilising the whole or a Substantial part of the Contents in a new
965
- Database.
966
-
967
- "Extraction" – Means the permanent or temporary transfer of all or a
968
- Substantial part of the Contents to another medium by any means or in
969
- any form.
970
-
971
- "License" – Means this license agreement and is both a license of rights
972
- such as copyright and Database Rights and an agreement in contract.
973
-
974
- "Licensor" – Means the Person that offers the Database under the terms
975
- of this License.
976
-
977
- "Person" – Means a natural or legal person or a body of persons
978
- corporate or incorporate.
979
-
980
- "Produced Work" – a work (such as an image, audiovisual material, text,
981
- or sounds) resulting from using the whole or a Substantial part of the
982
- Contents (via a search or other query) from this Database, a Derivative
983
- Database, or this Database as part of a Collective Database.
984
-
985
- "Publicly" – means to Persons other than You or under Your control by
986
- either more than 50% ownership or by the power to direct their
987
- activities (such as contracting with an independent consultant).
988
-
989
- "Re-utilisation" – means any form of making available to the public all
990
- or a Substantial part of the Contents by the distribution of copies, by
991
- renting, by online or other forms of transmission.
992
-
993
- "Substantial" – Means substantial in terms of quantity or quality or a
994
- combination of both. The repeated and systematic Extraction or
995
- Re-utilisation of insubstantial parts of the Contents may amount to the
996
- Extraction or Re-utilisation of a Substantial part of the Contents.
997
-
998
- "Use" – As a verb, means doing any act that is restricted by copyright
999
- or Database Rights whether in the original medium or any other; and
1000
- includes without limitation distributing, copying, publicly performing,
1001
- publicly displaying, and preparing derivative works of the Database, as
1002
- well as modifying the Database as may be technically necessary to use it
1003
- in a different mode or format.
1004
-
1005
- "You" – Means a Person exercising rights under this License who has not
1006
- previously violated the terms of this License with respect to the
1007
- Database, or who has received express permission from the Licensor to
1008
- exercise rights under this License despite a previous violation.
1009
-
1010
- Words in the singular include the plural and vice versa.
1011
-
1012
- ### 2.0 What this License covers
1013
-
1014
- 2.1. Legal effect of this document. This License is:
1015
-
1016
- a. A license of applicable copyright and neighbouring rights;
1017
-
1018
- b. A license of the Database Right; and
1019
-
1020
- c. An agreement in contract between You and the Licensor.
1021
-
1022
- 2.2 Legal rights covered. This License covers the legal rights in the
1023
- Database, including:
1024
-
1025
- a. Copyright. Any copyright or neighbouring rights in the Database.
1026
- The copyright licensed includes any individual elements of the
1027
- Database, but does not cover the copyright over the Contents
1028
- independent of this Database. See Section 2.4 for details. Copyright
1029
- law varies between jurisdictions, but is likely to cover: the Database
1030
- model or schema, which is the structure, arrangement, and organisation
1031
- of the Database, and can also include the Database tables and table
1032
- indexes; the data entry and output sheets; and the Field names of
1033
- Contents stored in the Database;
1034
-
1035
- b. Database Rights. Database Rights only extend to the Extraction and
1036
- Re-utilisation of the whole or a Substantial part of the Contents.
1037
- Database Rights can apply even when there is no copyright over the
1038
- Database. Database Rights can also apply when the Contents are removed
1039
- from the Database and are selected and arranged in a way that would
1040
- not infringe any applicable copyright; and
1041
-
1042
- c. Contract. This is an agreement between You and the Licensor for
1043
- access to the Database. In return you agree to certain conditions of
1044
- use on this access as outlined in this License.
1045
-
1046
- 2.3 Rights not covered.
1047
-
1048
- a. This License does not apply to computer programs used in the making
1049
- or operation of the Database;
1050
-
1051
- b. This License does not cover any patents over the Contents or the
1052
- Database; and
1053
-
1054
- c. This License does not cover any trademarks associated with the
1055
- Database.
1056
-
1057
- 2.4 Relationship to Contents in the Database. The individual items of
1058
- the Contents contained in this Database may be covered by other rights,
1059
- including copyright, patent, data protection, privacy, or personality
1060
- rights, and this License does not cover any rights (other than Database
1061
- Rights or in contract) in individual Contents contained in the Database.
1062
- For example, if used on a Database of images (the Contents), this
1063
- License would not apply to copyright over individual images, which could
1064
- have their own separate licenses, or one single license covering all of
1065
- the rights over the images.
1066
-
1067
- ### 3.0 Rights granted
1068
-
1069
- 3.1 Subject to the terms and conditions of this License, the Licensor
1070
- grants to You a worldwide, royalty-free, non-exclusive, terminable (but
1071
- only under Section 9) license to Use the Database for the duration of
1072
- any applicable copyright and Database Rights. These rights explicitly
1073
- include commercial use, and do not exclude any field of endeavour. To
1074
- the extent possible in the relevant jurisdiction, these rights may be
1075
- exercised in all media and formats whether now known or created in the
1076
- future.
1077
-
1078
- The rights granted cover, for example:
1079
-
1080
- a. Extraction and Re-utilisation of the whole or a Substantial part of
1081
- the Contents;
1082
-
1083
- b. Creation of Derivative Databases;
1084
-
1085
- c. Creation of Collective Databases;
1086
-
1087
- d. Creation of temporary or permanent reproductions by any means and
1088
- in any form, in whole or in part, including of any Derivative
1089
- Databases or as a part of Collective Databases; and
1090
-
1091
- e. Distribution, communication, display, lending, making available, or
1092
- performance to the public by any means and in any form, in whole or in
1093
- part, including of any Derivative Database or as a part of Collective
1094
- Databases.
1095
-
1096
- 3.2 Compulsory license schemes. For the avoidance of doubt:
1097
-
1098
- a. Non-waivable compulsory license schemes. In those jurisdictions in
1099
- which the right to collect royalties through any statutory or
1100
- compulsory licensing scheme cannot be waived, the Licensor reserves
1101
- the exclusive right to collect such royalties for any exercise by You
1102
- of the rights granted under this License;
1103
-
1104
- b. Waivable compulsory license schemes. In those jurisdictions in
1105
- which the right to collect royalties through any statutory or
1106
- compulsory licensing scheme can be waived, the Licensor waives the
1107
- exclusive right to collect such royalties for any exercise by You of
1108
- the rights granted under this License; and,
1109
-
1110
- c. Voluntary license schemes. The Licensor waives the right to collect
1111
- royalties, whether individually or, in the event that the Licensor is
1112
- a member of a collecting society that administers voluntary licensing
1113
- schemes, via that society, from any exercise by You of the rights
1114
- granted under this License.
1115
-
1116
- 3.3 The right to release the Database under different terms, or to stop
1117
- distributing or making available the Database, is reserved. Note that
1118
- this Database may be multiple-licensed, and so You may have the choice
1119
- of using alternative licenses for this Database. Subject to Section
1120
- 10.4, all other rights not expressly granted by Licensor are reserved.
1121
-
1122
- ### 4.0 Conditions of Use
1123
-
1124
- 4.1 The rights granted in Section 3 above are expressly made subject to
1125
- Your complying with the following conditions of use. These are important
1126
- conditions of this License, and if You fail to follow them, You will be
1127
- in material breach of its terms.
1128
-
1129
- 4.2 Notices. If You Publicly Convey this Database, any Derivative
1130
- Database, or the Database as part of a Collective Database, then You
1131
- must:
1132
-
1133
- a. Do so only under the terms of this License or another license
1134
- permitted under Section 4.4;
1135
-
1136
- b. Include a copy of this License (or, as applicable, a license
1137
- permitted under Section 4.4) or its Uniform Resource Identifier (URI)
1138
- with the Database or Derivative Database, including both in the
1139
- Database or Derivative Database and in any relevant documentation; and
1140
-
1141
- c. Keep intact any copyright or Database Right notices and notices
1142
- that refer to this License.
1143
-
1144
- d. If it is not possible to put the required notices in a particular
1145
- file due to its structure, then You must include the notices in a
1146
- location (such as a relevant directory) where users would be likely to
1147
- look for it.
1148
-
1149
- 4.3 Notice for using output (Contents). Creating and Using a Produced
1150
- Work does not require the notice in Section 4.2. However, if you
1151
- Publicly Use a Produced Work, You must include a notice associated with
1152
- the Produced Work reasonably calculated to make any Person that uses,
1153
- views, accesses, interacts with, or is otherwise exposed to the Produced
1154
- Work aware that Content was obtained from the Database, Derivative
1155
- Database, or the Database as part of a Collective Database, and that it
1156
- is available under this License.
1157
-
1158
- a. Example notice. The following text will satisfy notice under
1159
- Section 4.3:
1160
-
1161
- Contains information from DATABASE NAME, which is made available
1162
- here under the Open Database License (ODbL).
1163
-
1164
- DATABASE NAME should be replaced with the name of the Database and a
1165
- hyperlink to the URI of the Database. "Open Database License" should
1166
- contain a hyperlink to the URI of the text of this License. If
1167
- hyperlinks are not possible, You should include the plain text of the
1168
- required URI's with the above notice.
1169
-
1170
- 4.4 Share alike.
1171
-
1172
- a. Any Derivative Database that You Publicly Use must be only under
1173
- the terms of:
1174
-
1175
- i. This License;
1176
-
1177
- ii. A later version of this License similar in spirit to this
1178
- License; or
1179
-
1180
- iii. A compatible license.
1181
-
1182
- If You license the Derivative Database under one of the licenses
1183
- mentioned in (iii), You must comply with the terms of that license.
1184
-
1185
- b. For the avoidance of doubt, Extraction or Re-utilisation of the
1186
- whole or a Substantial part of the Contents into a new database is a
1187
- Derivative Database and must comply with Section 4.4.
1188
-
1189
- c. Derivative Databases and Produced Works. A Derivative Database is
1190
- Publicly Used and so must comply with Section 4.4. if a Produced Work
1191
- created from the Derivative Database is Publicly Used.
1192
-
1193
- d. Share Alike and additional Contents. For the avoidance of doubt,
1194
- You must not add Contents to Derivative Databases under Section 4.4 a
1195
- that are incompatible with the rights granted under this License.
1196
-
1197
- e. Compatible licenses. Licensors may authorise a proxy to determine
1198
- compatible licenses under Section 4.4 a iii. If they do so, the
1199
- authorised proxy's public statement of acceptance of a compatible
1200
- license grants You permission to use the compatible license.
1201
-
1202
-
1203
- 4.5 Limits of Share Alike. The requirements of Section 4.4 do not apply
1204
- in the following:
1205
-
1206
- a. For the avoidance of doubt, You are not required to license
1207
- Collective Databases under this License if You incorporate this
1208
- Database or a Derivative Database in the collection, but this License
1209
- still applies to this Database or a Derivative Database as a part of
1210
- the Collective Database;
1211
-
1212
- b. Using this Database, a Derivative Database, or this Database as
1213
- part of a Collective Database to create a Produced Work does not
1214
- create a Derivative Database for purposes of Section 4.4; and
1215
-
1216
- c. Use of a Derivative Database internally within an organisation is
1217
- not to the public and therefore does not fall under the requirements
1218
- of Section 4.4.
1219
-
1220
- 4.6 Access to Derivative Databases. If You Publicly Use a Derivative
1221
- Database or a Produced Work from a Derivative Database, You must also
1222
- offer to recipients of the Derivative Database or Produced Work a copy
1223
- in a machine readable form of:
1224
-
1225
- a. The entire Derivative Database; or
1226
-
1227
- b. A file containing all of the alterations made to the Database or
1228
- the method of making the alterations to the Database (such as an
1229
- algorithm), including any additional Contents, that make up all the
1230
- differences between the Database and the Derivative Database.
1231
-
1232
- The Derivative Database (under a.) or alteration file (under b.) must be
1233
- available at no more than a reasonable production cost for physical
1234
- distributions and free of charge if distributed over the internet.
1235
-
1236
- 4.7 Technological measures and additional terms
1237
-
1238
- a. This License does not allow You to impose (except subject to
1239
- Section 4.7 b.) any terms or any technological measures on the
1240
- Database, a Derivative Database, or the whole or a Substantial part of
1241
- the Contents that alter or restrict the terms of this License, or any
1242
- rights granted under it, or have the effect or intent of restricting
1243
- the ability of any person to exercise those rights.
1244
-
1245
- b. Parallel distribution. You may impose terms or technological
1246
- measures on the Database, a Derivative Database, or the whole or a
1247
- Substantial part of the Contents (a "Restricted Database") in
1248
- contravention of Section 4.74 a. only if You also make a copy of the
1249
- Database or a Derivative Database available to the recipient of the
1250
- Restricted Database:
1251
-
1252
- i. That is available without additional fee;
1253
-
1254
- ii. That is available in a medium that does not alter or restrict
1255
- the terms of this License, or any rights granted under it, or have
1256
- the effect or intent of restricting the ability of any person to
1257
- exercise those rights (an "Unrestricted Database"); and
1258
-
1259
- iii. The Unrestricted Database is at least as accessible to the
1260
- recipient as a practical matter as the Restricted Database.
1261
-
1262
- c. For the avoidance of doubt, You may place this Database or a
1263
- Derivative Database in an authenticated environment, behind a
1264
- password, or within a similar access control scheme provided that You
1265
- do not alter or restrict the terms of this License or any rights
1266
- granted under it or have the effect or intent of restricting the
1267
- ability of any person to exercise those rights.
1268
-
1269
- 4.8 Licensing of others. You may not sublicense the Database. Each time
1270
- You communicate the Database, the whole or Substantial part of the
1271
- Contents, or any Derivative Database to anyone else in any way, the
1272
- Licensor offers to the recipient a license to the Database on the same
1273
- terms and conditions as this License. You are not responsible for
1274
- enforcing compliance by third parties with this License, but You may
1275
- enforce any rights that You have over a Derivative Database. You are
1276
- solely responsible for any modifications of a Derivative Database made
1277
- by You or another Person at Your direction. You may not impose any
1278
- further restrictions on the exercise of the rights granted or affirmed
1279
- under this License.
1280
-
1281
- ### 5.0 Moral rights
1282
-
1283
- 5.1 Moral rights. This section covers moral rights, including any rights
1284
- to be identified as the author of the Database or to object to treatment
1285
- that would otherwise prejudice the author's honour and reputation, or
1286
- any other derogatory treatment:
1287
-
1288
- a. For jurisdictions allowing waiver of moral rights, Licensor waives
1289
- all moral rights that Licensor may have in the Database to the fullest
1290
- extent possible by the law of the relevant jurisdiction under Section
1291
- 10.4;
1292
-
1293
- b. If waiver of moral rights under Section 5.1 a in the relevant
1294
- jurisdiction is not possible, Licensor agrees not to assert any moral
1295
- rights over the Database and waives all claims in moral rights to the
1296
- fullest extent possible by the law of the relevant jurisdiction under
1297
- Section 10.4; and
1298
-
1299
- c. For jurisdictions not allowing waiver or an agreement not to assert
1300
- moral rights under Section 5.1 a and b, the author may retain their
1301
- moral rights over certain aspects of the Database.
1302
-
1303
- Please note that some jurisdictions do not allow for the waiver of moral
1304
- rights, and so moral rights may still subsist over the Database in some
1305
- jurisdictions.
1306
-
1307
- ### 6.0 Fair dealing, Database exceptions, and other rights not affected
1308
-
1309
- 6.1 This License does not affect any rights that You or anyone else may
1310
- independently have under any applicable law to make any use of this
1311
- Database, including without limitation:
1312
-
1313
- a. Exceptions to the Database Right including: Extraction of Contents
1314
- from non-electronic Databases for private purposes, Extraction for
1315
- purposes of illustration for teaching or scientific research, and
1316
- Extraction or Re-utilisation for public security or an administrative
1317
- or judicial procedure.
1318
-
1319
- b. Fair dealing, fair use, or any other legally recognised limitation
1320
- or exception to infringement of copyright or other applicable laws.
1321
-
1322
- 6.2 This License does not affect any rights of lawful users to Extract
1323
- and Re-utilise insubstantial parts of the Contents, evaluated
1324
- quantitatively or qualitatively, for any purposes whatsoever, including
1325
- creating a Derivative Database (subject to other rights over the
1326
- Contents, see Section 2.4). The repeated and systematic Extraction or
1327
- Re-utilisation of insubstantial parts of the Contents may however amount
1328
- to the Extraction or Re-utilisation of a Substantial part of the
1329
- Contents.
1330
-
1331
- ### 7.0 Warranties and Disclaimer
1332
-
1333
- 7.1 The Database is licensed by the Licensor "as is" and without any
1334
- warranty of any kind, either express, implied, or arising by statute,
1335
- custom, course of dealing, or trade usage. Licensor specifically
1336
- disclaims any and all implied warranties or conditions of title,
1337
- non-infringement, accuracy or completeness, the presence or absence of
1338
- errors, fitness for a particular purpose, merchantability, or otherwise.
1339
- Some jurisdictions do not allow the exclusion of implied warranties, so
1340
- this exclusion may not apply to You.
1341
-
1342
- ### 8.0 Limitation of liability
1343
-
1344
- 8.1 Subject to any liability that may not be excluded or limited by law,
1345
- the Licensor is not liable for, and expressly excludes, all liability
1346
- for loss or damage however and whenever caused to anyone by any use
1347
- under this License, whether by You or by anyone else, and whether caused
1348
- by any fault on the part of the Licensor or not. This exclusion of
1349
- liability includes, but is not limited to, any special, incidental,
1350
- consequential, punitive, or exemplary damages such as loss of revenue,
1351
- data, anticipated profits, and lost business. This exclusion applies
1352
- even if the Licensor has been advised of the possibility of such
1353
- damages.
1354
-
1355
- 8.2 If liability may not be excluded by law, it is limited to actual and
1356
- direct financial loss to the extent it is caused by proved negligence on
1357
- the part of the Licensor.
1358
-
1359
- ### 9.0 Termination of Your rights under this License
1360
-
1361
- 9.1 Any breach by You of the terms and conditions of this License
1362
- automatically terminates this License with immediate effect and without
1363
- notice to You. For the avoidance of doubt, Persons who have received the
1364
- Database, the whole or a Substantial part of the Contents, Derivative
1365
- Databases, or the Database as part of a Collective Database from You
1366
- under this License will not have their licenses terminated provided
1367
- their use is in full compliance with this License or a license granted
1368
- under Section 4.8 of this License. Sections 1, 2, 7, 8, 9 and 10 will
1369
- survive any termination of this License.
1370
-
1371
- 9.2 If You are not in breach of the terms of this License, the Licensor
1372
- will not terminate Your rights under it.
1373
-
1374
- 9.3 Unless terminated under Section 9.1, this License is granted to You
1375
- for the duration of applicable rights in the Database.
1376
-
1377
- 9.4 Reinstatement of rights. If you cease any breach of the terms and
1378
- conditions of this License, then your full rights under this License
1379
- will be reinstated:
1380
-
1381
- a. Provisionally and subject to permanent termination until the 60th
1382
- day after cessation of breach;
1383
-
1384
- b. Permanently on the 60th day after cessation of breach unless
1385
- otherwise reasonably notified by the Licensor; or
1386
-
1387
- c. Permanently if reasonably notified by the Licensor of the
1388
- violation, this is the first time You have received notice of
1389
- violation of this License from the Licensor, and You cure the
1390
- violation prior to 30 days after your receipt of the notice.
1391
-
1392
- Persons subject to permanent termination of rights are not eligible to
1393
- be a recipient and receive a license under Section 4.8.
1394
-
1395
- 9.5 Notwithstanding the above, Licensor reserves the right to release
1396
- the Database under different license terms or to stop distributing or
1397
- making available the Database. Releasing the Database under different
1398
- license terms or stopping the distribution of the Database will not
1399
- withdraw this License (or any other license that has been, or is
1400
- required to be, granted under the terms of this License), and this
1401
- License will continue in full force and effect unless terminated as
1402
- stated above.
1403
-
1404
- ### 10.0 General
1405
-
1406
- 10.1 If any provision of this License is held to be invalid or
1407
- unenforceable, that must not affect the validity or enforceability of
1408
- the remainder of the terms and conditions of this License and each
1409
- remaining provision of this License shall be valid and enforced to the
1410
- fullest extent permitted by law.
1411
-
1412
- 10.2 This License is the entire agreement between the parties with
1413
- respect to the rights granted here over the Database. It replaces any
1414
- earlier understandings, agreements or representations with respect to
1415
- the Database.
1416
-
1417
- 10.3 If You are in breach of the terms of this License, You will not be
1418
- entitled to rely on the terms of this License or to complain of any
1419
- breach by the Licensor.
1420
-
1421
- 10.4 Choice of law. This License takes effect in and will be governed by
1422
- the laws of the relevant jurisdiction in which the License terms are
1423
- sought to be enforced. If the standard suite of rights granted under
1424
- applicable copyright law and Database Rights in the relevant
1425
- jurisdiction includes additional rights not granted under this License,
1426
- these additional rights are granted in this License in order to meet the
1427
- terms of this License.```
1428
-
1429
-
1430
-
1431
-
1432
  # Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)
1433
 
1434
  * Author: Explosion
878
 
879
 
880
 
881
  # Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)
882
 
883
  * Author: Explosion
README.md CHANGED
@@ -14,61 +14,76 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.8075313808
18
  - name: NER Recall
19
  type: recall
20
- value: 0.8041666667
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.8058455115
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
- - name: POS Accuracy
29
  type: accuracy
30
- value: 0.962905569
31
  - task:
32
- name: SENTER
33
  type: token-classification
34
  metrics:
35
- - name: SENTER Precision
36
- type: precision
37
- value: 0.908438061
38
- - name: SENTER Recall
39
- type: recall
40
- value: 0.8971631206
41
- - name: SENTER F Score
42
- type: f_score
43
- value: 0.902765388
44
  - task:
45
- name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
- - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
- value: 0.8220632614
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
- - name: Labeled Dependencies Accuracy
56
- type: accuracy
57
- value: 0.8220632614
58
  ---
  ### Details: https://spacy.io/models/da#da_core_news_md

- Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler, lemmatizer.

  | Feature | Description |
  | --- | --- |
  | **Name** | `da_core_news_md` |
- | **Version** | `3.2.0` |
- | **spaCy** | `>=3.2.0,<3.3.0` |
- | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
- | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner` |
  | **Vectors** | 500000 keys, 20000 unique vectors (300 dimensions) |
- | **Sources** | [UD Danish DDT v2.8](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[Lemmatization Lists](https://github.com/michmech/lemmatization-lists/) (Michal Měchura)<br />[Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)](https://spacy.io) (Explosion) |
  | **License** | `CC BY-SA 4.0` |
  | **Author** | [Explosion](https://explosion.ai) |

@@ -76,13 +91,12 @@ Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, s
76
 
77
  <details>
78
 
79
- <summary>View label scheme (195 labels for 4 components)</summary>
80
 
81
  | Component | Labels |
82
  | --- | --- |
83
  | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `POS=CCONJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Degree=Pos\|POS=ADV`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=PRON\|PronType=Dem`, `NumType=Card\|POS=NUM`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `NumType=Ord\|POS=ADJ`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `POS=ADP\|PartType=Inf`, `Degree=Pos\|POS=ADJ`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, 
`Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Number=Plur\|POS=DET\|PronType=Ind`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `POS=PART\|PartType=Inf`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Mood=Imp\|POS=VERB`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=X`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `POS=ADV\|PartType=Inf`, `Degree=Sup\|POS=ADV`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Number=Plur\|POS=PRON\|PronType=Ind`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|POS=PROPN`, `POS=ADP`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=INTJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=SYM`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Degree=Sup\|POS=ADJ`, `Number=Plur\|POS=DET\|PronType=Ind\|Style=Arch`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Foreign=Yes\|POS=X`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, 
`Definite=Ind\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Degree=Abs\|POS=ADV`, `POS=VERB\|VerbForm=Ger`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Pres`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=PRON\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=AUX\|Tense=Pres\|VerbForm=Part`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Mood=Imp\|POS=AUX`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|POS=NOUN`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=DET\|PronType=Dem`, 
`Definite=Def\|Number=Plur\|POS=NOUN` |
84
  | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` |
85
- | **`senter`** | `I`, `S` |
86
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
87
 
88
  </details>
@@ -95,18 +109,18 @@ Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, s
  | `TOKEN_P` | 99.78 |
  | `TOKEN_R` | 99.75 |
  | `TOKEN_F` | 99.76 |
- | `POS_ACC` | 96.29 |
- | `MORPH_ACC` | 94.88 |
- | `MORPH_MICRO_P` | 96.65 |
- | `MORPH_MICRO_R` | 96.11 |
- | `MORPH_MICRO_F` | 96.38 |
- | `SENTS_P` | 90.84 |
- | `SENTS_R` | 89.72 |
- | `SENTS_F` | 90.28 |
- | `DEP_UAS` | 82.21 |
- | `DEP_LAS` | 78.13 |
- | `TAG_ACC` | 96.29 |
- | `LEMMA_ACC` | 84.91 |
- | `ENTS_P` | 80.75 |
- | `ENTS_R` | 80.42 |
- | `ENTS_F` | 80.58 |
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.8079331942
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.80625
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.8070907195
24
+ - task:
25
+ name: TAG
26
+ type: token-classification
27
+ metrics:
28
+ - name: TAG (XPOS) Accuracy
29
+ type: accuracy
30
+ value: 0.9641646489
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
+ - name: POS (UPOS) Accuracy
36
  type: accuracy
37
+ value: 0.9641646489
38
  - task:
39
+ name: MORPH
40
  type: token-classification
41
  metrics:
42
+ - name: Morph (UFeats) Accuracy
43
+ type: accuracy
44
+ value: 0.9505084746
 
 
 
 
 
 
45
  - task:
46
+ name: LEMMA
47
  type: token-classification
48
  metrics:
49
+ - name: Lemma Accuracy
50
  type: accuracy
51
+ value: 0.9481840194
52
+ - task:
53
+ name: UNLABELED_DEPENDENCIES
54
+ type: token-classification
55
+ metrics:
56
+ - name: Unlabeled Attachment Score (UAS)
57
+ type: f_score
58
+ value: 0.8130063132
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
+ - name: Labeled Attachment Score (LAS)
64
+ type: f_score
65
+ value: 0.7723336499
66
+ - task:
67
+ name: SENTS
68
+ type: token-classification
69
+ metrics:
70
+ - name: Sentences F-Score
71
+ type: f_score
72
+ value: 0.9112107623
73
  ---
  ### Details: https://spacy.io/models/da#da_core_news_md

+ Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, lemmatizer (trainable_lemmatizer), senter, ner, attribute_ruler.

  | Feature | Description |
  | --- | --- |
  | **Name** | `da_core_news_md` |
+ | **Version** | `3.3.0` |
+ | **spaCy** | `>=3.3.0.dev0,<3.4.0` |
+ | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `lemmatizer`, `attribute_ruler`, `ner` |
+ | **Components** | `tok2vec`, `morphologizer`, `parser`, `lemmatizer`, `senter`, `attribute_ruler`, `ner` |
  | **Vectors** | 500000 keys, 20000 unique vectors (300 dimensions) |
+ | **Sources** | [UD Danish DDT v2.8](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)](https://spacy.io) (Explosion) |
  | **License** | `CC BY-SA 4.0` |
  | **Author** | [Explosion](https://explosion.ai) |

91
 
92
  <details>
93
 
94
+ <summary>View label scheme (193 labels for 3 components)</summary>
95
 
96
  | Component | Labels |
97
  | --- | --- |
98
  | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `POS=CCONJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Degree=Pos\|POS=ADV`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=PRON\|PronType=Dem`, `NumType=Card\|POS=NUM`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `NumType=Ord\|POS=ADJ`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `POS=ADP\|PartType=Inf`, `Degree=Pos\|POS=ADJ`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, 
`Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Number=Plur\|POS=DET\|PronType=Ind`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `POS=PART\|PartType=Inf`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Mood=Imp\|POS=VERB`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=X`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `POS=ADV\|PartType=Inf`, `Degree=Sup\|POS=ADV`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Number=Plur\|POS=PRON\|PronType=Ind`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|POS=PROPN`, `POS=ADP`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=INTJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=SYM`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Degree=Sup\|POS=ADJ`, `Number=Plur\|POS=DET\|PronType=Ind\|Style=Arch`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Foreign=Yes\|POS=X`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, 
`Definite=Ind\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Degree=Abs\|POS=ADV`, `POS=VERB\|VerbForm=Ger`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Pres`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=PRON\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=AUX\|Tense=Pres\|VerbForm=Part`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Mood=Imp\|POS=AUX`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|POS=NOUN`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=DET\|PronType=Dem`, 
`Definite=Def\|Number=Plur\|POS=NOUN` |
99
  | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` |
 
100
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
101
 
102
  </details>
  | `TOKEN_P` | 99.78 |
  | `TOKEN_R` | 99.75 |
  | `TOKEN_F` | 99.76 |
+ | `POS_ACC` | 96.42 |
+ | `MORPH_ACC` | 95.05 |
+ | `MORPH_MICRO_P` | 96.69 |
+ | `MORPH_MICRO_R` | 96.16 |
+ | `MORPH_MICRO_F` | 96.42 |
+ | `SENTS_P` | 92.20 |
+ | `SENTS_R` | 90.07 |
+ | `SENTS_F` | 91.12 |
+ | `DEP_UAS` | 81.30 |
+ | `DEP_LAS` | 77.23 |
+ | `LEMMA_ACC` | 94.82 |
+ | `TAG_ACC` | 96.42 |
+ | `ENTS_P` | 80.79 |
+ | `ENTS_R` | 80.62 |
+ | `ENTS_F` | 80.71 |
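
The F-scores in the updated table can be cross-checked against the precision and recall values reported in this commit. The sketch below is only an illustration, not part of the repository: it assumes the F-score is the usual harmonic mean of precision and recall, and that the README percentages are the raw 0-1 scores multiplied by 100 and shown with two decimals. The helper names `f_score` and `as_percent` are hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def as_percent(value: float) -> str:
    """Format a 0-1 score as a two-decimal percentage (assumed README style)."""
    return f"{value * 100:.2f}"


# Precision and recall copied from the updated metrics in this commit.
ents_p, ents_r = 0.8079331942, 0.80625
sents_p, sents_r = 0.9219600726, 0.9007092199

ents_f = f_score(ents_p, ents_r)
sents_f = f_score(sents_p, sents_r)

print("ENTS_F ", as_percent(ents_f))
print("SENTS_F", as_percent(sents_f))
```

Under those assumptions the recomputed values should agree with the `ENTS_F` and `SENTS_F` rows above; a mismatch would suggest the table and the raw scores come from different evaluation runs.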
accuracy.json CHANGED
@@ -3,51 +3,51 @@
3
  "token_p": 0.9977732598,
4
  "token_r": 0.9974835463,
5
  "token_f": 0.997628382,
6
- "pos_acc": 0.962905569,
7
- "morph_acc": 0.9487651332,
8
- "morph_micro_p": 0.9664501066,
9
- "morph_micro_r": 0.9611327041,
10
- "morph_micro_f": 0.9637840711,
11
  "morph_per_feat": {
12
  "Mood": {
13
- "p": 0.9789674952,
14
- "r": 0.9761677788,
15
- "f": 0.9775656325
16
  },
17
  "Tense": {
18
- "p": 0.9698340875,
19
  "r": 0.968373494,
20
- "f": 0.9691032404
21
  },
22
  "VerbForm": {
23
- "p": 0.9631449631,
24
- "r": 0.9596083231,
25
- "f": 0.9613733906
26
  },
27
  "Voice": {
28
- "p": 0.9782934132,
29
  "r": 0.9768310912,
30
- "f": 0.9775617053
31
  },
32
  "Definite": {
33
- "p": 0.9633027523,
34
- "r": 0.9541683129,
35
- "f": 0.9587137753
36
  },
37
  "Gender": {
38
- "p": 0.9459098497,
39
  "r": 0.9415088069,
40
- "f": 0.9437041972
41
  },
42
  "Number": {
43
- "p": 0.9618621778,
44
- "r": 0.9538341158,
45
- "f": 0.9578313253
46
  },
47
  "AdpType": {
48
- "p": 0.9973333333,
49
  "r": 0.9920424403,
50
- "f": 0.9946808511
51
  },
52
  "PartType": {
53
  "p": 1.0,
@@ -55,29 +55,29 @@
55
  "f": 1.0
56
  },
57
  "Case": {
58
- "p": 0.9696485623,
59
- "r": 0.9589257504,
60
- "f": 0.9642573471
61
  },
62
  "Person": {
63
- "p": 0.9704347826,
64
- "r": 0.9911190053,
65
- "f": 0.9806678383
66
  },
67
  "PronType": {
68
- "p": 0.9811165846,
69
  "r": 0.9827302632,
70
- "f": 0.9819227609
71
  },
72
  "NumType": {
73
- "p": 0.9794520548,
74
- "r": 0.9470198675,
75
- "f": 0.962962963
76
  },
77
  "Degree": {
78
- "p": 0.9520295203,
79
- "r": 0.9325301205,
80
- "f": 0.942178941
81
  },
82
  "Reflex": {
83
  "p": 1.0,
@@ -85,14 +85,14 @@
85
  "f": 1.0
86
  },
87
  "Number[psor]": {
88
- "p": 0.9662921348,
89
- "r": 1.0,
90
- "f": 0.9828571429
91
  },
92
  "Poss": {
93
- "p": 0.9777777778,
94
- "r": 1.0,
95
- "f": 0.9887640449
96
  },
97
  "Foreign": {
98
  "p": 0.6666666667,
@@ -115,141 +115,146 @@
115
  "f": 0.8571428571
116
  }
117
  },
118
- "sents_p": 0.908438061,
119
- "sents_r": 0.8971631206,
120
- "sents_f": 0.902765388,
121
- "dep_uas": 0.8220632614,
122
- "dep_las": 0.7813355686,
123
  "dep_las_per_type": {
124
  "advmod": {
125
- "p": 0.7073509015,
126
- "r": 0.7203389831,
127
- "f": 0.7137858642
128
  },
129
  "root": {
130
- "p": 0.8411552347,
131
- "r": 0.8262411348,
132
- "f": 0.8336314848
133
  },
134
  "nsubj": {
135
- "p": 0.8428417653,
136
- "r": 0.8259493671,
137
- "f": 0.8343100693
138
  },
139
  "case": {
140
- "p": 0.881372549,
141
- "r": 0.8865877712,
142
- "f": 0.883972468
143
  },
144
  "obl": {
145
- "p": 0.7053291536,
146
- "r": 0.698757764,
147
- "f": 0.7020280811
148
  },
149
  "cc": {
150
- "p": 0.7851002865,
151
- "r": 0.7965116279,
152
- "f": 0.7907647908
153
  },
154
  "conj": {
155
- "p": 0.6491712707,
156
- "r": 0.6266666667,
157
- "f": 0.6377204885
158
  },
159
  "obj": {
160
- "p": 0.8033088235,
161
- "r": 0.8485436893,
162
- "f": 0.8253068933
163
  },
164
  "aux": {
165
- "p": 0.875739645,
166
- "r": 0.8629737609,
167
- "f": 0.8693098385
168
  },
169
  "acl:relcl": {
170
- "p": 0.5879120879,
171
  "r": 0.5783783784,
172
- "f": 0.583106267
173
  },
174
  "advmod:lmod": {
175
- "p": 0.7586206897,
176
- "r": 0.6567164179,
177
- "f": 0.704
178
  },
179
  "det": {
180
- "p": 0.9194078947,
181
- "r": 0.92092257,
182
- "f": 0.9201646091
183
  },
184
  "amod": {
185
- "p": 0.8193979933,
186
- "r": 0.8361774744,
187
- "f": 0.8277027027
188
  },
189
  "nmod:poss": {
190
- "p": 0.7340425532,
191
- "r": 0.6831683168,
192
- "f": 0.7076923077
193
  },
194
  "ccomp": {
195
- "p": 0.5652173913,
196
- "r": 0.6290322581,
197
- "f": 0.5954198473
198
  },
199
  "nummod": {
200
- "p": 0.8306451613,
201
  "r": 0.8583333333,
202
- "f": 0.8442622951
203
  },
204
  "flat": {
205
- "p": 0.7636363636,
206
- "r": 0.8344370861,
207
- "f": 0.7974683544
208
  },
209
  "compound:prt": {
210
- "p": 0.5,
211
- "r": 0.3170731707,
212
- "f": 0.3880597015
213
  },
214
  "advcl": {
215
- "p": 0.6339285714,
216
- "r": 0.6120689655,
217
- "f": 0.6228070175
218
  },
219
  "mark": {
220
- "p": 0.8905579399,
221
- "r": 0.8521560575,
222
- "f": 0.870933893
223
  },
224
  "cop": {
225
- "p": 0.7837837838,
226
- "r": 0.8285714286,
227
- "f": 0.8055555556
228
  },
229
  "dep": {
230
- "p": 0.1555555556,
231
  "r": 0.2641509434,
232
- "f": 0.1958041958
233
  },
234
  "nmod": {
235
- "p": 0.6352941176,
236
- "r": 0.6328125,
237
- "f": 0.6340508806
238
  },
239
  "iobj": {
240
- "p": 0.8125,
241
- "r": 0.5909090909,
242
- "f": 0.6842105263
243
  },
244
  "xcomp": {
245
- "p": 0.5675675676,
246
  "r": 0.3559322034,
247
- "f": 0.4375
 
 
 
 
 
248
  },
249
  "list": {
250
- "p": 0.4,
251
- "r": 0.2222222222,
252
- "f": 0.2857142857
253
  },
254
  "vocative": {
255
  "p": 0.0,
@@ -262,57 +267,52 @@
262
  "f": 0.8461538462
263
  },
264
  "expl": {
265
- "p": 0.8484848485,
266
- "r": 0.8235294118,
267
- "f": 0.8358208955
268
- },
269
- "appos": {
270
- "p": 0.4146341463,
271
- "r": 0.5151515152,
272
- "f": 0.4594594595
273
  },
274
  "obl:tmod": {
275
- "p": 0.6,
276
- "r": 0.3333333333,
277
- "f": 0.4285714286
278
  },
279
- "discourse": {
280
  "p": 0.0,
281
  "r": 0.0,
282
  "f": 0.0
283
  },
284
- "obl:lmod": {
285
  "p": 0.0,
286
  "r": 0.0,
287
  "f": 0.0
288
  }
289
  },
290
- "tag_acc": 0.962905569,
291
- "lemma_acc": 0.8491041162,
292
- "ents_p": 0.8075313808,
293
- "ents_r": 0.8041666667,
294
- "ents_f": 0.8058455115,
295
  "ents_per_type": {
296
  "PER": {
297
- "p": 0.9215686275,
298
  "r": 0.8493975904,
299
- "f": 0.8840125392
300
  },
301
  "ORG": {
302
- "p": 0.7032967033,
303
- "r": 0.7111111111,
304
- "f": 0.7071823204
305
  },
306
  "MISC": {
307
- "p": 0.7179487179,
308
- "r": 0.7433628319,
309
- "f": 0.7304347826
310
  },
311
  "LOC": {
312
- "p": 0.8290598291,
313
- "r": 0.8738738739,
314
- "f": 0.850877193
315
  }
316
  },
317
- "speed": 10055.5372609128
318
  }
3
  "token_p": 0.9977732598,
4
  "token_r": 0.9974835463,
5
  "token_f": 0.997628382,
6
+ "pos_acc": 0.9641646489,
7
+ "morph_acc": 0.9505084746,
8
+ "morph_micro_p": 0.9668578389,
9
+ "morph_micro_r": 0.9615869971,
10
+ "morph_micro_f": 0.9642152149,
11
  "morph_per_feat": {
12
  "Mood": {
13
+ "p": 0.9723809524,
14
+ "r": 0.9733079123,
15
+ "f": 0.9728442115
16
  },
17
  "Tense": {
18
+ "p": 0.9669172932,
19
  "r": 0.968373494,
20
+ "f": 0.9676448457
21
  },
22
  "VerbForm": {
23
+ "p": 0.9583588487,
24
+ "r": 0.9577723378,
25
+ "f": 0.9580655035
26
  },
27
  "Voice": {
28
+ "p": 0.9739195231,
29
  "r": 0.9768310912,
30
+ "f": 0.9753731343
31
  },
32
  "Definite": {
33
+ "p": 0.964940239,
34
+ "r": 0.9569340182,
35
+ "f": 0.9609204523
36
  },
37
  "Gender": {
38
+ "p": 0.9487608841,
39
  "r": 0.9415088069,
40
+ "f": 0.9451209341
41
  },
42
  "Number": {
43
+ "p": 0.9608615708,
44
+ "r": 0.95409494,
45
+ "f": 0.9574663002
46
  },
47
  "AdpType": {
48
+ "p": 1.0,
49
  "r": 0.9920424403,
50
+ "f": 0.9960053262
51
  },
52
  "PartType": {
53
  "p": 1.0,
55
  "f": 1.0
56
  },
57
  "Case": {
58
+ "p": 0.9775641026,
59
+ "r": 0.9636650869,
60
+ "f": 0.9705648369
61
  },
62
  "Person": {
63
+ "p": 0.9788359788,
64
+ "r": 0.9857904085,
65
+ "f": 0.982300885
66
  },
67
  "PronType": {
68
+ "p": 0.9851607585,
69
  "r": 0.9827302632,
70
+ "f": 0.9839440099
71
  },
72
  "NumType": {
73
+ "p": 0.9863013699,
74
+ "r": 0.9536423841,
75
+ "f": 0.9696969697
76
  },
77
  "Degree": {
78
+ "p": 0.9548229548,
79
+ "r": 0.9421686747,
80
+ "f": 0.9484536082
81
  },
82
  "Reflex": {
83
  "p": 1.0,
85
  "f": 1.0
86
  },
87
  "Number[psor]": {
88
+ "p": 0.988372093,
89
+ "r": 0.988372093,
90
+ "f": 0.988372093
91
  },
92
  "Poss": {
93
+ "p": 0.9886363636,
94
+ "r": 0.9886363636,
95
+ "f": 0.9886363636
96
  },
97
  "Foreign": {
98
  "p": 0.6666666667,
115
  "f": 0.8571428571
116
  }
117
  },
118
+ "sents_p": 0.9219600726,
119
+ "sents_r": 0.9007092199,
120
+ "sents_f": 0.9112107623,
121
+ "dep_uas": 0.8130063132,
122
+ "dep_las": 0.7723336499,
123
  "dep_las_per_type": {
124
  "advmod": {
125
+ "p": 0.6868965517,
126
+ "r": 0.7033898305,
127
+ "f": 0.6950453594
128
  },
129
  "root": {
130
+ "p": 0.8381818182,
131
+ "r": 0.8173758865,
132
+ "f": 0.8276481149
133
  },
134
  "nsubj": {
135
+ "p": 0.8317460317,
136
+ "r": 0.8291139241,
137
+ "f": 0.8304278922
138
  },
139
  "case": {
140
+ "p": 0.883218842,
141
+ "r": 0.8875739645,
142
+ "f": 0.8853910477
143
  },
144
  "obl": {
145
+ "p": 0.6935483871,
146
+ "r": 0.6677018634,
147
+ "f": 0.6803797468
148
  },
149
  "cc": {
150
+ "p": 0.7797101449,
151
+ "r": 0.7819767442,
152
+ "f": 0.7808417997
153
  },
154
  "conj": {
155
+ "p": 0.6305555556,
156
+ "r": 0.6053333333,
157
+ "f": 0.6176870748
158
  },
159
  "obj": {
160
+ "p": 0.7667269439,
161
+ "r": 0.8233009709,
162
+ "f": 0.7940074906
163
  },
164
  "aux": {
165
+ "p": 0.8797653959,
166
+ "r": 0.8746355685,
167
+ "f": 0.8771929825
168
  },
169
  "acl:relcl": {
170
+ "p": 0.6011235955,
171
  "r": 0.5783783784,
172
+ "f": 0.5895316804
173
  },
174
  "advmod:lmod": {
175
+ "p": 0.6388888889,
176
+ "r": 0.6865671642,
177
+ "f": 0.6618705036
178
  },
179
  "det": {
180
+ "p": 0.9143327842,
181
+ "r": 0.9143327842,
182
+ "f": 0.9143327842
183
  },
184
  "amod": {
185
+ "p": 0.7954173486,
186
+ "r": 0.8293515358,
187
+ "f": 0.8120300752
188
  },
189
  "nmod:poss": {
190
+ "p": 0.6593406593,
191
+ "r": 0.5940594059,
192
+ "f": 0.625
193
  },
194
  "ccomp": {
195
+ "p": 0.5890410959,
196
+ "r": 0.6935483871,
197
+ "f": 0.637037037
198
  },
199
  "nummod": {
200
+ "p": 0.8512396694,
201
  "r": 0.8583333333,
202
+ "f": 0.8547717842
203
  },
204
  "flat": {
205
+ "p": 0.7514450867,
206
+ "r": 0.8609271523,
207
+ "f": 0.8024691358
208
  },
209
  "compound:prt": {
210
+ "p": 0.4285714286,
211
+ "r": 0.2195121951,
212
+ "f": 0.2903225806
213
  },
214
  "advcl": {
215
+ "p": 0.5909090909,
216
+ "r": 0.5603448276,
217
+ "f": 0.5752212389
218
  },
219
  "mark": {
220
+ "p": 0.8742138365,
221
+ "r": 0.8562628337,
222
+ "f": 0.8651452282
223
  },
224
  "cop": {
225
+ "p": 0.7891891892,
226
+ "r": 0.8342857143,
227
+ "f": 0.8111111111
228
  },
229
  "dep": {
230
+ "p": 0.1538461538,
231
  "r": 0.2641509434,
232
+ "f": 0.1944444444
233
  },
234
  "nmod": {
235
+ "p": 0.6551724138,
236
+ "r": 0.630859375,
237
+ "f": 0.6427860697
238
  },
239
  "iobj": {
240
+ "p": 0.7272727273,
241
+ "r": 0.3636363636,
242
+ "f": 0.4848484848
243
  },
244
  "xcomp": {
245
+ "p": 0.5833333333,
246
  "r": 0.3559322034,
247
+ "f": 0.4421052632
248
+ },
249
+ "appos": {
250
+ "p": 0.4375,
251
+ "r": 0.4242424242,
252
+ "f": 0.4307692308
253
  },
254
  "list": {
255
+ "p": 0.375,
256
+ "r": 0.1666666667,
257
+ "f": 0.2307692308
258
  },
259
  "vocative": {
260
  "p": 0.0,
267
  "f": 0.8461538462
268
  },
269
  "expl": {
270
+ "p": 0.8181818182,
271
+ "r": 0.7941176471,
272
+ "f": 0.8059701493
273
  },
274
  "obl:tmod": {
275
+ "p": 0.7777777778,
276
+ "r": 0.3888888889,
277
+ "f": 0.5185185185
278
  },
279
+ "obl:lmod": {
280
  "p": 0.0,
281
  "r": 0.0,
282
  "f": 0.0
283
  },
284
+ "discourse": {
285
  "p": 0.0,
286
  "r": 0.0,
287
  "f": 0.0
288
  }
289
  },
290
+ "lemma_acc": 0.9481840194,
291
+ "tag_acc": 0.9641646489,
292
+ "ents_p": 0.8079331942,
293
+ "ents_r": 0.80625,
294
+ "ents_f": 0.8070907195,
295
  "ents_per_type": {
296
  "PER": {
297
+ "p": 0.88125,
298
  "r": 0.8493975904,
299
+ "f": 0.8650306748
300
  },
301
  "ORG": {
302
+ "p": 0.7294117647,
303
+ "r": 0.6888888889,
304
+ "f": 0.7085714286
305
  },
306
  "MISC": {
307
+ "p": 0.6991869919,
308
+ "r": 0.7610619469,
309
+ "f": 0.7288135593
310
  },
311
  "LOC": {
312
+ "p": 0.8828828829,
313
+ "r": 0.8828828829,
314
+ "f": 0.8828828829
315
  }
316
  },
317
+ "speed": 11453.5916881723
318
  }
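The `p`, `r` and `f` values reported above appear to follow the usual relationship in which `f` is the harmonic mean of precision and recall. A minimal sketch to sanity-check that, using the overall NER scores from this diff (`ents_p`, `ents_r`, `ents_f`). The helper name `f_score` is illustrative and not part of spaCy:

```python
def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if p + r == 0.0:
        return 0.0
    return 2.0 * p * r / (p + r)


# Values copied from the updated scores in this commit.
ents_p = 0.8079331942
ents_r = 0.80625
ents_f = 0.8070907195

computed = f_score(ents_p, ents_r)
# True if the reported F-score agrees with the recomputed one
# (tolerance allows for rounding in the stored values).
print(abs(computed - ents_f) < 1e-6)
```

The zero guard matters for labels such as `obl:lmod` and `discourse`, where both precision and recall are reported as 0.0.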
attribute_ruler/patterns CHANGED
Binary files a/attribute_ruler/patterns and b/attribute_ruler/patterns differ
config.cfg CHANGED
@@ -10,7 +10,7 @@ seed = 0
10
 
11
  [nlp]
12
  lang = "da"
13
- pipeline = ["tok2vec","morphologizer","parser","senter","attribute_ruler","lemmatizer","ner"]
14
  disabled = ["senter"]
15
  before_creation = null
16
  after_creation = null
@@ -26,11 +26,22 @@ scorer = {"@scorers":"spacy.attribute_ruler_scorer.v1"}
26
  validate = false
27
 
28
  [components.lemmatizer]
29
- factory = "lemmatizer"
30
- mode = "lookup"
31
- model = null
32
  overwrite = false
33
  scorer = {"@scorers":"spacy.lemmatizer_scorer.v1"}
34
 
35
  [components.morphologizer]
36
  factory = "morphologizer"
@@ -39,8 +50,9 @@ overwrite = true
39
  scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
40
 
41
  [components.morphologizer.model]
42
- @architectures = "spacy.Tagger.v1"
43
  nO = null
44
 
45
  [components.morphologizer.model.tok2vec]
46
  @architectures = "spacy.Tok2VecListener.v1"
@@ -70,7 +82,7 @@ nO = null
70
  @architectures = "spacy.MultiHashEmbed.v2"
71
  width = 96
72
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
73
- rows = [5000,2500,2500,2500,100]
74
  include_static_vectors = true
75
 
76
  [components.ner.model.tok2vec.encode]
@@ -108,8 +120,9 @@ overwrite = false
108
  scorer = {"@scorers":"spacy.senter_scorer.v1"}
109
 
110
  [components.senter.model]
111
- @architectures = "spacy.Tagger.v1"
112
  nO = null
113
 
114
  [components.senter.model.tok2vec]
115
  @architectures = "spacy.Tok2Vec.v2"
@@ -138,7 +151,7 @@ factory = "tok2vec"
138
  @architectures = "spacy.MultiHashEmbed.v2"
139
  width = ${components.tok2vec.model.encode:width}
140
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
141
- rows = [5000,2500,2500,2500,100]
142
  include_static_vectors = true
143
 
144
  [components.tok2vec.model.encode]
@@ -175,7 +188,7 @@ dropout = 0.1
175
  accumulate_gradient = 1
176
  patience = 5000
177
  max_epochs = 0
178
- max_steps = 0
179
  eval_frequency = 1000
180
  frozen_components = []
181
  before_to_disk = null
@@ -210,17 +223,17 @@ eps = 0.00000001
210
  learn_rate = 0.001
211
 
212
  [training.score_weights]
213
- pos_acc = 0.08
214
- morph_acc = 0.08
215
  morph_per_feat = null
216
  dep_uas = 0.0
217
- dep_las = 0.16
218
  dep_las_per_type = null
219
  sents_p = null
220
  sents_r = null
221
- sents_f = 0.02
222
- lemma_acc = 0.5
223
- ents_f = 0.16
224
  ents_p = 0.0
225
  ents_r = 0.0
226
  ents_per_type = null
@@ -237,6 +250,13 @@ after_init = null
237
 
238
  [initialize.components]
239
 
240
  [initialize.components.morphologizer]
241
 
242
  [initialize.components.morphologizer.labels]
10
 
11
  [nlp]
12
  lang = "da"
13
+ pipeline = ["tok2vec","morphologizer","parser","lemmatizer","senter","attribute_ruler","ner"]
14
  disabled = ["senter"]
15
  before_creation = null
16
  after_creation = null
26
  validate = false
27
 
28
  [components.lemmatizer]
29
+ factory = "trainable_lemmatizer"
30
+ backoff = "orth"
31
+ min_tree_freq = 3
32
  overwrite = false
33
  scorer = {"@scorers":"spacy.lemmatizer_scorer.v1"}
34
+ top_k = 1
35
+
36
+ [components.lemmatizer.model]
37
+ @architectures = "spacy.Tagger.v2"
38
+ nO = null
39
+ normalize = false
40
+
41
+ [components.lemmatizer.model.tok2vec]
42
+ @architectures = "spacy.Tok2VecListener.v1"
43
+ width = ${components.tok2vec.model.encode:width}
44
+ upstream = "tok2vec"
45
 
46
  [components.morphologizer]
47
  factory = "morphologizer"
50
  scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
51
 
52
  [components.morphologizer.model]
53
+ @architectures = "spacy.Tagger.v2"
54
  nO = null
55
+ normalize = false
56
 
57
  [components.morphologizer.model.tok2vec]
58
  @architectures = "spacy.Tok2VecListener.v1"
82
  @architectures = "spacy.MultiHashEmbed.v2"
83
  width = 96
84
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
85
+ rows = [5000,1000,2500,2500,50]
86
  include_static_vectors = true
87
 
88
  [components.ner.model.tok2vec.encode]
120
  scorer = {"@scorers":"spacy.senter_scorer.v1"}
121
 
122
  [components.senter.model]
123
+ @architectures = "spacy.Tagger.v2"
124
  nO = null
125
+ normalize = false
126
 
127
  [components.senter.model.tok2vec]
128
  @architectures = "spacy.Tok2Vec.v2"
151
  @architectures = "spacy.MultiHashEmbed.v2"
152
  width = ${components.tok2vec.model.encode:width}
153
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
154
+ rows = [5000,1000,2500,2500,50]
155
  include_static_vectors = true
156
 
157
  [components.tok2vec.model.encode]
188
  accumulate_gradient = 1
189
  patience = 5000
190
  max_epochs = 0
191
+ max_steps = 100000
192
  eval_frequency = 1000
193
  frozen_components = []
194
  before_to_disk = null
223
  learn_rate = 0.001
224
 
225
  [training.score_weights]
226
+ pos_acc = 0.14
227
+ morph_acc = 0.14
228
  morph_per_feat = null
229
  dep_uas = 0.0
230
+ dep_las = 0.29
231
  dep_las_per_type = null
232
  sents_p = null
233
  sents_r = null
234
+ sents_f = 0.04
235
+ lemma_acc = 0.1
236
+ ents_f = 0.29
237
  ents_p = 0.0
238
  ents_r = 0.0
239
  ents_per_type = null
250
 
251
  [initialize.components]
252
 
253
+ [initialize.components.lemmatizer]
254
+
255
+ [initialize.components.lemmatizer.labels]
256
+ @readers = "spacy.read_labels.v1"
257
+ path = "corpus/labels/trainable_lemmatizer.json"
258
+ require = false
259
+
260
  [initialize.components.morphologizer]
261
 
262
  [initialize.components.morphologizer.labels]
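The `[training.score_weights]` change in this config lowers the weight of `lemma_acc` from 0.5 to 0.1 and spreads the difference over the other metrics. A rough sketch of what that means, assuming the final training score is the weighted sum of the listed metrics (which is how spaCy describes `training.score_weights`). The dictionary names and the `weighted_score` helper are illustrative only; the metric values are the ones reported in this commit:

```python
# Non-null, non-zero weights before and after this commit.
old_weights = {"pos_acc": 0.08, "morph_acc": 0.08, "dep_las": 0.16,
               "sents_f": 0.02, "lemma_acc": 0.5, "ents_f": 0.16}
new_weights = {"pos_acc": 0.14, "morph_acc": 0.14, "dep_las": 0.29,
               "sents_f": 0.04, "lemma_acc": 0.1, "ents_f": 0.29}

# Scores reported for the updated pipeline.
scores = {"pos_acc": 0.9641646489, "morph_acc": 0.9505084746,
          "dep_las": 0.7723336499, "sents_f": 0.9112107623,
          "lemma_acc": 0.9481840194, "ents_f": 0.8070907195}


def weighted_score(weights: dict, metrics: dict) -> float:
    """Sum of weight * metric over every weighted metric."""
    return sum(w * metrics[name] for name, w in weights.items())


# Both weight sets sum to 1.0 (up to floating-point error).
print(round(sum(old_weights.values()), 6), round(sum(new_weights.values()), 6))
print(weighted_score(old_weights, scores), weighted_score(new_weights, scores))
```

With the old weights, half of the combined score came from lemmatization alone; the new weights let the parser and NER scores carry more of it.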
da_core_news_md-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0195b5af78d68d4f3a6895cfa3cde98871682bbddc4b04c375770980fac4d222
3
- size 48943545
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:25dcaba4499ac19b4f1d23884cc5896401eb1e2c896c6a49c10a6f8fe6190398
3
+ size 42193668
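The wheel and the `model` files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the object hash and the size in bytes. A small sketch of reading such a pointer, using only the standard library. The function name `parse_lfs_pointer` is hypothetical, and the pointer text is copied from the diff above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


new_pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:25dcaba4499ac19b4f1d23884cc5896401eb1e2c896c6a49c10a6f8fe6190398
size 42193668
"""

info = parse_lfs_pointer(new_pointer)
old_size = 48943545  # size from the removed pointer lines
print(info["hash_algo"], info["size"], old_size - info["size"])
```

Comparing the `size` fields this way shows how much smaller the new wheel is without downloading either object.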
lemmatizer/cfg ADDED
@@ -0,0 +1,457 @@
1
+ {
2
+ "labels":[
3
+ 1,
4
+ 2,
5
+ 4,
6
+ 6,
7
+ 8,
8
+ 10,
9
+ 12,
10
+ 14,
11
+ 16,
12
+ 18,
13
+ 20,
14
+ 24,
15
+ 28,
16
+ 30,
17
+ 32,
18
+ 34,
19
+ 36,
20
+ 39,
21
+ 41,
22
+ 42,
23
+ 43,
24
+ 45,
25
+ 47,
26
+ 49,
27
+ 51,
28
+ 53,
29
+ 55,
30
+ 57,
31
+ 61,
32
+ 65,
33
+ 67,
34
+ 71,
35
+ 73,
36
+ 75,
37
+ 77,
38
+ 79,
39
+ 81,
40
+ 83,
41
+ 85,
42
+ 87,
43
+ 89,
44
+ 91,
45
+ 93,
46
+ 95,
47
+ 99,
48
+ 101,
49
+ 102,
50
+ 104,
51
+ 107,
52
+ 111,
53
+ 113,
54
+ 116,
55
+ 118,
56
+ 121,
57
+ 124,
58
+ 127,
59
+ 128,
60
+ 131,
61
+ 133,
62
+ 134,
63
+ 136,
64
+ 138,
65
+ 140,
66
+ 142,
67
+ 144,
68
+ 145,
69
+ 147,
70
+ 148,
71
+ 149,
72
+ 153,
73
+ 155,
74
+ 158,
75
+ 161,
76
+ 164,
77
+ 166,
78
+ 168,
79
+ 170,
80
+ 172,
81
+ 174,
82
+ 175,
83
+ 177,
84
+ 179,
85
+ 182,
86
+ 184,
87
+ 186,
88
+ 188,
89
+ 190,
90
+ 192,
91
+ 194,
92
+ 196,
93
+ 199,
94
+ 201,
95
+ 203,
96
+ 204,
97
+ 207,
98
+ 208,
99
+ 209,
100
+ 211,
101
+ 213,
102
+ 214,
103
+ 216,
104
+ 218,
105
+ 220,
106
+ 222,
107
+ 224,
108
+ 226,
109
+ 229,
110
+ 231,
111
+ 232,
112
+ 233,
113
+ 235,
114
+ 236,
115
+ 238,
116
+ 239,
117
+ 243,
118
+ 249,
119
+ 253,
120
+ 255,
121
+ 257,
122
+ 259,
123
+ 261,
124
+ 262,
125
+ 263,
126
+ 264,
127
+ 267,
128
+ 269,
129
+ 270,
130
+ 272,
131
+ 274,
132
+ 276,
133
+ 278,
134
+ 280,
135
+ 282,
136
+ 284,
137
+ 286,
138
+ 290,
139
+ 291,
140
+ 293,
141
+ 295,
142
+ 297,
143
+ 299,
144
+ 300,
145
+ 302,
146
+ 303,
147
+ 304,
148
+ 306,
149
+ 308,
150
+ 311,
151
+ 314,
152
+ 315,
153
+ 317,
154
+ 320,
155
+ 321,
156
+ 323,
157
+ 324,
158
+ 326,
159
+ 327,
160
+ 328,
161
+ 330,
162
+ 331,
163
+ 333,
164
+ 337,
165
+ 339,
166
+ 340,
167
+ 344,
168
+ 346,
169
+ 350,
170
+ 353,
171
+ 354,
172
+ 355,
173
+ 358,
174
+ 360,
175
+ 361,
176
+ 363,
177
+ 365,
178
+ 366,
179
+ 369,
180
+ 372,
181
+ 373,
182
+ 376,
183
+ 380,
184
+ 382,
185
+ 383,
186
+ 384,
187
+ 386,
188
+ 387,
189
+ 389,
190
+ 391,
191
+ 392,
192
+ 394,
193
+ 395,
194
+ 398,
195
+ 400,
196
+ 402,
197
+ 404,
198
+ 406,
199
+ 409,
200
+ 411,
201
+ 412,
202
+ 413,
203
+ 415,
204
+ 417,
205
+ 420,
206
+ 421,
207
+ 423,
208
+ 424,
209
+ 425,
210
+ 427,
211
+ 429,
212
+ 431,
213
+ 433,
214
+ 434,
215
+ 436,
216
+ 437,
217
+ 439,
218
+ 440,
219
+ 442,
220
+ 444,
221
+ 445,
222
+ 449,
223
+ 450,
224
+ 452,
225
+ 454,
226
+ 457,
227
+ 459,
228
+ 462,
229
+ 465,
230
+ 466,
231
+ 468,
232
+ 470,
233
+ 471,
234
+ 474,
235
+ 475,
236
+ 478,
237
+ 480,
238
+ 483,
239
+ 485,
240
+ 486,
241
+ 487,
242
+ 489,
243
+ 491,
244
+ 492,
245
+ 493,
246
+ 495,
247
+ 496,
248
+ 498,
249
+ 500,
250
+ 501,
251
+ 502,
252
+ 503,
253
+ 504,
254
+ 505,
255
+ 507,
256
+ 508,
257
+ 509,
258
+ 510,
259
+ 511,
260
+ 512,
261
+ 514,
262
+ 515,
263
+ 516,
264
+ 518,
265
+ 519,
266
+ 520,
267
+ 521,
268
+ 523,
269
+ 525,
270
+ 526,
271
+ 528,
272
+ 531,
273
+ 533,
274
+ 535,
275
+ 453,
276
+ 536,
277
+ 538,
278
+ 539,
279
+ 541,
280
+ 545,
281
+ 547,
282
+ 548,
283
+ 549,
284
+ 550,
285
+ 551,
286
+ 553,
287
+ 554,
288
+ 555,
289
+ 557,
290
+ 559,
291
+ 560,
292
+ 561,
293
+ 563,
294
+ 565,
295
+ 566,
296
+ 567,
297
+ 568,
298
+ 570,
299
+ 571,
300
+ 575,
301
+ 577,
302
+ 578,
303
+ 579,
304
+ 582,
305
+ 585,
306
+ 587,
307
+ 589,
308
+ 593,
309
+ 594,
310
+ 596,
311
+ 597,
312
+ 601,
313
+ 603,
314
+ 605,
315
+ 609,
316
+ 611,
317
+ 612,
318
+ 613,
319
+ 614,
320
+ 615,
321
+ 616,
322
+ 617,
323
+ 619,
324
+ 621,
325
+ 622,
326
+ 624,
327
+ 625,
328
+ 627,
329
+ 628,
330
+ 629,
331
+ 632,
332
+ 634,
333
+ 638,
334
+ 639,
335
+ 640,
336
+ 642,
337
+ 644,
338
+ 647,
339
+ 649,
340
+ 650,
341
+ 651,
342
+ 653,
343
+ 654,
344
+ 655,
345
+ 657,
346
+ 658,
347
+ 659,
348
+ 661,
349
+ 663,
350
+ 665,
351
+ 667,
352
+ 669,
353
+ 670,
354
+ 672,
355
+ 674,
356
+ 676,
357
+ 677,
358
+ 678,
359
+ 680,
360
+ 682,
361
+ 683,
362
+ 685,
363
+ 686,
364
+ 688,
365
+ 689,
366
+ 690,
367
+ 691,
368
+ 694,
369
+ 695,
370
+ 696,
371
+ 697,
372
+ 699,
373
+ 700,
374
+ 701,
375
+ 703,
376
+ 705,
377
+ 706,
378
+ 707,
379
+ 708,
380
+ 712,
381
+ 715,
382
+ 716,
383
+ 718,
384
+ 720,
385
+ 724,
386
+ 726,
387
+ 729,
388
+ 730,
389
+ 732,
390
+ 733,
391
+ 734,
392
+ 736,
393
+ 738,
394
+ 739,
395
+ 740,
396
+ 741,
397
+ 742,
398
+ 743,
399
+ 744,
400
+ 747,
401
+ 749,
402
+ 753,
403
+ 756,
404
+ 758,
405
+ 759,
406
+ 761,
407
+ 762,
408
+ 763,
409
+ 764,
410
+ 766,
411
+ 768,
412
+ 769,
413
+ 771,
414
+ 773,
415
+ 774,
416
+ 775,
417
+ 776,
418
+ 777,
419
+ 781,
420
+ 783,
421
+ 784,
422
+ 785,
423
+ 788,
424
+ 791,
425
+ 792,
426
+ 794,
427
+ 796,
428
+ 797,
429
+ 798,
430
+ 799,
431
+ 800,
432
+ 802,
433
+ 803,
434
+ 804,
435
+ 805,
436
+ 806,
437
+ 808,
438
+ 809,
439
+ 810,
440
+ 811,
441
+ 812,
442
+ 814,
443
+ 815,
444
+ 817,
445
+ 819,
446
+ 820,
447
+ 822,
448
+ 824,
449
+ 825,
450
+ 827,
451
+ 829,
452
+ 831,
453
+ 833,
454
+ 835,
455
+ 837
456
+ ]
457
+ }
lemmatizer/{lookups/lookups.bin → model} RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:6864ce8705293ba1b6dcf349ec133cdc33db3ba57f6e9337458cfe5073b6f103
3
- size 11537995
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5dbaeda1129c701ac9da038242ad1d176d2d9b00e1b9fdaaefa9fd25435920f3
3
+ size 176206
lemmatizer/trees ADDED
Binary file (89.9 kB). View file
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"da",
3
  "name":"core_news_md",
4
- "version":"3.2.0",
5
- "description":"Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler, lemmatizer.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
- "spacy_version":">=3.2.0,<3.3.0",
11
- "spacy_git_version":"bb26550e2",
12
  "vectors":{
13
  "width":300,
14
  "vectors":20000,
@@ -212,15 +212,8 @@
212
  "punct",
213
  "xcomp"
214
  ],
215
- "senter":[
216
- "I",
217
- "S"
218
- ],
219
  "attribute_ruler":[
220
 
221
- ],
222
- "lemmatizer":[
223
-
224
  ],
225
  "ner":[
226
  "LOC",
@@ -233,17 +226,17 @@
233
  "tok2vec",
234
  "morphologizer",
235
  "parser",
236
- "attribute_ruler",
237
  "lemmatizer",
238
  "ner"
239
  ],
240
  "components":[
241
  "tok2vec",
242
  "morphologizer",
243
  "parser",
244
  "senter",
245
  "attribute_ruler",
246
- "lemmatizer",
247
  "ner"
248
  ],
249
  "disabled":[
@@ -254,51 +247,51 @@
254
  "token_p":0.9977732598,
255
  "token_r":0.9974835463,
256
  "token_f":0.997628382,
257
- "pos_acc":0.962905569,
258
- "morph_acc":0.9487651332,
259
- "morph_micro_p":0.9664501066,
260
- "morph_micro_r":0.9611327041,
261
- "morph_micro_f":0.9637840711,
262
  "morph_per_feat":{
263
  "Mood":{
264
- "p":0.9789674952,
265
- "r":0.9761677788,
266
- "f":0.9775656325
267
  },
268
  "Tense":{
269
- "p":0.9698340875,
270
  "r":0.968373494,
271
- "f":0.9691032404
272
  },
273
  "VerbForm":{
274
- "p":0.9631449631,
275
- "r":0.9596083231,
276
- "f":0.9613733906
277
  },
278
  "Voice":{
279
- "p":0.9782934132,
280
  "r":0.9768310912,
281
- "f":0.9775617053
282
  },
283
  "Definite":{
284
- "p":0.9633027523,
285
- "r":0.9541683129,
286
- "f":0.9587137753
287
  },
288
  "Gender":{
289
- "p":0.9459098497,
290
  "r":0.9415088069,
291
- "f":0.9437041972
292
  },
293
  "Number":{
294
- "p":0.9618621778,
295
- "r":0.9538341158,
296
- "f":0.9578313253
297
  },
298
  "AdpType":{
299
- "p":0.9973333333,
300
  "r":0.9920424403,
301
- "f":0.9946808511
302
  },
303
  "PartType":{
304
  "p":1.0,
@@ -306,29 +299,29 @@
306
  "f":1.0
307
  },
308
  "Case":{
309
- "p":0.9696485623,
310
- "r":0.9589257504,
311
- "f":0.9642573471
312
  },
313
  "Person":{
314
- "p":0.9704347826,
315
- "r":0.9911190053,
316
- "f":0.9806678383
317
  },
318
  "PronType":{
319
- "p":0.9811165846,
320
  "r":0.9827302632,
321
- "f":0.9819227609
322
  },
323
  "NumType":{
324
- "p":0.9794520548,
325
- "r":0.9470198675,
326
- "f":0.962962963
327
  },
328
  "Degree":{
329
- "p":0.9520295203,
330
- "r":0.9325301205,
331
- "f":0.942178941
332
  },
333
  "Reflex":{
334
  "p":1.0,
@@ -336,14 +329,14 @@
336
  "f":1.0
337
  },
338
  "Number[psor]":{
339
- "p":0.9662921348,
340
- "r":1.0,
341
- "f":0.9828571429
342
  },
343
  "Poss":{
344
- "p":0.9777777778,
345
- "r":1.0,
346
- "f":0.9887640449
347
  },
348
  "Foreign":{
349
  "p":0.6666666667,
@@ -366,141 +359,146 @@
366
  "f":0.8571428571
367
  }
368
  },
369
- "sents_p":0.908438061,
370
- "sents_r":0.8971631206,
371
- "sents_f":0.902765388,
372
- "dep_uas":0.8220632614,
373
- "dep_las":0.7813355686,
374
  "dep_las_per_type":{
375
  "advmod":{
376
- "p":0.7073509015,
377
- "r":0.7203389831,
378
- "f":0.7137858642
379
  },
380
  "root":{
381
- "p":0.8411552347,
382
- "r":0.8262411348,
383
- "f":0.8336314848
384
  },
385
  "nsubj":{
386
- "p":0.8428417653,
387
- "r":0.8259493671,
388
- "f":0.8343100693
389
  },
390
  "case":{
391
- "p":0.881372549,
392
- "r":0.8865877712,
393
- "f":0.883972468
394
  },
395
  "obl":{
396
- "p":0.7053291536,
397
- "r":0.698757764,
398
- "f":0.7020280811
399
  },
400
  "cc":{
401
- "p":0.7851002865,
402
- "r":0.7965116279,
403
- "f":0.7907647908
404
  },
405
  "conj":{
406
- "p":0.6491712707,
407
- "r":0.6266666667,
408
- "f":0.6377204885
409
  },
410
  "obj":{
411
- "p":0.8033088235,
412
- "r":0.8485436893,
413
- "f":0.8253068933
414
  },
415
  "aux":{
416
- "p":0.875739645,
417
- "r":0.8629737609,
418
- "f":0.8693098385
419
  },
420
  "acl:relcl":{
421
- "p":0.5879120879,
422
  "r":0.5783783784,
423
- "f":0.583106267
424
  },
425
  "advmod:lmod":{
426
- "p":0.7586206897,
427
- "r":0.6567164179,
428
- "f":0.704
429
  },
430
  "det":{
431
- "p":0.9194078947,
432
- "r":0.92092257,
433
- "f":0.9201646091
434
  },
435
  "amod":{
436
- "p":0.8193979933,
437
- "r":0.8361774744,
438
- "f":0.8277027027
439
  },
440
  "nmod:poss":{
441
- "p":0.7340425532,
442
- "r":0.6831683168,
443
- "f":0.7076923077
444
  },
445
  "ccomp":{
446
- "p":0.5652173913,
447
- "r":0.6290322581,
448
- "f":0.5954198473
449
  },
450
  "nummod":{
451
- "p":0.8306451613,
452
  "r":0.8583333333,
453
- "f":0.8442622951
454
  },
455
  "flat":{
456
- "p":0.7636363636,
457
- "r":0.8344370861,
458
- "f":0.7974683544
459
  },
460
  "compound:prt":{
461
- "p":0.5,
462
- "r":0.3170731707,
463
- "f":0.3880597015
464
  },
465
  "advcl":{
466
- "p":0.6339285714,
467
- "r":0.6120689655,
468
- "f":0.6228070175
469
  },
470
  "mark":{
471
- "p":0.8905579399,
472
- "r":0.8521560575,
473
- "f":0.870933893
474
  },
475
  "cop":{
476
- "p":0.7837837838,
477
- "r":0.8285714286,
478
- "f":0.8055555556
479
  },
480
  "dep":{
481
- "p":0.1555555556,
482
  "r":0.2641509434,
483
- "f":0.1958041958
484
  },
485
  "nmod":{
486
- "p":0.6352941176,
487
- "r":0.6328125,
488
- "f":0.6340508806
489
  },
490
  "iobj":{
491
- "p":0.8125,
492
- "r":0.5909090909,
493
- "f":0.6842105263
494
  },
495
  "xcomp":{
496
- "p":0.5675675676,
497
  "r":0.3559322034,
498
- "f":0.4375
499
  },
500
  "list":{
501
- "p":0.4,
502
- "r":0.2222222222,
503
- "f":0.2857142857
504
  },
505
  "vocative":{
506
  "p":0.0,
@@ -513,59 +511,54 @@
513
  "f":0.8461538462
514
  },
515
  "expl":{
516
- "p":0.8484848485,
517
- "r":0.8235294118,
518
- "f":0.8358208955
519
- },
520
- "appos":{
521
- "p":0.4146341463,
522
- "r":0.5151515152,
523
- "f":0.4594594595
524
  },
525
  "obl:tmod":{
526
- "p":0.6,
527
- "r":0.3333333333,
528
- "f":0.4285714286
529
  },
530
- "discourse":{
531
  "p":0.0,
532
  "r":0.0,
533
  "f":0.0
534
  },
535
- "obl:lmod":{
536
  "p":0.0,
537
  "r":0.0,
538
  "f":0.0
539
  }
540
  },
541
- "tag_acc":0.962905569,
542
- "lemma_acc":0.8491041162,
543
- "ents_p":0.8075313808,
544
- "ents_r":0.8041666667,
545
- "ents_f":0.8058455115,
546
  "ents_per_type":{
547
  "PER":{
548
- "p":0.9215686275,
549
  "r":0.8493975904,
550
- "f":0.8840125392
551
  },
552
  "ORG":{
553
- "p":0.7032967033,
554
- "r":0.7111111111,
555
- "f":0.7071823204
556
  },
557
  "MISC":{
558
- "p":0.7179487179,
559
- "r":0.7433628319,
560
- "f":0.7304347826
561
  },
562
  "LOC":{
563
- "p":0.8290598291,
564
- "r":0.8738738739,
565
- "f":0.850877193
566
  }
567
  },
568
- "speed":10055.5372609128
569
  },
570
  "sources":[
571
  {
@@ -580,12 +573,6 @@
580
  "license":"CC BY-SA 4.0",
581
  "author":"Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders S\u00f8gaard"
582
  },
583
- {
584
- "name":"Lemmatization Lists",
585
- "url":"https://github.com/michmech/lemmatization-lists/",
586
- "license":"ODbL",
587
- "author":"Michal M\u011bchura"
588
- },
589
  {
590
  "name":"Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)",
591
  "url":"https://spacy.io",
1
  {
2
  "lang":"da",
3
  "name":"core_news_md",
4
+ "version":"3.3.0",
5
+ "description":"Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, lemmatizer (trainable_lemmatizer), senter, ner, attribute_ruler.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
+ "spacy_version":">=3.3.0.dev0,<3.4.0",
11
+ "spacy_git_version":"849bef2de",
12
  "vectors":{
13
  "width":300,
14
  "vectors":20000,
212
  "punct",
213
  "xcomp"
214
  ],
215
  "attribute_ruler":[
216
 
217
  ],
218
  "ner":[
219
  "LOC",
226
  "tok2vec",
227
  "morphologizer",
228
  "parser",
229
  "lemmatizer",
230
+ "attribute_ruler",
231
  "ner"
232
  ],
233
  "components":[
234
  "tok2vec",
235
  "morphologizer",
236
  "parser",
237
+ "lemmatizer",
238
  "senter",
239
  "attribute_ruler",
240
  "ner"
241
  ],
242
  "disabled":[
247
  "token_p":0.9977732598,
248
  "token_r":0.9974835463,
249
  "token_f":0.997628382,
250
+ "pos_acc":0.9641646489,
251
+ "morph_acc":0.9505084746,
252
+ "morph_micro_p":0.9668578389,
253
+ "morph_micro_r":0.9615869971,
254
+ "morph_micro_f":0.9642152149,
255
  "morph_per_feat":{
256
  "Mood":{
257
+ "p":0.9723809524,
258
+ "r":0.9733079123,
259
+ "f":0.9728442115
260
  },
261
  "Tense":{
262
+ "p":0.9669172932,
263
  "r":0.968373494,
264
+ "f":0.9676448457
265
  },
266
  "VerbForm":{
267
+ "p":0.9583588487,
268
+ "r":0.9577723378,
269
+ "f":0.9580655035
270
  },
271
  "Voice":{
272
+ "p":0.9739195231,
273
  "r":0.9768310912,
274
+ "f":0.9753731343
275
  },
276
  "Definite":{
277
+ "p":0.964940239,
278
+ "r":0.9569340182,
279
+ "f":0.9609204523
280
  },
281
  "Gender":{
282
+ "p":0.9487608841,
283
  "r":0.9415088069,
284
+ "f":0.9451209341
285
  },
286
  "Number":{
287
+ "p":0.9608615708,
288
+ "r":0.95409494,
289
+ "f":0.9574663002
290
  },
291
  "AdpType":{
292
+ "p":1.0,
293
  "r":0.9920424403,
294
+ "f":0.9960053262
295
  },
296
  "PartType":{
297
  "p":1.0,
299
  "f":1.0
300
  },
301
  "Case":{
302
+ "p":0.9775641026,
303
+ "r":0.9636650869,
304
+ "f":0.9705648369
305
  },
306
  "Person":{
307
+ "p":0.9788359788,
308
+ "r":0.9857904085,
309
+ "f":0.982300885
310
  },
311
  "PronType":{
312
+ "p":0.9851607585,
313
  "r":0.9827302632,
314
+ "f":0.9839440099
315
  },
316
  "NumType":{
317
+ "p":0.9863013699,
318
+ "r":0.9536423841,
319
+ "f":0.9696969697
320
  },
321
  "Degree":{
322
+ "p":0.9548229548,
323
+ "r":0.9421686747,
324
+ "f":0.9484536082
325
  },
326
  "Reflex":{
327
  "p":1.0,
329
  "f":1.0
330
  },
331
  "Number[psor]":{
332
+ "p":0.988372093,
333
+ "r":0.988372093,
334
+ "f":0.988372093
335
  },
336
  "Poss":{
337
+ "p":0.9886363636,
338
+ "r":0.9886363636,
339
+ "f":0.9886363636
340
  },
341
  "Foreign":{
342
  "p":0.6666666667,
359
  "f":0.8571428571
360
  }
361
  },
362
+ "sents_p":0.9219600726,
363
+ "sents_r":0.9007092199,
364
+ "sents_f":0.9112107623,
365
+ "dep_uas":0.8130063132,
366
+ "dep_las":0.7723336499,
367
  "dep_las_per_type":{
368
  "advmod":{
369
+ "p":0.6868965517,
370
+ "r":0.7033898305,
371
+ "f":0.6950453594
372
  },
373
  "root":{
374
+ "p":0.8381818182,
375
+ "r":0.8173758865,
376
+ "f":0.8276481149
377
  },
378
  "nsubj":{
379
+ "p":0.8317460317,
380
+ "r":0.8291139241,
381
+ "f":0.8304278922
382
  },
383
  "case":{
384
+ "p":0.883218842,
385
+ "r":0.8875739645,
386
+ "f":0.8853910477
387
  },
388
  "obl":{
389
+ "p":0.6935483871,
390
+ "r":0.6677018634,
391
+ "f":0.6803797468
392
  },
393
  "cc":{
394
+ "p":0.7797101449,
395
+ "r":0.7819767442,
396
+ "f":0.7808417997
397
  },
398
  "conj":{
399
+ "p":0.6305555556,
400
+ "r":0.6053333333,
401
+ "f":0.6176870748
402
  },
403
  "obj":{
404
+ "p":0.7667269439,
405
+ "r":0.8233009709,
406
+ "f":0.7940074906
407
  },
408
  "aux":{
409
+ "p":0.8797653959,
410
+ "r":0.8746355685,
411
+ "f":0.8771929825
412
  },
413
  "acl:relcl":{
414
+ "p":0.6011235955,
415
  "r":0.5783783784,
416
+ "f":0.5895316804
417
  },
418
  "advmod:lmod":{
419
+ "p":0.6388888889,
420
+ "r":0.6865671642,
421
+ "f":0.6618705036
422
  },
423
  "det":{
424
+ "p":0.9143327842,
425
+ "r":0.9143327842,
426
+ "f":0.9143327842
427
  },
428
  "amod":{
429
+ "p":0.7954173486,
430
+ "r":0.8293515358,
431
+ "f":0.8120300752
432
  },
433
  "nmod:poss":{
434
+ "p":0.6593406593,
435
+ "r":0.5940594059,
436
+ "f":0.625
437
  },
438
  "ccomp":{
439
+ "p":0.5890410959,
440
+ "r":0.6935483871,
441
+ "f":0.637037037
442
  },
443
  "nummod":{
444
+ "p":0.8512396694,
445
  "r":0.8583333333,
446
+ "f":0.8547717842
447
  },
448
  "flat":{
449
+ "p":0.7514450867,
450
+ "r":0.8609271523,
451
+ "f":0.8024691358
452
  },
453
  "compound:prt":{
454
+ "p":0.4285714286,
455
+ "r":0.2195121951,
456
+ "f":0.2903225806
457
  },
458
  "advcl":{
459
+ "p":0.5909090909,
460
+ "r":0.5603448276,
461
+ "f":0.5752212389
462
  },
463
  "mark":{
464
+ "p":0.8742138365,
465
+ "r":0.8562628337,
466
+ "f":0.8651452282
467
  },
468
  "cop":{
469
+ "p":0.7891891892,
470
+ "r":0.8342857143,
471
+ "f":0.8111111111
472
  },
473
  "dep":{
474
+ "p":0.1538461538,
475
  "r":0.2641509434,
476
+ "f":0.1944444444
477
  },
478
  "nmod":{
479
+ "p":0.6551724138,
480
+ "r":0.630859375,
481
+ "f":0.6427860697
482
  },
483
  "iobj":{
484
+ "p":0.7272727273,
485
+ "r":0.3636363636,
486
+ "f":0.4848484848
487
  },
488
  "xcomp":{
489
+ "p":0.5833333333,
490
  "r":0.3559322034,
491
+ "f":0.4421052632
492
+ },
493
+ "appos":{
494
+ "p":0.4375,
495
+ "r":0.4242424242,
496
+ "f":0.4307692308
497
  },
498
  "list":{
499
+ "p":0.375,
500
+ "r":0.1666666667,
501
+ "f":0.2307692308
502
  },
503
  "vocative":{
504
  "p":0.0,
511
  "f":0.8461538462
512
  },
513
  "expl":{
514
+ "p":0.8181818182,
515
+ "r":0.7941176471,
516
+ "f":0.8059701493
517
  },
518
  "obl:tmod":{
519
+ "p":0.7777777778,
520
+ "r":0.3888888889,
521
+ "f":0.5185185185
522
  },
523
+ "obl:lmod":{
524
  "p":0.0,
525
  "r":0.0,
526
  "f":0.0
527
  },
528
+ "discourse":{
529
  "p":0.0,
530
  "r":0.0,
531
  "f":0.0
532
  }
533
  },
534
+ "lemma_acc":0.9481840194,
535
+ "tag_acc":0.9641646489,
536
+ "ents_p":0.8079331942,
537
+ "ents_r":0.80625,
538
+ "ents_f":0.8070907195,
539
  "ents_per_type":{
540
  "PER":{
541
+ "p":0.88125,
542
  "r":0.8493975904,
543
+ "f":0.8650306748
544
  },
545
  "ORG":{
546
+ "p":0.7294117647,
547
+ "r":0.6888888889,
548
+ "f":0.7085714286
549
  },
550
  "MISC":{
551
+ "p":0.6991869919,
552
+ "r":0.7610619469,
553
+ "f":0.7288135593
554
  },
555
  "LOC":{
556
+ "p":0.8828828829,
557
+ "r":0.8828828829,
558
+ "f":0.8828828829
559
  }
560
  },
561
+ "speed":11453.5916881723
562
  },
563
  "sources":[
564
  {
573
  "license":"CC BY-SA 4.0",
574
  "author":"Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders S\u00f8gaard"
575
  },
576
  {
577
  "name":"Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)",
578
  "url":"https://spacy.io",
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ead2b660084c7bfc2a612bb768ab44d12531106592aa980e916b605ab811d3c3
3
- size 61299
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:20e85c840781c0f123abc619c03c85399ed4a8bb47ae7fa1535c3b6d4d75276b
3
+ size 61351
ner/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1ce8c46254cc538ef54b60e106195afc64fad4c7c7235f3343405fc54a77286a
3
- size 7091792
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:444f9832c0b6efee7c93e41b2bd5ef98a098d6f0a71e36a62f7677a042909eda
3
+ size 6496592
parser/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:3c0868e96825dd96c1135f853ccfc4ae3fe02a104277ceb94d2ffde4c79b17a2
3
  size 308728
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aae04c7eef9ac270ea79e71070b4d694069e21888c8adc6747f24c97da6c915d
3
  size 308728
parser/moves CHANGED
@@ -1 +1 @@
1
- ��moves�D{"0":{"":41514},"1":{"":34295},"2":{"case":7489,"nsubj":6009,"det":4334,"amod":3968,"advmod":3657,"mark":3529,"aux":2432,"cc":2261,"punct":2182,"cop":1329,"obl":894,"nummod":799,"nmod:poss":651,"nmod":460,"expl":291,"ccomp":202,"obj":195,"xcomp":122,"case||nmod":73,"obl:tmod":53,"dep":49,"acl:relcl":43},"3":{"punct":8601,"obl":3949,"obj":3758,"nmod":3565,"conj":2745,"advmod":2095,"flat":1295,"nsubj":1172,"acl:relcl":1131,"advcl":808,"amod":628,"advmod:lmod":423,"fixed":390,"dep":322,"xcomp":272,"appos":268,"compound:prt":261,"ccomp":252,"acl:relcl||nsubj":237,"case":202,"nummod":167,"list":161,"nmod:poss":156,"punct||conj":151,"mark":137,"cc":135,"iobj":107,"expl":77,"cop":69,"nmod||case":60,"aux":48,"obl:tmod":45,"obl:lmod":44,"cc||case":43,"advcl||advmod":43,"cc||conj":40,"case||obl":38,"punct||case":33},"4":{"ROOT":4367}}�cfg��neg_key�
1
+ ��moves�D{"0":{"":41615},"1":{"":34382},"2":{"case":7526,"nsubj":6005,"det":4341,"amod":3967,"advmod":3662,"mark":3530,"aux":2436,"cc":2264,"punct":2187,"cop":1330,"obl":894,"nummod":834,"nmod:poss":656,"nmod":463,"expl":291,"ccomp":203,"obj":195,"xcomp":122,"case||nmod":73,"obl:tmod":53,"dep":48,"acl:relcl":43},"3":{"punct":8693,"obl":3951,"obj":3760,"nmod":3569,"conj":2747,"advmod":2087,"flat":1302,"nsubj":1169,"acl:relcl":1132,"advcl":809,"amod":622,"advmod:lmod":423,"fixed":390,"dep":322,"xcomp":272,"appos":268,"compound:prt":261,"ccomp":252,"acl:relcl||nsubj":237,"case":202,"nummod":168,"list":159,"nmod:poss":156,"punct||conj":151,"cc":135,"mark":133,"iobj":107,"expl":77,"cop":69,"nmod||case":60,"aux":48,"obl:tmod":45,"obl:lmod":44,"cc||case":43,"advcl||advmod":43,"cc||conj":40,"case||obl":38,"punct||case":33},"4":{"ROOT":4383}}�cfg��neg_key�
senter/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ea62d9f10a446fc6fa69a24f6a681d5e348eba73629e4109096d76fdc9a8303e
3
- size 219901
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4c17e994908cc6549918a2f0af54d58ec0988bc312bd7c574607b97218e51b9b
3
+ size 219953
tok2vec/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0c43e62abccb37657f2afce3e900c32f1b97d0878127ae1c32fcbeffec90ba5d
3
- size 6960804
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:624296ccca73db9a8aa6cf2abbfb11dd88155ffa8c0bd55a79243c0ef4627a04
3
+ size 6365604
tokenizer CHANGED
The diff for this file is too large to render. See raw diff
vocab/key2row CHANGED
Binary files a/vocab/key2row and b/vocab/key2row differ
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:42fe610567bec6fa69da580a4753d083f1f4429efd32b3c1fa638b6a07a6757e
3
- size 10070327
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e742e4f89864a4c4e0324e2c5f25f320183d5db9dd09d3e8fa9e7260bc26ab56
3
+ size 10080884