Commit 0179b9a by juhoinkinen · verified · 1 Parent(s): d3360ef

Upload folder using huggingface_hub

Files changed (45)
  1. .dvc/.gitignore +3 -0
  2. .dvc/config +6 -0
  3. .dvcignore +3 -0
  4. .gitignore +2 -0
  5. LICENSE +121 -0
  6. README.md +17 -0
  7. corpora/.gitignore +2 -0
  8. corpora/kirjaesittelyt2021/.gitignore +9 -0
  9. corpora/kirjaesittelyt2021/kvesit-ykl-eng-test.tsv.dvc +10 -0
  10. corpora/kirjaesittelyt2021/kvesit-ykl-eng-train.tsv.dvc +10 -0
  11. corpora/kirjaesittelyt2021/kvesit-ykl-eng-validate.tsv.dvc +9 -0
  12. corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv.dvc +10 -0
  13. corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv.dvc +10 -0
  14. corpora/kirjaesittelyt2021/kvesit-ykl-fin-validate.tsv.dvc +9 -0
  15. corpora/kirjaesittelyt2021/kvesit-ykl-swe-test.tsv.dvc +10 -0
  16. corpora/kirjaesittelyt2021/kvesit-ykl-swe-train.tsv.dvc +10 -0
  17. corpora/kirjaesittelyt2021/kvesit-ykl-swe-validate.tsv.dvc +9 -0
  18. corpora/ykl-classes.tsv.dvc +9 -0
  19. corpora/ykl-skos.ttl.dvc +10 -0
  20. data/projects/.gitignore +6 -0
  21. data/vocabs/.gitignore +2 -0
  22. dvc.lock +575 -0
  23. dvc.yaml +122 -0
  24. projects.d/1-projects-ykl.toml +89 -0
  25. projects.toml +89 -0
  26. reports/test-en.csv +0 -0
  27. reports/test-en.json +5 -0
  28. reports/test-fi.csv +0 -0
  29. reports/test-fi.json +5 -0
  30. reports/test-omikuji-bonsai-en.csv +0 -0
  31. reports/test-omikuji-bonsai-en.json +5 -0
  32. reports/test-omikuji-bonsai-fi.csv +0 -0
  33. reports/test-omikuji-bonsai-fi.json +5 -0
  34. reports/test-omikuji-bonsai-sv.csv +0 -0
  35. reports/test-omikuji-bonsai-sv.json +5 -0
  36. reports/test-omikuji-parabel-en.csv +0 -0
  37. reports/test-omikuji-parabel-en.json +5 -0
  38. reports/test-omikuji-parabel-fi.csv +0 -0
  39. reports/test-omikuji-parabel-fi.json +5 -0
  40. reports/test-omikuji-parabel-sv.csv +0 -0
  41. reports/test-omikuji-parabel-sv.json +5 -0
  42. reports/test-sv.csv +0 -0
  43. reports/test-sv.json +5 -0
  44. requirements.txt +1 -0
  45. sync-model-data-ocp.sh +32 -0
.dvc/.gitignore ADDED
@@ -0,0 +1,3 @@
+ /config.local
+ /tmp
+ /cache
.dvc/config ADDED
@@ -0,0 +1,6 @@
+ [cache]
+ dir = /data/dvc-cache/FintoAI-data-YKL
+ shared = group
+ type = symlink
+ [core]
+ autostage = true
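Reviewer note (illustrative, not part of the commit): the settings above point DVC at a shared cache directory and ask it to link workspace files from that cache with symlinks. Below is a minimal sketch of reading those values with Python's standard `configparser`. The inline `CONFIG_TEXT` copy and the `cache_settings` helper are assumptions made for this example, and DVC itself may parse the file differently.

```python
import configparser

# Inline copy of the .dvc/config contents from this commit (for illustration only).
CONFIG_TEXT = """
[cache]
dir = /data/dvc-cache/FintoAI-data-YKL
shared = group
type = symlink
[core]
autostage = true
"""


def cache_settings(text: str) -> dict:
    """Return the [cache] section of an INI-style config as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["cache"])


if __name__ == "__main__":
    # Print the cache options so they can be checked against the local setup.
    for key, value in cache_settings(CONFIG_TEXT).items():
        print(f"{key} = {value}")
```

Because the cache directory is an absolute path on the author's machine, anyone reproducing the pipeline elsewhere would probably need to override it, for example in the Git-ignored `.dvc/config.local`.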
.dvcignore ADDED
@@ -0,0 +1,3 @@
+ # Add patterns of files dvc should ignore, which could improve
+ # the performance. Learn more at
+ # https://dvc.org/doc/user-guide/dvcignore
.gitignore ADDED
@@ -0,0 +1,2 @@
+ venv
+ venv-installed
LICENSE ADDED
@@ -0,0 +1,121 @@
+ Creative Commons Legal Code
+
+ CC0 1.0 Universal
+
+ CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
+ LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
+ ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
+ INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
+ REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
+ PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
+ THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
+ HEREUNDER.
+
+ Statement of Purpose
+
+ The laws of most jurisdictions throughout the world automatically confer
+ exclusive Copyright and Related Rights (defined below) upon the creator
+ and subsequent owner(s) (each and all, an "owner") of an original work of
+ authorship and/or a database (each, a "Work").
+
+ Certain owners wish to permanently relinquish those rights to a Work for
+ the purpose of contributing to a commons of creative, cultural and
+ scientific works ("Commons") that the public can reliably and without fear
+ of later claims of infringement build upon, modify, incorporate in other
+ works, reuse and redistribute as freely as possible in any form whatsoever
+ and for any purposes, including without limitation commercial purposes.
+ These owners may contribute to the Commons to promote the ideal of a free
+ culture and the further production of creative, cultural and scientific
+ works, or to gain reputation or greater distribution for their Work in
+ part through the use and efforts of others.
+
+ For these and/or other purposes and motivations, and without any
+ expectation of additional consideration or compensation, the person
+ associating CC0 with a Work (the "Affirmer"), to the extent that he or she
+ is an owner of Copyright and Related Rights in the Work, voluntarily
+ elects to apply CC0 to the Work and publicly distribute the Work under its
+ terms, with knowledge of his or her Copyright and Related Rights in the
+ Work and the meaning and intended legal effect of CC0 on those rights.
+
+ 1. Copyright and Related Rights. A Work made available under CC0 may be
+ protected by copyright and related or neighboring rights ("Copyright and
+ Related Rights"). Copyright and Related Rights include, but are not
+ limited to, the following:
+
+ i. the right to reproduce, adapt, distribute, perform, display,
+ communicate, and translate a Work;
+ ii. moral rights retained by the original author(s) and/or performer(s);
+ iii. publicity and privacy rights pertaining to a person's image or
+ likeness depicted in a Work;
+ iv. rights protecting against unfair competition in regards to a Work,
+ subject to the limitations in paragraph 4(a), below;
+ v. rights protecting the extraction, dissemination, use and reuse of data
+ in a Work;
+ vi. database rights (such as those arising under Directive 96/9/EC of the
+ European Parliament and of the Council of 11 March 1996 on the legal
+ protection of databases, and under any national implementation
+ thereof, including any amended or successor version of such
+ directive); and
+ vii. other similar, equivalent or corresponding rights throughout the
+ world based on applicable law or treaty, and any national
+ implementations thereof.
+
+ 2. Waiver. To the greatest extent permitted by, but not in contravention
+ of, applicable law, Affirmer hereby overtly, fully, permanently,
+ irrevocably and unconditionally waives, abandons, and surrenders all of
+ Affirmer's Copyright and Related Rights and associated claims and causes
+ of action, whether now known or unknown (including existing as well as
+ future claims and causes of action), in the Work (i) in all territories
+ worldwide, (ii) for the maximum duration provided by applicable law or
+ treaty (including future time extensions), (iii) in any current or future
+ medium and for any number of copies, and (iv) for any purpose whatsoever,
+ including without limitation commercial, advertising or promotional
+ purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
+ member of the public at large and to the detriment of Affirmer's heirs and
+ successors, fully intending that such Waiver shall not be subject to
+ revocation, rescission, cancellation, termination, or any other legal or
+ equitable action to disrupt the quiet enjoyment of the Work by the public
+ as contemplated by Affirmer's express Statement of Purpose.
+
+ 3. Public License Fallback. Should any part of the Waiver for any reason
+ be judged legally invalid or ineffective under applicable law, then the
+ Waiver shall be preserved to the maximum extent permitted taking into
+ account Affirmer's express Statement of Purpose. In addition, to the
+ extent the Waiver is so judged Affirmer hereby grants to each affected
+ person a royalty-free, non transferable, non sublicensable, non exclusive,
+ irrevocable and unconditional license to exercise Affirmer's Copyright and
+ Related Rights in the Work (i) in all territories worldwide, (ii) for the
+ maximum duration provided by applicable law or treaty (including future
+ time extensions), (iii) in any current or future medium and for any number
+ of copies, and (iv) for any purpose whatsoever, including without
+ limitation commercial, advertising or promotional purposes (the
+ "License"). The License shall be deemed effective as of the date CC0 was
+ applied by Affirmer to the Work. Should any part of the License for any
+ reason be judged legally invalid or ineffective under applicable law, such
+ partial invalidity or ineffectiveness shall not invalidate the remainder
+ of the License, and in such case Affirmer hereby affirms that he or she
+ will not (i) exercise any of his or her remaining Copyright and Related
+ Rights in the Work or (ii) assert any associated claims and causes of
+ action with respect to the Work, in either case contrary to Affirmer's
+ express Statement of Purpose.
+
+ 4. Limitations and Disclaimers.
+
+ a. No trademark or patent rights held by Affirmer are waived, abandoned,
+ surrendered, licensed or otherwise affected by this document.
+ b. Affirmer offers the Work as-is and makes no representations or
+ warranties of any kind concerning the Work, express, implied,
+ statutory or otherwise, including without limitation warranties of
+ title, merchantability, fitness for a particular purpose, non
+ infringement, or the absence of latent or other defects, accuracy, or
+ the present or absence of errors, whether or not discoverable, all to
+ the greatest extent permissible under applicable law.
+ c. Affirmer disclaims responsibility for clearing rights of other persons
+ that may apply to the Work or any use thereof, including without
+ limitation any person's Copyright and Related Rights in the Work.
+ Further, Affirmer disclaims responsibility for obtaining any necessary
+ consents, permissions or other rights required for any use of the
+ Work.
+ d. Affirmer understands and acknowledges that Creative Commons is not a
+ party to this document and has no duty or obligation with respect to
+ this CC0 or use of the Work.
README.md ADDED
@@ -0,0 +1,17 @@
+ # FintoAI-data-YKL
+ Configurations for maintaining the Annif projects with YKL vocabulary used at [Finto AI service](ai.finto.fi/).
+
+ The projects are trained and evaluated using a [DVC (Data Version Control) pipeline](https://dvc.org/doc/start/data-management/data-pipelines) defined in [dvc.yaml](/dvc.yaml).
+
+ The pipeline takes care of
+
+ 1. installing Annif in a venv,
+ 2. loading the vocabulary,
+ 3. training the projects,
+ 4. evaluating the projects.
+
+ When the necessary vocabulary and training corpora are in place the pipeline can be run using the command
+
+     dvc repro
+
+ For more information about using DVC with Annif projects see the [DVC exercise of Annif tutorial](https://github.com/NatLibFi/Annif-tutorial/blob/master/exercises/OPT_dvc.md).
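Reviewer note (illustrative, not part of the commit): the training and evaluation steps listed in the README correspond to `annif train` and `annif eval` invocations recorded as `cmd` entries in this commit's dvc.lock. The sketch below only assembles those command lines as argument lists and does not run Annif. The helper names `train_cmd` and `eval_cmd` and the `report_stem` parameter are hypothetical; the flags are copied from the recorded commands.

```python
from typing import List


def train_cmd(project: str, train_corpus: str, jobs: int = 8) -> List[str]:
    """Build an `annif train` command in the shape recorded in dvc.lock."""
    return ["annif", "train", project, "-j", str(jobs), train_corpus]


def eval_cmd(project: str, report_stem: str, test_corpus: str, jobs: int = 8) -> List[str]:
    """Build an `annif eval` command that writes <stem>.json and <stem>.csv."""
    return [
        "annif", "eval", project, "-j", str(jobs),
        "-m", "Precision@1", "-m", "NDCG",
        "--metrics-file", f"{report_stem}.json",
        "--results-file", f"{report_stem}.csv",
        test_corpus,
    ]


if __name__ == "__main__":
    corpus_dir = "corpora/kirjaesittelyt2021"
    project = "ykl-omikuji-bonsai-fi"
    # Print the two commands for the Finnish Bonsai project.
    print(" ".join(train_cmd(project, f"{corpus_dir}/kvesit-ykl-fin-train.tsv")))
    print(" ".join(eval_cmd(project, "reports/test-omikuji-bonsai-fi",
                            f"{corpus_dir}/kvesit-ykl-fin-test.tsv")))
```

In the repository itself these commands are run by `dvc repro`, so the sketch is useful mainly for checking what a given stage would execute.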
corpora/.gitignore ADDED
@@ -0,0 +1,2 @@
+ /ykl-classes.tsv
+ /ykl-skos.ttl
corpora/kirjaesittelyt2021/.gitignore ADDED
@@ -0,0 +1,9 @@
+ /kvesit-ykl-fin-train.tsv
+ /kvesit-ykl-fin-validate.tsv
+ /kvesit-ykl-fin-test.tsv
+ /kvesit-ykl-eng-train.tsv
+ /kvesit-ykl-eng-validate.tsv
+ /kvesit-ykl-eng-test.tsv
+ /kvesit-ykl-swe-train.tsv
+ /kvesit-ykl-swe-validate.tsv
+ /kvesit-ykl-swe-test.tsv
corpora/kirjaesittelyt2021/kvesit-ykl-eng-test.tsv.dvc ADDED
@@ -0,0 +1,10 @@
+ md5: d0048a06955ce66d56224834e8f6ef9c
+ deps:
+ - md5: bc7f924d84a20e9ab91e2600fd2415ec
+   size: 210866
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/kvesit-ykl-eng-test.tsv
+   hash: md5
+ outs:
+ - md5: bc7f924d84a20e9ab91e2600fd2415ec
+   size: 210866
+   path: kvesit-ykl-eng-test.tsv
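Reviewer note (illustrative, not part of the commit): each `.dvc` file in this commit records an `md5` and a `size` for the tracked corpus file. Below is a minimal sketch of checking a local copy against such an entry. It assumes the recorded value is a plain MD5 over the raw file bytes, which may not hold for entries written by older DVC versions. The helper name `matches_dvc_entry` is hypothetical.

```python
import hashlib
import os


def matches_dvc_entry(path: str, md5: str, size: int, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the given size and MD5 hex digest."""
    # A cheap size comparison first avoids hashing files that obviously differ.
    if os.path.getsize(path) != size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large corpora do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == md5


if __name__ == "__main__":
    # Values copied from kvesit-ykl-eng-test.tsv.dvc in this commit.
    target = "corpora/kirjaesittelyt2021/kvesit-ykl-eng-test.tsv"
    if os.path.exists(target):
        print(matches_dvc_entry(target, "bc7f924d84a20e9ab91e2600fd2415ec", 210866))
```

In normal use `dvc status` performs this kind of check; the sketch only makes the recorded fields concrete.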
corpora/kirjaesittelyt2021/kvesit-ykl-eng-train.tsv.dvc ADDED
@@ -0,0 +1,10 @@
+ md5: fbe7fe15b03297c612dd0e436c9f63af
+ deps:
+ - md5: 87548d6f8f9a8185dc870e8112668c87
+   size: 1730632
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/kvesit-ykl-eng-train.tsv
+   hash: md5
+ outs:
+ - md5: 87548d6f8f9a8185dc870e8112668c87
+   size: 1730632
+   path: kvesit-ykl-eng-train.tsv
corpora/kirjaesittelyt2021/kvesit-ykl-eng-validate.tsv.dvc ADDED
@@ -0,0 +1,9 @@
+ md5: da25cc54ab5e66d5e2ec5c21edc76796
+ deps:
+ - md5: c1706bb4102fa4799492fd653acc97a2
+   size: 218773
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/kvesit-ykl-eng-validate.tsv
+ outs:
+ - md5: c1706bb4102fa4799492fd653acc97a2
+   size: 218773
+   path: kvesit-ykl-eng-validate.tsv
corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv.dvc ADDED
@@ -0,0 +1,10 @@
+ md5: e41640988a93564680ee91b589667028
+ deps:
+ - md5: 9f8d094ae71ec68c5d40dde46d991e1f
+   size: 4534782
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/kvesit-ykl-fin-test.tsv
+   hash: md5
+ outs:
+ - md5: 9f8d094ae71ec68c5d40dde46d991e1f
+   size: 4534782
+   path: kvesit-ykl-fin-test.tsv
corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv.dvc ADDED
@@ -0,0 +1,10 @@
+ md5: d86b6c73a936179a8f4b069e5ac90322
+ deps:
+ - md5: cc52c6a5c21d799aad3fb73a889cd90e
+   size: 36684502
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/kvesit-ykl-fin-train.tsv
+   hash: md5
+ outs:
+ - md5: cc52c6a5c21d799aad3fb73a889cd90e
+   size: 36684502
+   path: kvesit-ykl-fin-train.tsv
corpora/kirjaesittelyt2021/kvesit-ykl-fin-validate.tsv.dvc ADDED
@@ -0,0 +1,9 @@
+ md5: c11ba98458f2295383418b1fd8142813
+ deps:
+ - md5: 88d1c1d133953681474207186f0408e1
+   size: 4575927
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/kvesit-ykl-fin-validate.tsv
+ outs:
+ - md5: 88d1c1d133953681474207186f0408e1
+   size: 4575927
+   path: kvesit-ykl-fin-validate.tsv
corpora/kirjaesittelyt2021/kvesit-ykl-swe-test.tsv.dvc ADDED
@@ -0,0 +1,10 @@
+ md5: 72edc7073e64d873fe62c73201c13965
+ deps:
+ - md5: a9177ace2f0d1dffa6a424fccdf04b37
+   size: 287821
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/kvesit-ykl-swe-test.tsv
+   hash: md5
+ outs:
+ - md5: a9177ace2f0d1dffa6a424fccdf04b37
+   size: 287821
+   path: kvesit-ykl-swe-test.tsv
corpora/kirjaesittelyt2021/kvesit-ykl-swe-train.tsv.dvc ADDED
@@ -0,0 +1,10 @@
+ md5: b1f96f9bf1677ae056540572f3968311
+ deps:
+ - md5: fa74f039e20594e212d2429eb3e74c13
+   size: 2367569
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/kvesit-ykl-swe-train.tsv
+   hash: md5
+ outs:
+ - md5: fa74f039e20594e212d2429eb3e74c13
+   size: 2367569
+   path: kvesit-ykl-swe-train.tsv
corpora/kirjaesittelyt2021/kvesit-ykl-swe-validate.tsv.dvc ADDED
@@ -0,0 +1,9 @@
+ md5: 896803dc4e4492ea12625b363ef21007
+ deps:
+ - md5: 6ae75a57ef7c1107f55978d759796c94
+   size: 294875
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/kvesit-ykl-swe-validate.tsv
+ outs:
+ - md5: 6ae75a57ef7c1107f55978d759796c94
+   size: 294875
+   path: kvesit-ykl-swe-validate.tsv
corpora/ykl-classes.tsv.dvc ADDED
@@ -0,0 +1,9 @@
+ md5: c0b12e0a05399ad397f12e30d4bef364
+ deps:
+ - md5: b2f29070ccedefda72994b8db324e371
+   size: 187637
+   path: /data/Annif-corpora-restricted/kirjaesittelyt2021/ykl/ykl-classes.tsv
+ outs:
+ - md5: b2f29070ccedefda72994b8db324e371
+   size: 187637
+   path: ykl-classes.tsv
corpora/ykl-skos.ttl.dvc ADDED
@@ -0,0 +1,10 @@
+ md5: 7f7e7284f20b95f4d8717a955778e726
+ deps:
+ - md5: fea11c863134c9e8379ea52e0f31e28a
+   size: 3901929
+   path: /data/Annif-corpora/vocab/ykl-skos.ttl
+   hash: md5
+ outs:
+ - md5: fea11c863134c9e8379ea52e0f31e28a
+   size: 3901929
+   path: ykl-skos.ttl
data/projects/.gitignore ADDED
@@ -0,0 +1,6 @@
+ /ykl-omikuji-parabel-fi
+ /ykl-omikuji-bonsai-fi
+ /ykl-omikuji-bonsai-sv
+ /ykl-omikuji-bonsai-en
+ /ykl-omikuji-parabel-sv
+ /ykl-omikuji-parabel-en
data/vocabs/.gitignore ADDED
@@ -0,0 +1,2 @@
+ /ykl-fi
+ /ykl
dvc.lock ADDED
@@ -0,0 +1,575 @@
1
+ schema: '2.0'
2
+ stages:
3
+ loadvoc-fi:
4
+ cmd: annif loadvoc ykl-omikuji-parabel-fi corpora/ykl-classes.tsv
5
+ deps:
6
+ - path: corpora/ykl-classes.tsv
7
+ md5: b2f29070ccedefda72994b8db324e371
8
+ size: 187637
9
+ - path: venv-installed
10
+ md5: 2e6baa8289dac9e06cb525999eb39a70
11
+ size: 42
12
+ outs:
13
+ - path: data/vocabs/ykl-fi
14
+ md5: 680732b8de9b5108536ccea40462734e.dir
15
+ size: 764449
16
+ nfiles: 3
17
+ train-omikuji-parabel-fi:
18
+ cmd: annif train ykl-omikuji-parabel-fi -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv
19
+ deps:
20
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv
21
+ md5: ffa2c09b76a8a5370dbe71540da8c0ff
22
+ size: 38164190
23
+ - path: data/vocabs/ykl
24
+ md5: 3b9896e0ea6b1a4fa9820f015831fd93.dir
25
+ size: 6503819
26
+ nfiles: 3
27
+ - path: venv-installed
28
+ md5: 2e6baa8289dac9e06cb525999eb39a70
29
+ size: 42
30
+ params:
31
+ projects.toml:
32
+ ykl-omikuji-parabel-fi:
33
+ name: YKL Omikuji Parabel Finnish
34
+ language: fi
35
+ backend: omikuji
36
+ analyzer: voikko(fi)
37
+ vocab: ykl
38
+ ngram: 1
39
+ min_df: 1
40
+ outs:
41
+ - path: data/projects/ykl-omikuji-parabel-fi
42
+ md5: e506a5b142c024d089a449b9ab99da7c.dir
43
+ size: 119198246
44
+ nfiles: 6
45
+ eval-omikuji-parabel-fi:
46
+ cmd: annif eval ykl-omikuji-parabel-fi -j 8 -m Precision@1 -m NDCG --metrics-file
47
+ reports/test-omikuji-parabel-fi.json corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
48
+ deps:
49
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
50
+ md5: e4497d02e1c12d30de314f25ac5c5a8e
51
+ size: 3289456
52
+ - path: data/projects/ykl-omikuji-parabel-fi
53
+ md5: e506a5b142c024d089a449b9ab99da7c.dir
54
+ size: 119198246
55
+ nfiles: 6
56
+ - path: venv-installed
57
+ md5: 2e6baa8289dac9e06cb525999eb39a70
58
+ size: 42
59
+ outs:
60
+ - path: reports/test-omikuji-parabel-fi.json
61
+ md5: e0079ca11936a9627c9ee5b4a23e4c48
62
+ size: 100
63
+ install:
64
+ cmd: python3 -m venv venv && . venv/bin/activate && pip install -U pip wheel setuptools
65
+ && pip install -r requirements.txt && cp requirements.txt venv-installed
66
+ deps:
67
+ - path: requirements.txt
68
+ hash: md5
69
+ md5: f85f21b68735b126c2241fbff83fd0ef
70
+ size: 41
71
+ outs:
72
+ - path: venv-installed
73
+ hash: md5
74
+ md5: f85f21b68735b126c2241fbff83fd0ef
75
+ size: 41
76
+ train-omikuji-bonsai-fi:
77
+ cmd: annif train ykl-omikuji-bonsai-fi -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv
78
+ deps:
79
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv
80
+ md5: ffa2c09b76a8a5370dbe71540da8c0ff
81
+ size: 38164190
82
+ - path: data/vocabs/ykl
83
+ md5: 3b9896e0ea6b1a4fa9820f015831fd93.dir
84
+ size: 6503819
85
+ nfiles: 3
86
+ - path: venv-installed
87
+ md5: 2e6baa8289dac9e06cb525999eb39a70
88
+ size: 42
89
+ params:
90
+ projects.toml:
91
+ ykl-omikuji-bonsai-fi:
92
+ name: YKL Omikuji Bonsai Finnish
93
+ language: fi
94
+ backend: omikuji
95
+ cluster_balanced: 'False'
96
+ cluster_k: 100
97
+ max_depth: 3
98
+ analyzer: voikko(fi)
99
+ vocab: ykl
100
+ ngram: 1
101
+ min_df: 1
102
+ outs:
103
+ - path: data/projects/ykl-omikuji-bonsai-fi
104
+ md5: 1166661a5b68a9c155a57d40592d11eb.dir
105
+ size: 114643724
106
+ nfiles: 6
107
+ eval-omikuji-bonsai-fi:
108
+ cmd: annif eval ykl-omikuji-bonsai-fi -j 8 -m Precision@1 -m NDCG --metrics-file
109
+ reports/test-omikuji-bonsai-fi.json corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
110
+ deps:
111
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
112
+ md5: e4497d02e1c12d30de314f25ac5c5a8e
113
+ size: 3289456
114
+ - path: data/projects/ykl-omikuji-bonsai-fi
115
+ md5: 1166661a5b68a9c155a57d40592d11eb.dir
116
+ size: 114643724
117
+ nfiles: 6
118
+ - path: venv-installed
119
+ md5: 2e6baa8289dac9e06cb525999eb39a70
120
+ size: 42
121
+ outs:
122
+ - path: reports/test-omikuji-bonsai-fi.json
123
+ md5: da9fafaa27f186fff29cb24b505c2fa2
124
+ size: 100
125
+ load-vocab:
126
+ cmd: annif load-vocab ykl corpora/ykl-skos.ttl
127
+ deps:
128
+ - path: corpora/ykl-skos.ttl
129
+ md5: fea11c863134c9e8379ea52e0f31e28a
130
+ size: 3901929
131
+ - path: venv-installed
132
+ hash: md5
133
+ md5: f85f21b68735b126c2241fbff83fd0ef
134
+ size: 41
135
+ outs:
136
+ - path: data/vocabs/ykl
137
+ hash: md5
138
+ md5: 083422d4d504723b2eb4fc4ee8805a99.dir
139
+ size: 6503179
140
+ nfiles: 3
141
+ train-omikuji-bonsai@0:
142
+ cmd: annif train ykl-omikuji-bonsai-fi -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv
143
+ deps:
144
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv
145
+ md5: cc52c6a5c21d799aad3fb73a889cd90e
146
+ size: 36684502
147
+ - path: data/vocabs/ykl
148
+ hash: md5
149
+ md5: 083422d4d504723b2eb4fc4ee8805a99.dir
150
+ size: 6503179
151
+ nfiles: 3
152
+ - path: venv-installed
153
+ hash: md5
154
+ md5: f85f21b68735b126c2241fbff83fd0ef
155
+ size: 41
156
+ params:
157
+ projects.toml:
158
+ ykl-omikuji-bonsai-fi:
159
+ name: YKL Omikuji Bonsai Finnish
160
+ language: fi
161
+ backend: omikuji
162
+ cluster_balanced: 'False'
163
+ cluster_k: 100
164
+ max_depth: 3
165
+ analyzer: voikko(fi)
166
+ vocab: ykl
167
+ ngram: 1
168
+ min_df: 1
169
+ access: hidden
170
+ outs:
171
+ - path: data/projects/ykl-omikuji-bonsai-fi
172
+ hash: md5
173
+ md5: 9bdaf3071b9fe0d13f72e1b2d6873bfb.dir
174
+ size: 110266624
175
+ nfiles: 6
176
+ eval-omikuji-bonsai@0:
177
+ cmd: annif eval ykl-omikuji-bonsai-fi -j 8 -m Precision@1 -m NDCG --metrics-file
178
+ reports/test-omikuji-bonsai-fi.json --results-file reports/test-omikuji-bonsai-fi.csv
179
+ corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
180
+ deps:
181
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
182
+ md5: 9f8d094ae71ec68c5d40dde46d991e1f
183
+ size: 4534782
184
+ - path: data/projects/ykl-omikuji-bonsai-fi
185
+ hash: md5
186
+ md5: 9bdaf3071b9fe0d13f72e1b2d6873bfb.dir
187
+ size: 110266624
188
+ nfiles: 6
189
+ - path: venv-installed
190
+ hash: md5
191
+ md5: f85f21b68735b126c2241fbff83fd0ef
192
+ size: 41
193
+ outs:
194
+ - path: reports/test-omikuji-bonsai-fi.csv
195
+ hash: md5
196
+ md5: 3c536124365595cea86fa741f9c355cb
197
+ size: 237892
198
+ - path: reports/test-omikuji-bonsai-fi.json
199
+ hash: md5
200
+ md5: 24f9c18652995a673cb765edf417b2c8
201
+ size: 100
202
+ train-omikuji-bonsai@1:
203
+ cmd: annif train ykl-omikuji-bonsai-sv -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-swe-train.tsv
204
+ deps:
205
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-swe-train.tsv
206
+ md5: fa74f039e20594e212d2429eb3e74c13
207
+ size: 2367569
208
+ - path: data/vocabs/ykl
209
+ hash: md5
210
+ md5: 083422d4d504723b2eb4fc4ee8805a99.dir
211
+ size: 6503179
212
+ nfiles: 3
213
+ - path: venv-installed
214
+ hash: md5
215
+ md5: f85f21b68735b126c2241fbff83fd0ef
216
+ size: 41
217
+ params:
218
+ projects.toml:
219
+ ykl-omikuji-bonsai-sv:
220
+ name: YKL Omikuji Bonsai Swedish
221
+ language: sv
222
+ backend: omikuji
223
+ cluster_balanced: 'False'
224
+ cluster_k: 100
225
+ max_depth: 3
226
+ analyzer: snowball(swedish)
227
+ vocab: ykl
228
+ ngram: 1
229
+ min_df: 1
230
+ access: hidden
231
+ outs:
232
+ - path: data/projects/ykl-omikuji-bonsai-sv
233
+ hash: md5
234
+ md5: e589e2a60b5f7cdcc5e67c7443b4c87a.dir
235
+ size: 8507130
236
+ nfiles: 6
237
+ train-omikuji-bonsai@2:
238
+ cmd: annif train ykl-omikuji-bonsai-en -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-eng-train.tsv
239
+ deps:
240
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-eng-train.tsv
241
+ md5: 87548d6f8f9a8185dc870e8112668c87
242
+ size: 1730632
243
+ - path: data/vocabs/ykl
244
+ hash: md5
245
+ md5: 083422d4d504723b2eb4fc4ee8805a99.dir
246
+ size: 6503179
247
+ nfiles: 3
248
+ - path: venv-installed
249
+ hash: md5
250
+ md5: f85f21b68735b126c2241fbff83fd0ef
251
+ size: 41
252
+ params:
253
+ projects.toml:
254
+ ykl-omikuji-bonsai-en:
255
+ name: YKL Omikuji Bonsai English
256
+ language: en
257
+ backend: omikuji
258
+ cluster_balanced: 'False'
259
+ cluster_k: 100
260
+ max_depth: 3
261
+ analyzer: snowball(english)
262
+ vocab: ykl
263
+ ngram: 1
264
+ min_df: 1
265
+ access: hidden
266
+ outs:
267
+ - path: data/projects/ykl-omikuji-bonsai-en
268
+ hash: md5
269
+ md5: 0bce7445a6e1df060f800ec5516959c1.dir
270
+ size: 5332560
271
+ nfiles: 6
272
+ eval-omikuji-bonsai@2:
273
+ cmd: annif eval ykl-omikuji-bonsai-en -j 8 -m Precision@1 -m NDCG --metrics-file
274
+ reports/test-omikuji-bonsai-en.json --results-file reports/test-omikuji-bonsai-en.csv
275
+ corpora/kirjaesittelyt2021/kvesit-ykl-eng-test.tsv
276
+ deps:
277
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-eng-test.tsv
278
+ md5: bc7f924d84a20e9ab91e2600fd2415ec
279
+ size: 210866
280
+ - path: data/projects/ykl-omikuji-bonsai-en
281
+ hash: md5
282
+ md5: 0bce7445a6e1df060f800ec5516959c1.dir
283
+ size: 5332560
284
+ nfiles: 6
285
+ - path: venv-installed
286
+ hash: md5
287
+ md5: f85f21b68735b126c2241fbff83fd0ef
288
+ size: 41
289
+ outs:
290
+ - path: reports/test-omikuji-bonsai-en.csv
291
+ hash: md5
292
+ md5: e4734c53c0225ad899a06926e8b7e0d1
293
+ size: 212494
294
+ - path: reports/test-omikuji-bonsai-en.json
295
+ hash: md5
296
+ md5: 6f0029dd304ac0fc94b7817aa9ea3462
297
+ size: 100
298
+ train-omikuji-parabel@0:
299
+ cmd: annif train ykl-omikuji-parabel-fi -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv
300
+ deps:
301
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-fin-train.tsv
302
+ md5: cc52c6a5c21d799aad3fb73a889cd90e
303
+ size: 36684502
304
+ - path: data/vocabs/ykl
305
+ hash: md5
306
+ md5: 083422d4d504723b2eb4fc4ee8805a99.dir
307
+ size: 6503179
308
+ nfiles: 3
309
+ - path: venv-installed
310
+ hash: md5
311
+ md5: f85f21b68735b126c2241fbff83fd0ef
312
+ size: 41
313
+ params:
314
+ projects.toml:
315
+ ykl-omikuji-parabel-fi:
316
+ name: YKL Omikuji Parabel Finnish
317
+ language: fi
318
+ backend: omikuji
319
+ analyzer: voikko(fi)
320
+ vocab: ykl
321
+ ngram: 1
322
+ min_df: 1
323
+ access: hidden
324
+ outs:
325
+ - path: data/projects/ykl-omikuji-parabel-fi
326
+ hash: md5
327
+ md5: 44ab11d0b3e729a0dcd3b88fdadb7567.dir
328
+ size: 114032199
329
+ nfiles: 6
330
+ eval-omikuji-parabel@0:
331
+ cmd: annif eval ykl-omikuji-parabel-fi -j 8 -m Precision@1 -m NDCG --metrics-file
332
+ reports/test-omikuji-parabel-fi.json --results-file reports/test-omikuji-parabel-fi.csv
333
+ corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
334
+ deps:
335
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
336
+ md5: 9f8d094ae71ec68c5d40dde46d991e1f
337
+ size: 4534782
338
+ - path: data/projects/ykl-omikuji-parabel-fi
339
+ hash: md5
340
+ md5: 44ab11d0b3e729a0dcd3b88fdadb7567.dir
341
+ size: 114032199
342
+ nfiles: 6
343
+ - path: venv-installed
344
+ hash: md5
345
+ md5: f85f21b68735b126c2241fbff83fd0ef
346
+ size: 41
347
+ outs:
348
+ - path: reports/test-omikuji-parabel-fi.csv
349
+ hash: md5
350
+ md5: 6ef001dff735a3b3a64082c02721736c
351
+ size: 236720
352
+ - path: reports/test-omikuji-parabel-fi.json
353
+ hash: md5
354
+ md5: 1a7877d85e63ee67dc0c77f8b5c25ba2
355
+ size: 97
356
+ train-omikuji-parabel@1:
357
+ cmd: annif train ykl-omikuji-parabel-sv -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-swe-train.tsv
358
+ deps:
359
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-swe-train.tsv
360
+ md5: fa74f039e20594e212d2429eb3e74c13
361
+ size: 2367569
362
+ - path: data/vocabs/ykl
363
+ hash: md5
364
+ md5: 083422d4d504723b2eb4fc4ee8805a99.dir
365
+ size: 6503179
366
+ nfiles: 3
367
+ - path: venv-installed
368
+ hash: md5
369
+ md5: f85f21b68735b126c2241fbff83fd0ef
370
+ size: 41
371
+ params:
372
+ projects.toml:
373
+ ykl-omikuji-parabel-sv:
374
+ name: YKL Omikuji Parabel Swedish
375
+ language: sv
376
+ backend: omikuji
377
+ analyzer: snowball(swedish)
378
+ vocab: ykl
379
+ ngram: 1
380
+ min_df: 1
381
+ access: hidden
382
+ outs:
383
+ - path: data/projects/ykl-omikuji-parabel-sv
384
+ hash: md5
385
+ md5: ce70c5681d39ac52e3b2a9fc2efa7f4a.dir
386
+ size: 8635331
387
+ nfiles: 6
388
+ eval-omikuji-parabel@1:
389
+ cmd: annif eval ykl-omikuji-parabel-sv -j 8 -m Precision@1 -m NDCG --metrics-file
390
+ reports/test-omikuji-parabel-sv.json --results-file reports/test-omikuji-parabel-sv.csv
391
+ corpora/kirjaesittelyt2021/kvesit-ykl-swe-test.tsv
392
+ deps:
393
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-swe-test.tsv
394
+ md5: a9177ace2f0d1dffa6a424fccdf04b37
395
+ size: 287821
396
+ - path: data/projects/ykl-omikuji-parabel-sv
397
+ hash: md5
398
+ md5: ce70c5681d39ac52e3b2a9fc2efa7f4a.dir
399
+ size: 8635331
400
+ nfiles: 6
401
+ - path: venv-installed
402
+ hash: md5
403
+ md5: f85f21b68735b126c2241fbff83fd0ef
404
+ size: 41
405
+ outs:
406
+ - path: reports/test-omikuji-parabel-sv.csv
407
+ hash: md5
408
+ md5: a9b65c5400c421b4ce9f1f7e44dffccd
409
+ size: 214164
410
+ - path: reports/test-omikuji-parabel-sv.json
411
+ hash: md5
412
+ md5: 7f68b59b4db8a3a6116e8ec105913678
413
+ size: 87
414
+ train-omikuji-parabel@2:
415
+ cmd: annif train ykl-omikuji-parabel-en -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-eng-train.tsv
416
+ deps:
417
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-eng-train.tsv
418
+ md5: 87548d6f8f9a8185dc870e8112668c87
419
+ size: 1730632
420
+ - path: data/vocabs/ykl
421
+ hash: md5
422
+ md5: 083422d4d504723b2eb4fc4ee8805a99.dir
423
+ size: 6503179
424
+ nfiles: 3
425
+ - path: venv-installed
426
+ hash: md5
427
+ md5: f85f21b68735b126c2241fbff83fd0ef
428
+ size: 41
429
+ params:
430
+ projects.toml:
431
+ ykl-omikuji-parabel-en:
432
+ name: YKL Omikuji Parabel English
433
+ language: en
434
+ backend: omikuji
435
+ analyzer: snowball(english)
436
+ vocab: ykl
437
+ ngram: 1
438
+ min_df: 1
439
+ access: hidden
440
+ outs:
441
+ - path: data/projects/ykl-omikuji-parabel-en
442
+ hash: md5
443
+ md5: f1d1a08423ae130c839f3db15bc9ddf3.dir
444
+ size: 5326770
445
+ nfiles: 6
446
+ eval-omikuji-parabel@2:
447
+ cmd: annif eval ykl-omikuji-parabel-en -j 8 -m Precision@1 -m NDCG --metrics-file
448
+ reports/test-omikuji-parabel-en.json --results-file reports/test-omikuji-parabel-en.csv
449
+ corpora/kirjaesittelyt2021/kvesit-ykl-eng-test.tsv
450
+ deps:
451
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-eng-test.tsv
452
+ md5: bc7f924d84a20e9ab91e2600fd2415ec
453
+ size: 210866
454
+ - path: data/projects/ykl-omikuji-parabel-en
455
+ hash: md5
456
+ md5: f1d1a08423ae130c839f3db15bc9ddf3.dir
457
+ size: 5326770
458
+ nfiles: 6
459
+ - path: venv-installed
460
+ hash: md5
461
+ md5: f85f21b68735b126c2241fbff83fd0ef
462
+ size: 41
463
+ outs:
464
+ - path: reports/test-omikuji-parabel-en.csv
465
+ hash: md5
466
+ md5: c856d122742783f93908ed93bf2357fc
467
+ size: 212459
468
+ - path: reports/test-omikuji-parabel-en.json
469
+ hash: md5
470
+ md5: 1a00cc0b691c5a04e71f5f29896af3a2
471
+ size: 100
472
+ eval-omikuji-bonsai@1:
473
+ cmd: annif eval ykl-omikuji-bonsai-sv -j 8 -m Precision@1 -m NDCG --metrics-file
474
+ reports/test-omikuji-bonsai-sv.json --results-file reports/test-omikuji-bonsai-sv.csv
475
+ corpora/kirjaesittelyt2021/kvesit-ykl-swe-test.tsv
476
+ deps:
477
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-swe-test.tsv
478
+ md5: a9177ace2f0d1dffa6a424fccdf04b37
479
+ size: 287821
480
+ - path: data/projects/ykl-omikuji-bonsai-sv
481
+ hash: md5
482
+ md5: e589e2a60b5f7cdcc5e67c7443b4c87a.dir
483
+ size: 8507130
484
+ nfiles: 6
485
+ - path: venv-installed
486
+ hash: md5
487
+ md5: f85f21b68735b126c2241fbff83fd0ef
488
+ size: 41
489
+ outs:
490
+ - path: reports/test-omikuji-bonsai-sv.csv
491
+ hash: md5
492
+ md5: 9a6cd19c12e6d7da7f0021735d81c62a
493
+ size: 214103
494
+ - path: reports/test-omikuji-bonsai-sv.json
495
+ hash: md5
496
+ md5: 95ea663acf56c7fbd27d43f755758610
497
+ size: 87
498
+ eval-ensemble@0:
499
+ cmd: annif eval ykl-fi -j 8 -m Precision@1 -m NDCG --metrics-file reports/test-fi.json
500
+ --results-file reports/test-fi.csv corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
501
+ deps:
502
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-fin-test.tsv
503
+ hash: md5
504
+ md5: 9f8d094ae71ec68c5d40dde46d991e1f
505
+ size: 4534782
506
+ - path: data/projects/ykl-fi
507
+ hash: md5
508
+ md5: d751713988987e9331980363e24189ce.dir
509
+ size: 0
510
+ nfiles: 0
511
+ - path: venv-installed
512
+ hash: md5
513
+ md5: f85f21b68735b126c2241fbff83fd0ef
514
+ size: 41
515
+ outs:
516
+ - path: reports/test-fi.csv
517
+ hash: md5
518
+ md5: 38c9f688aaf641cc166db2dc72da3ef1
519
+ size: 237209
520
+ - path: reports/test-fi.json
521
+ hash: md5
522
+ md5: 47cfa99502a3ab3f5a1852b8708cccf9
523
+ size: 100
524
+ eval-ensemble@1:
525
+ cmd: annif eval ykl-sv -j 8 -m Precision@1 -m NDCG --metrics-file reports/test-sv.json
526
+ --results-file reports/test-sv.csv corpora/kirjaesittelyt2021/kvesit-ykl-swe-test.tsv
527
+ deps:
528
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-swe-test.tsv
529
+ hash: md5
530
+ md5: a9177ace2f0d1dffa6a424fccdf04b37
531
+ size: 287821
532
+ - path: data/projects/ykl-sv
533
+ hash: md5
534
+ md5: d751713988987e9331980363e24189ce.dir
535
+ size: 0
536
+ nfiles: 0
537
+ - path: venv-installed
538
+ hash: md5
539
+ md5: f85f21b68735b126c2241fbff83fd0ef
540
+ size: 41
541
+ outs:
542
+ - path: reports/test-sv.csv
543
+ hash: md5
544
+ md5: 2fff6f3bc20fddd861bf040622e28166
545
+ size: 214271
546
+ - path: reports/test-sv.json
547
+ hash: md5
548
+ md5: 4f76e748ac704e7b47d74c2c459cacaf
549
+ size: 87
550
+ eval-ensemble@2:
551
+ cmd: annif eval ykl-en -j 8 -m Precision@1 -m NDCG --metrics-file reports/test-en.json
552
+ --results-file reports/test-en.csv corpora/kirjaesittelyt2021/kvesit-ykl-eng-test.tsv
553
+ deps:
554
+ - path: corpora/kirjaesittelyt2021/kvesit-ykl-eng-test.tsv
555
+ hash: md5
556
+ md5: bc7f924d84a20e9ab91e2600fd2415ec
557
+ size: 210866
558
+ - path: data/projects/ykl-en
559
+ hash: md5
560
+ md5: d751713988987e9331980363e24189ce.dir
561
+ size: 0
562
+ nfiles: 0
563
+ - path: venv-installed
564
+ hash: md5
565
+ md5: f85f21b68735b126c2241fbff83fd0ef
566
+ size: 41
567
+ outs:
568
+ - path: reports/test-en.csv
569
+ hash: md5
570
+ md5: eea5b22824d23cb90267eb969a613300
571
+ size: 212565
572
+ - path: reports/test-en.json
573
+ hash: md5
574
+ md5: 8d41a89395974efbf8ae06880f94d3c4
575
+ size: 100
dvc.yaml ADDED
@@ -0,0 +1,122 @@
1
+ stages:
2
+ install:
3
+ cmd: python3 -m venv venv && . venv/bin/activate && pip install -U pip wheel setuptools && pip install -r requirements.txt && cp requirements.txt venv-installed
4
+ deps:
5
+ - requirements.txt
6
+ outs:
7
+ - venv-installed:
8
+ cache: false
9
+ load-vocab:
10
+ cmd: annif load-vocab ykl corpora/ykl-skos.ttl
11
+ deps:
12
+ - venv-installed
13
+ - corpora/ykl-skos.ttl
14
+ outs:
15
+ - data/vocabs/ykl
16
+ train-omikuji-parabel:
17
+ foreach:
18
+ - lang2: fi
19
+ lang3: fin
20
+ - lang2: sv
21
+ lang3: swe
22
+ - lang2: en
23
+ lang3: eng
24
+ do:
25
+ cmd: annif train ykl-omikuji-parabel-${item.lang2} -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-train.tsv
26
+ params:
27
+ - projects.toml:
28
+ - ykl-omikuji-parabel-${item.lang2}
29
+ deps:
30
+ - venv-installed
31
+ - corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-train.tsv
32
+ - data/vocabs/ykl
33
+ outs:
34
+ - data/projects/ykl-omikuji-parabel-${item.lang2}
35
+ eval-omikuji-parabel:
36
+ foreach:
37
+ - lang2: fi
38
+ lang3: fin
39
+ - lang2: sv
40
+ lang3: swe
41
+ - lang2: en
42
+ lang3: eng
43
+ do:
44
+ cmd: annif eval ykl-omikuji-parabel-${item.lang2} -j 8 -m Precision@1 -m NDCG
45
+ --metrics-file reports/test-omikuji-parabel-${item.lang2}.json
46
+ --results-file reports/test-omikuji-parabel-${item.lang2}.csv
47
+ corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-test.tsv
48
+ deps:
49
+ - venv-installed
50
+ - corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-test.tsv
51
+ - data/projects/ykl-omikuji-parabel-${item.lang2}
52
+ outs:
53
+ - reports/test-omikuji-parabel-${item.lang2}.csv:
54
+ cache: false
55
+ metrics:
56
+ - reports/test-omikuji-parabel-${item.lang2}.json:
57
+ cache: false
58
+ train-omikuji-bonsai:
59
+ foreach:
60
+ - lang2: fi
61
+ lang3: fin
62
+ - lang2: sv
63
+ lang3: swe
64
+ - lang2: en
65
+ lang3: eng
66
+ do:
67
+ cmd: annif train ykl-omikuji-bonsai-${item.lang2} -j 8 corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-train.tsv
68
+ params:
69
+ - projects.toml:
70
+ - ykl-omikuji-bonsai-${item.lang2}
71
+ deps:
72
+ - venv-installed
73
+ - corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-train.tsv
74
+ - data/vocabs/ykl
75
+ outs:
76
+ - data/projects/ykl-omikuji-bonsai-${item.lang2}
77
+ eval-omikuji-bonsai:
78
+ foreach:
79
+ - lang2: fi
80
+ lang3: fin
81
+ - lang2: sv
82
+ lang3: swe
83
+ - lang2: en
84
+ lang3: eng
85
+ do:
86
+ cmd: annif eval ykl-omikuji-bonsai-${item.lang2} -j 8 -m Precision@1 -m NDCG
87
+ --metrics-file reports/test-omikuji-bonsai-${item.lang2}.json
88
+ --results-file reports/test-omikuji-bonsai-${item.lang2}.csv
89
+ corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-test.tsv
90
+ deps:
91
+ - venv-installed
92
+ - corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-test.tsv
93
+ - data/projects/ykl-omikuji-bonsai-${item.lang2}
94
+ outs:
95
+ - reports/test-omikuji-bonsai-${item.lang2}.csv:
96
+ cache: false
97
+ metrics:
98
+ - reports/test-omikuji-bonsai-${item.lang2}.json:
99
+ cache: false
100
+ eval-ensemble:
101
+ foreach:
102
+ - lang2: fi
103
+ lang3: fin
104
+ - lang2: sv
105
+ lang3: swe
106
+ - lang2: en
107
+ lang3: eng
108
+ do:
109
+ cmd: annif eval ykl-${item.lang2} -j 8 -m Precision@1 -m NDCG
110
+ --metrics-file reports/test-${item.lang2}.json
111
+ --results-file reports/test-${item.lang2}.csv
112
+ corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-test.tsv
113
+ deps:
114
+ - venv-installed
115
+ - corpora/kirjaesittelyt2021/kvesit-ykl-${item.lang3}-test.tsv
116
+ - data/projects/ykl-${item.lang2}
117
+ outs:
118
+ - reports/test-${item.lang2}.csv:
119
+ cache: false
120
+ metrics:
121
+ - reports/test-${item.lang2}.json:
122
+ cache: false
projects.d/1-projects-ykl.toml ADDED
@@ -0,0 +1,89 @@
1
+ [ykl-omikuji-parabel-fi]
2
+ name="YKL Omikuji Parabel Finnish"
3
+ language="fi"
4
+ backend="omikuji"
5
+ analyzer="voikko(fi)"
6
+ vocab="ykl"
7
+ ngram=1
8
+ min_df=1
9
+ access="hidden"
10
+
11
+ [ykl-omikuji-bonsai-fi]
12
+ name="YKL Omikuji Bonsai Finnish"
13
+ language="fi"
14
+ backend="omikuji"
15
+ cluster_balanced = "False"
16
+ cluster_k = 100
17
+ max_depth = 3
18
+ analyzer="voikko(fi)"
19
+ vocab="ykl"
20
+ ngram=1
21
+ min_df=1
22
+ access="hidden"
23
+
24
+ [ykl-omikuji-parabel-sv]
25
+ name="YKL Omikuji Parabel Swedish"
26
+ language="sv"
27
+ backend="omikuji"
28
+ analyzer="snowball(swedish)"
29
+ vocab="ykl"
30
+ ngram=1
31
+ min_df=1
32
+ access="hidden"
33
+
34
+ [ykl-omikuji-bonsai-sv]
35
+ name="YKL Omikuji Bonsai Swedish"
36
+ language="sv"
37
+ backend="omikuji"
38
+ cluster_balanced = "False"
39
+ cluster_k = 100
40
+ max_depth = 3
41
+ analyzer="snowball(swedish)"
42
+ vocab="ykl"
43
+ ngram=1
44
+ min_df=1
45
+ access="hidden"
46
+
47
+ [ykl-omikuji-parabel-en]
48
+ name="YKL Omikuji Parabel English"
49
+ language="en"
50
+ backend="omikuji"
51
+ analyzer="snowball(english)"
52
+ vocab="ykl"
53
+ ngram=1
54
+ min_df=1
55
+ access="hidden"
56
+
57
+ [ykl-omikuji-bonsai-en]
58
+ name="YKL Omikuji Bonsai English"
59
+ language="en"
60
+ backend="omikuji"
61
+ cluster_balanced = "False"
62
+ cluster_k = 100
63
+ max_depth = 3
64
+ analyzer="snowball(english)"
65
+ vocab="ykl"
66
+ ngram=1
67
+ min_df=1
68
+ access="hidden"
69
+
70
+ [ykl-fi]
71
+ name="YKL suomi"
72
+ language="fi"
73
+ backend="ensemble"
74
+ vocab="ykl"
75
+ sources="ykl-omikuji-parabel-fi,ykl-omikuji-bonsai-fi"
76
+
77
+ [ykl-sv]
78
+ name="KAB svenska"
79
+ language="sv"
80
+ backend="ensemble"
81
+ vocab="ykl"
82
+ sources="ykl-omikuji-parabel-sv,ykl-omikuji-bonsai-sv"
83
+
84
+ [ykl-en]
85
+ name="PLC English"
86
+ language="en"
87
+ backend="ensemble"
88
+ vocab="ykl"
89
+ sources="ykl-omikuji-parabel-en,ykl-omikuji-bonsai-en"
projects.toml ADDED
@@ -0,0 +1,89 @@
1
+ [ykl-omikuji-parabel-fi]
2
+ name="YKL Omikuji Parabel Finnish"
3
+ language="fi"
4
+ backend="omikuji"
5
+ analyzer="voikko(fi)"
6
+ vocab="ykl"
7
+ ngram=1
8
+ min_df=1
9
+ access="hidden"
10
+
11
+ [ykl-omikuji-bonsai-fi]
12
+ name="YKL Omikuji Bonsai Finnish"
13
+ language="fi"
14
+ backend="omikuji"
15
+ cluster_balanced = "False"
16
+ cluster_k = 100
17
+ max_depth = 3
18
+ analyzer="voikko(fi)"
19
+ vocab="ykl"
20
+ ngram=1
21
+ min_df=1
22
+ access="hidden"
23
+
24
+ [ykl-omikuji-parabel-sv]
25
+ name="YKL Omikuji Parabel Swedish"
26
+ language="sv"
27
+ backend="omikuji"
28
+ analyzer="snowball(swedish)"
29
+ vocab="ykl"
30
+ ngram=1
31
+ min_df=1
32
+ access="hidden"
33
+
34
+ [ykl-omikuji-bonsai-sv]
35
+ name="YKL Omikuji Bonsai Swedish"
36
+ language="sv"
37
+ backend="omikuji"
38
+ cluster_balanced = "False"
39
+ cluster_k = 100
40
+ max_depth = 3
41
+ analyzer="snowball(swedish)"
42
+ vocab="ykl"
43
+ ngram=1
44
+ min_df=1
45
+ access="hidden"
46
+
47
+ [ykl-omikuji-parabel-en]
48
+ name="YKL Omikuji Parabel English"
49
+ language="en"
50
+ backend="omikuji"
51
+ analyzer="snowball(english)"
52
+ vocab="ykl"
53
+ ngram=1
54
+ min_df=1
55
+ access="hidden"
56
+
57
+ [ykl-omikuji-bonsai-en]
58
+ name="YKL Omikuji Bonsai English"
59
+ language="en"
60
+ backend="omikuji"
61
+ cluster_balanced = "False"
62
+ cluster_k = 100
63
+ max_depth = 3
64
+ analyzer="snowball(english)"
65
+ vocab="ykl"
66
+ ngram=1
67
+ min_df=1
68
+ access="hidden"
69
+
70
+ [ykl-fi]
71
+ name="YKL suomi"
72
+ language="fi"
73
+ backend="ensemble"
74
+ vocab="ykl"
75
+ sources="ykl-omikuji-parabel-fi,ykl-omikuji-bonsai-fi"
76
+
77
+ [ykl-sv]
78
+ name="KAB svenska"
79
+ language="sv"
80
+ backend="ensemble"
81
+ vocab="ykl"
82
+ sources="ykl-omikuji-parabel-sv,ykl-omikuji-bonsai-sv"
83
+
84
+ [ykl-en]
85
+ name="PLC English"
86
+ language="en"
87
+ backend="ensemble"
88
+ vocab="ykl"
89
+ sources="ykl-omikuji-parabel-en,ykl-omikuji-bonsai-en"
reports/test-en.csv ADDED
The diff for this file is too large to render. See raw diff
 
reports/test-en.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Precision@1": 0.49295774647887325,
3
+ "NDCG": 0.6053284406661987,
4
+ "Documents_evaluated": 213
5
+ }
reports/test-fi.csv ADDED
The diff for this file is too large to render. See raw diff
 
reports/test-fi.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Precision@1": 0.5680110715697905,
3
+ "NDCG": 0.7042226791381836,
4
+ "Documents_evaluated": 5058
5
+ }
reports/test-omikuji-bonsai-en.csv ADDED
The diff for this file is too large to render. See raw diff
 
reports/test-omikuji-bonsai-en.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Precision@1": 0.49295774647887325,
3
+ "NDCG": 0.6082379221916199,
4
+ "Documents_evaluated": 213
5
+ }
reports/test-omikuji-bonsai-fi.csv ADDED
The diff for this file is too large to render. See raw diff
 
reports/test-omikuji-bonsai-fi.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Precision@1": 0.5662317121391854,
3
+ "NDCG": 0.7037694454193115,
4
+ "Documents_evaluated": 5058
5
+ }
reports/test-omikuji-bonsai-sv.csv ADDED
The diff for this file is too large to render. See raw diff
 
reports/test-omikuji-bonsai-sv.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Precision@1": 0.6375,
3
+ "NDCG": 0.7122230529785156,
4
+ "Documents_evaluated": 400
5
+ }
reports/test-omikuji-parabel-en.csv ADDED
The diff for this file is too large to render. See raw diff
 
reports/test-omikuji-parabel-en.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Precision@1": 0.48826291079812206,
3
+ "NDCG": 0.5963867902755737,
4
+ "Documents_evaluated": 213
5
+ }
reports/test-omikuji-parabel-fi.csv ADDED
The diff for this file is too large to render. See raw diff
 
reports/test-omikuji-parabel-fi.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Precision@1": 0.56069592724397,
3
+ "NDCG": 0.691719114780426,
4
+ "Documents_evaluated": 5058
5
+ }
reports/test-omikuji-parabel-sv.csv ADDED
The diff for this file is too large to render. See raw diff
 
reports/test-omikuji-parabel-sv.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Precision@1": 0.6475,
3
+ "NDCG": 0.7179128527641296,
4
+ "Documents_evaluated": 400
5
+ }
reports/test-sv.csv ADDED
The diff for this file is too large to render. See raw diff
 
reports/test-sv.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "Precision@1": 0.6475,
3
+ "NDCG": 0.7206699848175049,
4
+ "Documents_evaluated": 400
5
+ }
requirements.txt ADDED
@@ -0,0 +1 @@
1
+ annif[fasttext,omikuji,nn,voikko]==1.0.*
sync-model-data-ocp.sh ADDED
@@ -0,0 +1,32 @@
1
+ #!/bin/bash
2
+
3
+ # Runs rsync to transfer model data from the current directory to an OpenShift volume
4
+ # that is attached to a pod which is running Annif. The instance
5
+ # {api-annif-org,ai-finto-fi, etc.} to transfer to is given as the argument.
6
+ # You need to be logged in to the cluster with the oc tool.
7
+
8
+ set -e
9
+
10
+ if [ $# -ne 1 ]
11
+ then
12
+ echo "Expected exactly one argument: destination_instance"
13
+ exit 1
14
+ fi
15
+
16
+ pod=$(oc get pods -l app.kubernetes.io/instance="$1",app.kubernetes.io/name=annif -o name)
17
+
18
+ if [[ $pod = *[[:space:]]* ]]
19
+ then
20
+ echo "Multiple pods exist; using the first"
21
+ pod=(${pod//$'\n'/ })
22
+ fi
23
+ echo "Target is $pod"
24
+ pod=${pod#pod/}
25
+ if [ -z "${pod}" ]
26
+ then
27
+ echo "No target pod found"
28
+ exit 1
29
+ fi
30
+
31
+ rsync --rsh='oc rsh' -avrL --exclude="*train*" --exclude="*zip" --inplace projects.d $pod:/annif-projects
32
+ rsync --rsh='oc rsh' -avrL --exclude="*train*" --exclude="*zip" --inplace data/{projects,vocabs} $pod:/annif-projects/data